Hi Geant4 community members,
I am fairly new to Geant4. I am testing the multi-threaded performance of the B1 example and getting some bizarre results. The best performance comes from running in serial; with the multi-threaded run manager, the run time is much longer. Even with the number of threads set to 1 (which in theory should be equivalent to running in serial), the run takes about 7.9 s of wall-clock time, versus about 1.6 s in serial. I also noticed that the reported CPU usage is much higher in the serial case (63.1%), whereas in the multi-threaded case it stays at about 16-18% no matter how many threads I use. I don’t know whether this is related to the issue.
I did not change the code of the B1 example, except that in the main function I set the number of threads directly by adding the line “runManager->SetNumberOfThreads(2);” after initializing the run manager.
I used the Qt GUI and ran the simulation with the command “/run/beamOn 10000” in each case. As far as I can tell, the multi-threaded setup is correct: with 4 threads, each thread processes 2500 events, and with 2 threads, each thread processes roughly 5000 events (4970 and 5030 in the log below). But I cannot make sense of the performance. Please let me know if you need more information to diagnose the issue.
My laptop is a 2018 MacBook Pro with the following configuration:
Operating system: macOS 10.15.7.
Processor: 2.6 GHz 6-Core Intel Core i7.
Memory: 16 GB 2400 MHz DDR4.
Graphics: Radeon Pro 560X 4 GB; Intel UHD Graphics 630 1536 MB.
Here is the console output of the runs:
When running multi-threaded, with number of threads = 4:
##############################################################################
Available UI session types: [ Qt, GAG, tcsh, csh ]
G4WT0 > /control/saveHistory
G4WT3 > /control/saveHistory
G4WT1 > /control/saveHistory
G4WT2 > /control/saveHistory
G4WT0 > /run/verbose 2
G4WT3 > /run/verbose 2
G4WT2 > /run/verbose 2
G4WT1 > /run/verbose 2
G4WT0 > /run/initialize
G4WT2 > /run/initialize
G4WT1 > /run/initialize
G4WT0 > /run/physicsModified
G4WT3 > /run/initialize
G4WT2 > /run/physicsModified
G4WT1 > /run/physicsModified
G4WT3 > /run/physicsModified
G4WT1 > /tracking/storeTrajectory 2
G4WT0 > /tracking/storeTrajectory 2
G4WT3 > /tracking/storeTrajectory 2
G4WT2 > /tracking/storeTrajectory 2
G4WT1 > ### Run 0 starts on worker thread 1.
G4WT0 > ### Run 0 starts on worker thread 0.
G4WT3 > ### Run 0 starts on worker thread 3.
G4WT2 > ### Run 0 starts on worker thread 2.
G4WT1 > Thread-local run terminated.
G4WT1 > Run Summary
G4WT1 > Number of events processed : 2500
G4WT1 > User=1.840000s Real=10.763816s Sys=0.100000s [Cpu=18.0%]
G4WT1 >
G4WT1 > --------------------End of Local Run------------------------
G4WT1 > The run consists of 2500 gamma of 6 MeV
G4WT1 > Cumulated dose per run, in scoring volume : 112.939 picoGy rms = 6.26384 picoGy
G4WT1 > ------------------------------------------------------------
G4WT1 >
G4WT2 > Thread-local run terminated.
G4WT2 > Run Summary
G4WT2 > Number of events processed : 2500
G4WT2 > User=1.840000s Real=10.764333s Sys=0.100000s [Cpu=18.0%]
G4WT2 >
G4WT2 > --------------------End of Local Run------------------------
G4WT2 > The run consists of 2500 gamma of 6 MeV
G4WT2 > Cumulated dose per run, in scoring volume : 106.1 picoGy rms = 6.11031 picoGy
G4WT2 > ------------------------------------------------------------
G4WT2 >
G4WT0 > Thread-local run terminated.
G4WT0 > Run Summary
G4WT0 > Number of events processed : 2500
G4WT0 > User=1.840000s Real=10.765963s Sys=0.100000s [Cpu=18.0%]
G4WT0 >
G4WT0 > --------------------End of Local Run------------------------
G4WT0 > The run consists of 2500 gamma of 6 MeV
G4WT0 > Cumulated dose per run, in scoring volume : 105.887 picoGy rms = 6.19659 picoGy
G4WT0 > ------------------------------------------------------------
G4WT0 >
G4WT3 > Thread-local run terminated.
G4WT3 > Run Summary
G4WT3 > Number of events processed : 2500
G4WT3 > User=1.840000s Real=10.766906s Sys=0.100000s [Cpu=18.0%]
G4WT3 >
G4WT3 > --------------------End of Local Run------------------------
G4WT3 > The run consists of 2500 gamma of 6 MeV
G4WT3 > Cumulated dose per run, in scoring volume : 106.636 picoGy rms = 6.19897 picoGy
G4WT3 > ------------------------------------------------------------
G4WT3 >
##############################################################################
When running multi-threaded, with number of threads = 2:
##############################################################################
G4WT0 > /control/saveHistory
G4WT1 > /control/saveHistory
G4WT0 > /run/verbose 2
G4WT0 > /run/initialize
G4WT0 > /run/physicsModified
G4WT1 > /run/verbose 2
G4WT1 > /run/initialize
G4WT1 > /run/physicsModified
G4WT1 > /tracking/storeTrajectory 2
G4WT0 > /tracking/storeTrajectory 2
G4WT1 > ### Run 0 starts on worker thread 1.
G4WT0 > ### Run 0 starts on worker thread 0.
G4WT1 > Thread-local run terminated.
G4WT1 > Run Summary
G4WT1 > Number of events processed : 4970
G4WT1 > User=1.690000s Real=10.759053s Sys=0.110000s [Cpu=16.7%]
G4WT1 >
G4WT1 > --------------------End of Local Run------------------------
G4WT1 > The run consists of 4970 gamma of 6 MeV
G4WT1 > Cumulated dose per run, in scoring volume : 231.798 picoGy rms = 9.16681 picoGy
G4WT1 > ------------------------------------------------------------
G4WT1 >
G4WT0 > Thread-local run terminated.
G4WT0 > Run Summary
G4WT0 > Number of events processed : 5030
G4WT0 > User=1.690000s Real=10.761573s Sys=0.110000s [Cpu=16.7%]
G4WT0 >
G4WT0 > --------------------End of Local Run------------------------
G4WT0 > The run consists of 5030 gamma of 6 MeV
G4WT0 > Cumulated dose per run, in scoring volume : 199.764 picoGy rms = 8.32225 picoGy
G4WT0 > ------------------------------------------------------------
G4WT0 >
##############################################################################
When running multi-threaded, with number of threads = 1:
##############################################################################
Available UI session types: [ Qt, GAG, tcsh, csh ]
G4WT0 > /control/saveHistory
G4WT0 > /run/verbose 2
G4WT0 > /run/initialize
G4WT0 > /run/physicsModified
G4WT0 > /tracking/storeTrajectory 2
G4WT0 > ### Run 0 starts on worker thread 0.
G4WT0 > Thread-local run terminated.
G4WT0 > Run Summary
G4WT0 > Number of events processed : 10000
G4WT0 > User=1.380000s Real=7.935211s Sys=0.060000s [Cpu=18.1%]
G4WT0 >
G4WT0 > --------------------End of Local Run------------------------
G4WT0 > The run consists of 10000 gamma of 6 MeV
G4WT0 > Cumulated dose per run, in scoring volume : 431.562 picoGy rms = 12.3859 picoGy
G4WT0 > ------------------------------------------------------------
G4WT0 >
##############################################################################
When running in serial:
##############################################################################
Run terminated.
Run Summary
Number of events processed : 10000
User=0.990000s Real=1.649343s Sys=0.050000s [Cpu=63.1%]
--------------------End of Global Run-----------------------
The run consists of 10000 gamma of 6 MeV
Cumulated dose per run, in scoring volume : 410.143 picoGy rms = 11.9778 picoGy
##############################################################################
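To make the numbers easier to compare, here is a small standalone C++ sketch (not part of B1 and not Geant4 code) that recomputes the CPU percentage from the User, Sys and Real values in the Run Summary lines. It assumes the [Cpu=...%] figure is (User + Sys) / Real, which matches the values in my logs but is my own guess rather than something I checked in the Geant4 source. The names RunTiming, cpu_percent, idle_seconds and print_row are made up for this example.

```cpp
#include <cassert>
#include <cstdio>

// Timing figures copied by hand from one "User=... Real=... Sys=..." log line.
struct RunTiming {
    double user_s;  // CPU time spent in user mode, in seconds
    double real_s;  // elapsed wall-clock time, in seconds
    double sys_s;   // CPU time spent in kernel mode, in seconds
};

// Share of the wall-clock time during which the CPU was actually busy.
// Assumption: this is how the [Cpu=...%] value in the log is derived.
double cpu_percent(const RunTiming& t) {
    assert(t.real_s > 0.0);
    return 100.0 * (t.user_s + t.sys_s) / t.real_s;
}

// Wall-clock seconds not accounted for by CPU time, i.e. time spent waiting.
double idle_seconds(const RunTiming& t) {
    return t.real_s - (t.user_s + t.sys_s);
}

// Print one comparison row for a labelled run.
void print_row(const char* label, const RunTiming& t) {
    std::printf("%-20s cpu=%5.1f%%  idle=%6.3f s\n",
                label, cpu_percent(t), idle_seconds(t));
}
```

For example, print_row("serial", {0.99, 1.649343, 0.05}) gives about 63.1% and 0.61 s of idle time, while print_row("4 threads, WT1", {1.84, 10.763816, 0.10}) gives about 18.0% and 8.82 s of idle time. So if the assumption holds, the multi-threaded runs seem to spend most of their wall-clock time waiting rather than computing.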