Hi, we are currently validating remage [link], a new G4 framework of ours, and I recently started looking into its multithreading performance. For this I set up a simple geometry (a box of liquid argon) and simulated 1 MeV electrons and gammas from the GPS, with no output written. I scanned over a range of thread counts and recorded the runtime and event rate of each simulation.
With this setup I got a different speedup curve than I expected (see below). I am working on a server node with 128 physical cores in total, and each thread simulates a constant number of primaries, so this is a weak-scaling test: ideally the runtime stays constant and the event rate grows linearly with the number of threads. The runtime is measured from the start of the master run action to its end.
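For reference, this is roughly how a weak-scaling curve can be computed from the measured runtimes. It is a minimal sketch, assuming a dictionary that maps thread count to wall-clock runtime in seconds, with a 1-thread run as the baseline and a fixed number of primaries per thread. The function name `weak_scaling` is hypothetical and not part of remage or Geant4. Dividing the event rate at N threads by the 1-thread event rate gives the same scaled speedup.

```python
def weak_scaling(runtimes):
    """Compute scaled speedup and efficiency for a weak-scaling scan.

    runtimes: dict {thread_count: wall-clock runtime in s}, assuming each
    thread always simulates the same number of primaries (hypothetical input).
    """
    t1 = runtimes[1]  # single-thread baseline
    result = {}
    for n, tn in sorted(runtimes.items()):
        # Ideal weak scaling keeps the runtime constant, so efficiency = 1.0
        efficiency = t1 / tn
        # N threads process N times the work; ideal scaled speedup equals N
        scaled_speedup = n * efficiency
        result[n] = (scaled_speedup, efficiency)
    return result


# Usage with made-up numbers (not measured data):
for n, (s, e) in weak_scaling({1: 30.0, 2: 30.0, 4: 40.0}).items():
    print(f"{n:4d} threads: speedup {s:6.2f}, efficiency {e:5.2f}")
```

A plateau in the scaled speedup therefore means the runtime grows roughly in proportion to the thread count, i.e. the threads stop doing useful work in parallel.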
For the electron simulation, the speedup plateaus at about 64 threads and then starts to decrease. For the gamma simulation, the plateau sits at the expected 128-thread mark, but the slope before it is significantly smaller than for the electrons. Here, too, the speedup decreases after the plateau.
I suspect that my setup is not quite right for this purpose yet. I start with 2e5 primaries per thread, and the simulations take about 30 s for the electrons and 50 s for the gammas, which might be a bit short. When I increased the number of primaries compared to a previous setup, the electron plateau rose a bit, but the decline is still present.
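One way to check whether the runs are too short is to separate a fixed per-run cost from the per-event cost. The sketch below assumes a simple linear model, runtime = overhead + primaries_per_thread × time_per_event, at a fixed thread count; this model is an assumption and ignores effects such as contention that may change with the thread count. The function `split_overhead` is hypothetical, and the numbers in the usage example are made up.

```python
def split_overhead(n_a, t_a, n_b, t_b):
    """Estimate fixed overhead and per-event time from two runs.

    Both runs use the same thread count but different primaries per thread
    (n_a, n_b), with measured runtimes t_a, t_b in seconds. Assumes the
    linear model t = t0 + n * t_event.
    """
    if n_a == n_b:
        raise ValueError("need two different primaries-per-thread values")
    t_event = (t_b - t_a) / (n_b - n_a)  # slope: time per primary per thread
    t0 = t_a - n_a * t_event             # intercept: fixed per-run overhead
    return t0, t_event


# Usage with made-up numbers (not measured data):
t0, t_event = split_overhead(1e5, 20.0, 2e5, 35.0)
print(f"overhead {t0:.2f} s, {t_event * 1e6:.1f} us per primary per thread")
print(f"overhead fraction at 2e5 primaries: {t0 / 35.0:.1%}")
```

Repeating this at several thread counts would show whether the overhead itself grows with the number of threads.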
More details can also be found in my issue on the repo [link].
I was wondering if someone with more G4 and multithreading experience has an idea of what I am seeing here and suggestions on what to cross-check. Thanks in advance!