Multithreading does not result in expected speedup for your application

MNeuberger · March 14, 2025, 10:00am

Hi, we are currently in the process of validating a new G4 framework of ours called remage [link] and I recently started looking into the multithreading performance. To do this, I implemented a setup with a simple geometry (box of liquid argon) and started with GPS electrons and gammas of 1 MeV with no output. I scanned over a range of numbers of threads and read the runtime and event rate of each simulation.

I found that with the setup I used I got a different speedup curve than I expected (see below). I am working on a server node with a total of 128 physical cores, and each thread has a constant number of primaries, so I should get weak scaling. The runtime is measured from the start of the master runaction to its end.
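For reference, here is a minimal sketch of how speedup and efficiency can be derived from such a scan, assuming weak scaling (a fixed number of primaries per thread) and one wall-clock runtime per thread count. The function name `weak_scaling` and the numbers in the usage example are hypothetical placeholders, not remage code or measurements from this scan.

```python
def weak_scaling(primaries_per_thread, runtimes):
    """Compute event rate, speedup and efficiency for a weak-scaling scan.

    runtimes: dict mapping number of threads -> wall-clock runtime in seconds.
    The smallest thread count in the scan is used as the baseline.
    Returns a list of (n_threads, event_rate, speedup, efficiency) tuples.
    """
    if not runtimes:
        raise ValueError("need at least one measurement")
    base_n = min(runtimes)
    # Per-thread event rate of the baseline run (events / second / thread).
    base_rate_per_thread = primaries_per_thread / runtimes[base_n]
    rows = []
    for n in sorted(runtimes):
        # Total events processed grows with the thread count in weak scaling.
        event_rate = n * primaries_per_thread / runtimes[n]
        # Ideal weak scaling: runtime stays constant, so speedup == n.
        speedup = event_rate / base_rate_per_thread
        efficiency = speedup / n
        rows.append((n, event_rate, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Placeholder runtimes in seconds, for illustration only.
    example = {1: 30.0, 2: 30.0, 4: 60.0}
    for n, rate, speedup, eff in weak_scaling(2e5, example):
        print(f"{n:4d} threads  rate={rate:10.1f} ev/s  "
              f"speedup={speedup:5.2f}  efficiency={eff:4.2f}")
```

An efficiency that stays near 1 corresponds to ideal weak scaling, while a value that drops with the thread count means the runtime grows even though the work per thread is constant.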

For the electron simulation, the speedup plateaus at about 64 threads and then starts to decrease. For the gamma simulation, the plateau is at the expected 128-thread mark, but the slope before it is significantly smaller than for the electrons. I also see a decrease here.

I suspect that the setup I am using is not quite right for the purpose yet. I start with 2e5 primaries per thread, and the simulations take about 30 s for the electrons and about 50 s for the gammas, which might be a bit short. When I increased the number of primaries relative to a previous setup, the electron plateau lifted a bit, but the decline was still present.
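One possible cross-check of the "too short" hypothesis, sketched here under the assumption that the runtime at a fixed thread count is roughly a constant overhead plus a per-primary cost, is to repeat the run with several values of primaries per thread and fit a straight line. The helper `fit_fixed_overhead` and the example numbers are hypothetical and only illustrate the idea.

```python
def fit_fixed_overhead(points):
    """Least-squares fit of runtime = overhead + cost_per_primary * primaries.

    points: list of (primaries_per_thread, runtime_seconds) pairs, all taken
    at the same thread count.
    Returns (overhead_seconds, seconds_per_primary).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct primary counts")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx                    # seconds per primary
    intercept = mean_y - slope * mean_x  # fixed overhead in seconds
    return intercept, slope


if __name__ == "__main__":
    # Placeholder values for illustration, not measured data.
    overhead, per_primary = fit_fixed_overhead(
        [(1e5, 20.0), (2e5, 30.0), (4e5, 50.0)]
    )
    print(f"fixed overhead ~ {overhead:.1f} s, "
          f"cost ~ {per_primary * 1e6:.1f} us per primary")
```

If the fitted overhead is a sizeable fraction of the 30 to 50 s runtime and grows with the thread count, the short runs would be distorting the scaling curve. If it stays small, the cause is more likely elsewhere.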

More details can also be found in my issue on the repo [link].

I was wondering if someone with more G4 and multithreading experience might have an idea of what I am looking at right now and have suggestions on what to x-check. Thanks in advance!

bmorgan · March 17, 2025, 9:59am

Pinging @makotoasai on this (I’ll also try and take a look when I get a moment).

MNeuberger · May 1, 2025, 8:55am

Hey, I was wondering if you (@bmorgan) or @makotoasai have had any ideas on the possible cause of this lack of speedup.

bmorgan · May 1, 2025, 12:39pm

Hi @MNeuberger, sorry for the delay in looking at this - I will have some time over the next couple of weeks, so will see if I can get your test case built and running locally. I don’t have such a high core count machine though, so do you see the same behaviour at lower core count?

gipert · May 7, 2025, 10:26am

Hi @bmorgan, would be great if we could have some expert eyes looking at this! I’d really like to have this feature functional in remage for LEGEND large scale simulations.

About the number of cores: what is quite unexpected is the slope of the linear increase, and you can already see that with only a few threads.