TL;DR: is anyone in the Geant4 community still using the IgProf profiler? If so, are you able to use it successfully to profile multithreaded G4 jobs?
Our experiment’s simulation runs under Geant4 10.7.p04. It includes a detector response simulation of phonons and charge carriers via the G4CMP library, so individual events typically take anywhere from 15 minutes to an hour (depending on voltage bias). So we run multithreaded, and it works very well.
Except that once we get above about 20 threads, there’s a lot of overhead that I don’t understand. The system CPU time, and hence the elapsed wall-clock time for the job, grows linearly with the number of threads above 20. By the time we get to 40 threads, events can take six to eight hours each! That’s obviously untenable.
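For anyone who wants to reproduce the user vs. system CPU split at different thread counts, here is a minimal sketch of one way to measure it from outside the job. It assumes a Unix-like system and uses only the Python standard library; the `measure` helper and the stand-in command are hypothetical and are not part of our framework.

```python
import resource
import subprocess
import sys
import time


def measure(cmd, env=None):
    """Run cmd to completion; return wall, user and system CPU times in seconds."""
    # RUSAGE_CHILDREN accumulates CPU time of child processes that have been
    # waited for, summed over all of their threads.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    result = subprocess.run(cmd, env=env, check=False)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "returncode": result.returncode,
        "wall": wall,
        "user": after.ru_utime - before.ru_utime,
        "sys": after.ru_stime - before.ru_stime,
    }


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. For a real test, replace
    # it with the simulation executable and macro (hypothetical names such as
    # ["./mySim", "run.mac"]) and repeat for each thread count of interest.
    stats = measure([sys.executable, "-c", "sum(range(10**6))"])
    print("wall={wall:.2f}s user={user:.2f}s sys={sys:.2f}s".format(**stats))
```

If the "sys" value grows with thread count while "user" stays roughly flat per event, that points to time spent in the kernel (for example locking, memory management, or I/O) rather than in the physics code itself.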
I presume the problem is somewhere in our simulation framework, not in G4. I’d like to use IgProf to find out where the time is going, but the only output I get from it covers the master thread, up to /run/initialize. There is nothing from any of the worker threads.
As an aside, I can’t seem to use Valgrind for this: it complains almost immediately about an “illegal instruction” and aborts. I only run into this with G4 applications on our compute cluster. But that’s why I’m trying to use IgProf.