Thread Utilisation

My system monitor is reporting 15% utilisation when running in multithreaded mode. I am wondering if anyone knows whether this can be accurate or if there is something to be done about this?

Do you have code in your threads that writes to files every event? For example, SDs that write diagnostic data? Does any of your code use mutexes to write to (or access non-const) global static objects?

In my experience, regular G4 running is extremely efficient up to about two or three threads per CPU, but my I/O code smacks it down.

In fact, I have a lot of debug statements printing to the terminal from the SD. So perhaps that is it?

Could you give an example of the objects you are thinking about? Please forgive my ignorance.

So in my experiment, we write some complex objects directly to a single ROOT file per job (i.e., we are not just using G4Analysis N-tuples). Our G4UserEventAction in each thread writes its whole event data (multiple trees, sometimes multiple rows per tree) to the same globally shared TTrees in the same globally shared TFile. That means G4UserEventAction has to use mutexes before accessing and writing to the ROOT objects.
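To make the pattern concrete, here is a minimal sketch in plain standard C++, not our actual code. `SharedOutput`, `WriteEvent` and `RunWorkers` are hypothetical names, and the vector of integers just stands in for the shared TFile/TTrees. In real Geant4 code the equivalent tools would be `G4Mutex` and `G4AutoLock` rather than `std::mutex` and `std::lock_guard`.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the globally shared output file and trees.
struct SharedOutput {
    std::mutex mutex;        // guards rows
    std::vector<int> rows;   // one entry per written event

    // Called once per event by every worker thread.
    void WriteEvent(int eventId) {
        std::lock_guard<std::mutex> lock(mutex);  // only one writer at a time
        rows.push_back(eventId);
    }
};

// Start nThreads workers; each writes eventsPerThread events, one lock per event.
// Returns the total number of rows stored in the shared output.
inline std::size_t RunWorkers(SharedOutput& out, int nThreads, int eventsPerThread) {
    assert(nThreads > 0 && eventsPerThread >= 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < nThreads; ++t) {
        workers.emplace_back([&out, t, eventsPerThread] {
            for (int i = 0; i < eventsPerThread; ++i) {
                out.WriteEvent(t * eventsPerThread + i);
            }
        });
    }
    for (auto& w : workers) w.join();
    return out.rows.size();
}
```

Every call to `WriteEvent` makes the other threads wait, which is harmless when events are slow and costly when they are fast.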

With a regular Geant4 “source” job (throwing gammas, or radioactive decays, etc.) this blows away the thread efficiency, as each thread takes turns writing each individual event, and we see about 20% CPU utilization. When we run detector response (using the G4CMP package), each event can take tens of minutes, so the I/O is tiny and the efficiency can be over 90%.
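A rough way to see why the two cases differ so much is a back-of-the-envelope model, sketched below. It assumes each event needs some parallel compute time followed by a write that only one thread can do at a time, and that blocked threads sit idle. The function name and the numbers in the usage note are illustrative assumptions, not measurements from our jobs.

```cpp
#include <algorithm>
#include <cassert>

// Estimated CPU utilisation (0 to 1) for nThreads workers when each event
// needs computeSec of parallel work plus writeSec inside a shared lock.
inline double EstimatedUtilisation(double computeSec, double writeSec, int nThreads) {
    assert(nThreads > 0 && computeSec >= 0.0 && writeSec > 0.0);
    // The lock completes at most 1/writeSec events per second, and each
    // event keeps one core busy for computeSec + writeSec seconds, so at
    // most this many cores can be busy on average.
    const double busyCores = (computeSec + writeSec) / writeSec;
    return std::min(1.0, busyCores / nThreads);
}
```

With equal compute and write times on 10 threads, `EstimatedUtilisation(1.0, 1.0, 10)` gives 0.2. With a 600 second event and a 1 ms write, `EstimatedUtilisation(600.0, 0.001, 10)` saturates at 1.0.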


That is really interesting and thank you for your response. I am using G4Analysis at the moment and filling ntuples and some primitive histograms. I would like to have closer integration with ROOT not least because I would like to share the geometry files and see if I can interrogate histogram features in that way.

https://twiki.cern.ch/twiki/bin/view/Geant4/QuickMigrationGuideForGeant4V10

Looks like I need to read this section in more detail. Is there no way to buffer these inputs and write them in batches in order to circumvent the problem?

Well, ROOT does buffer the output and write it in batches. The issue is that there’s only one file, and only one set of buffers which are shared across all the worker threads. So you don’t want two threads trying to put data into the buffers simultaneously – you’ll end up with a mixture of data from the two events all jumbled together.
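One generic way to cut down on the lock traffic is for each thread to keep its own small buffer and take the shared lock only once per batch. The sketch below shows that idea in plain standard C++. It is not what ROOT does internally and not what our code does; `SharedSink`, `ThreadLocalBuffer` and `RunBuffered` are hypothetical names, and the integers stand in for whole events. Because a batch is copied in under a single lock, rows from different threads cannot interleave within a batch.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the single shared output.
struct SharedSink {
    std::mutex mutex;
    std::vector<int> rows;
    std::size_t lockCount = 0;  // number of times the lock was taken

    void WriteBatch(const std::vector<int>& batch) {
        std::lock_guard<std::mutex> lock(mutex);
        ++lockCount;
        rows.insert(rows.end(), batch.begin(), batch.end());
    }
};

// Per-thread buffer: collects events locally, flushes them in batches.
class ThreadLocalBuffer {
public:
    ThreadLocalBuffer(SharedSink& sink, std::size_t batchSize)
        : sink_(sink), batchSize_(batchSize) {
        assert(batchSize > 0);
        pending_.reserve(batchSize);
    }
    ~ThreadLocalBuffer() { Flush(); }  // write any final partial batch

    void Add(int eventId) {
        pending_.push_back(eventId);
        if (pending_.size() >= batchSize_) Flush();
    }

    void Flush() {
        if (pending_.empty()) return;
        sink_.WriteBatch(pending_);
        pending_.clear();
    }

private:
    SharedSink& sink_;
    std::size_t batchSize_;
    std::vector<int> pending_;
};

inline void RunBuffered(SharedSink& sink, int nThreads, int eventsPerThread,
                        std::size_t batchSize) {
    std::vector<std::thread> workers;
    for (int t = 0; t < nThreads; ++t) {
        workers.emplace_back([&sink, t, eventsPerThread, batchSize] {
            ThreadLocalBuffer buffer(sink, batchSize);
            for (int i = 0; i < eventsPerThread; ++i) {
                buffer.Add(t * eventsPerThread + i);
            }
            // buffer goes out of scope here and flushes what is left
        });
    }
    for (auto& w : workers) w.join();
}
```

The trade-off is memory: each thread holds up to a full batch of events before anything reaches the shared output, which matters if the events are large.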

My understanding is that the latest ROOT version or two has a static function called “ROOT::EnableThreadSafety()” or something annoying like that. We developed a lot of our code at least ten years ago (under ROOT 5!) and a major rewrite is just not in the cards. So we make do with it, using mutexes as sparingly as possible.