Segmentation fault at the end of MultiThreading Process

Hi,

I’m trying to run a piece of code with multithreading enabled; it executes fine when multithreading is disabled. As the code in question is too big and convoluted to paste, I’ll paste a “light version” of what I’m doing (I hope it helps).

In my ActionInitialization.cc I have a Build() and a BuildForMaster(), as shown below:

void ActionInitialization::Build() const {
    //primary generator action
    SetUserAction(fPrimaryGeneratorAction_);

    //run action
    RunAction* runAction = new RunAction(output_file_, fixed_filename_);
    SetUserAction(runAction);

    //event action
    EventAction* eventAction = new EventAction();
    SetUserAction(eventAction);

    // prepare file
    G4AnalysisManager* fAnalysisManager = G4AnalysisManager::Instance();
    if (G4RunManager::GetRunManager()->GetRunManagerType() == G4RunManager::RMType::sequentialRM) {

      fAnalysisManager->CreateNtuple("Keyword", "Keyword");
      fAnalysisManager->CreateNtupleIColumn("RunID");                     // 0
      fAnalysisManager->CreateNtupleIColumn("NumberOfEvents");            // 1
    }
}

and

void ActionInitialization::BuildForMaster() const {

    SetUserAction(new RunAction(output_file_, fixed_filename_));

    G4AnalysisManager* fAnalysisManager = G4AnalysisManager::Instance();
    fAnalysisManager->SetNtupleMerging(true);

    fAnalysisManager->CreateNtuple("Keyword", "Keyword");
    fAnalysisManager->CreateNtupleIColumn("RunID");                     // 0
    fAnalysisManager->CreateNtupleIColumn("NumberOfEvents");            // 1
}

In my Run.cc I fill the ntuples (no issues there), so I assume that each worker thread is doing what it’s supposed to do; I also see their respective files in the local directory. However, when I try to fill the ntuples on the master thread (guarded by the “IsMaster()” boolean), I get the following warning:

-------- WWWW ------- G4Exception-START -------- WWWW -------
*** G4Exception : Analysis_W011
      issued by : G4TNtupleManager::FillNtupleTColumn
      ntupleId 1 does not exist.
*** This is just a warning message. ***
-------- WWWW -------- G4Exception-END --------- WWWW -------

This is very odd because I’m “pretty sure” it was created on the same thread. Or did I misunderstand something?

Furthermore, after the runs complete, the merging process crashes and I do not know how to resolve this. The error messages are:

********************* End Simulation *********************
G4WT0 > Destroying WorkerRunManager (0x7f6478046f60)
G4WT0 > G4 kernel has come to Quit state.
G4WT2 > Destroying WorkerRunManager (0x7f6474046f60)
G4WT2 > G4 kernel has come to Quit state.
G4WT3 > Destroying WorkerRunManager (0x7f6468046f60)
G4WT3 > G4 kernel has come to Quit state.
G4WT0 > ================== Deleting memory pools ===================
G4WT0 > Number of memory pools allocated: 12; of which, static: 0
G4WT0 > Dynamic pools deleted: 12 / Total memory freed: 0.048 MB
G4WT0 > ============================================================
G4WT0 > Thread-local UImanager is to be deleted.
G4WT0 > There should not be any thread-local G4cout/G4cerr hereafter.

 *** Break *** segmentation violation
G4WT1 > Destroying WorkerRunManager (0x7f6470046f60)
G4WT1 > G4 kernel has come to Quit state.

 *** Break *** segmentation violation

 *** Break *** segmentation violation
double free or corruption (fasttop)

===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================

Thread 4 (Thread 0x7f647e14a700 (LWP 882721)):
#0  0x00007f64825d8dff in __GI___wait4 (pid=882728, stat_loc=stat_loc
entry=0x7f647dfcbba8, options=options
entry=0, usage=usage
entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:27
#1  0x00007f64825d8d7b in __GI___waitpid (pid=<optimized out>, stat_loc=stat_loc
entry=0x7f647dfcbba8, options=options
entry=0) at waitpid.c:38
#2  0x00007f64825480e7 in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:172
#3  0x00007f648b0f9b1e in TUnixSystem::StackTrace() () from /home/johannes/software/root_v6.20.04/root_install/lib/libCore.so
#4  0x00007f648b0f683c in TUnixSystem::DispatchSignals(ESignals) () from /home/johannes/software/root_v6.20.04/root_install/lib/libCore.so
#5  <signal handler called>
#6  0x0000000000004161 in ?? ()
#7  0x00007f6487ebf21d in G4RunManager::~G4RunManager() () from /home/johannes/software/geant4_v10.06/geant4_install/lib/libG4run.so
#8  0x00007f6487ec9b9d in G4WorkerRunManager::~G4WorkerRunManager() () from /home/johannes/software/geant4_v10.06/geant4_install/lib/libG4run.so
#9  0x00007f6487ed4ec9 in G4MTRunManagerKernel::StartThread(G4WorkerThread*) () from /home/johannes/software/geant4_v10.06/geant4_install/lib/libG4run.so
#10 0x00007f648294ad84 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#11 0x00007f64826ee609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007f6482615293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7f647e94b700 (LWP 882720)):
#0  0x00007f64825d8dff in __GI___wait4 (pid=882786, stat_loc=stat_loc
entry=0x7f647e7ccba8, options=options
entry=0, usage=usage
entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:27
#1  0x00007f64825d8d7b in __GI___waitpid (pid=<optimized out>, stat_loc=stat_loc
entry=0x7f647e7ccba8, options=options
entry=0) at waitpid.c:38
#2  0x00007f64825480e7 in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:172
#3  0x00007f648b0f9b1e in TUnixSystem::StackTrace() () from /home/johannes/software/root_v6.20.04/root_install/lib/libCore.so
#4  0x00007f648b0f683c in TUnixSystem::DispatchSignals(ESignals) () from /home/johannes/software/root_v6.20.04/root_install/lib/libCore.so
#5  <signal handler called>
#6  0x0000000000004161 in ?? ()
#7  0x00007f6487ebf21d in G4RunManager::~G4RunManager() () from /home/johannes/software/geant4_v10.06/geant4_install/lib/libG4run.so
#8  0x00007f6487ec9b9d in G4WorkerRunManager::~G4WorkerRunManager() () from /home/johannes/software/geant4_v10.06/geant4_install/lib/libG4run.so
#9  0x00007f6487ed4ec9 in G4MTRunManagerKernel::StartThread(G4WorkerThread*) () from /home/johannes/software/geant4_v10.06/geant4_install/lib/libG4run.so
#10 0x00007f648294ad84 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#11 0x00007f64826ee609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007f6482615293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f647f14c700 (LWP 882719)):
#0  0x00007f64825d8dff in __GI___wait4 (pid=882731, stat_loc=stat_loc
entry=0x7f647efcdba8, options=options
entry=0, usage=usage
entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:27
#1  0x00007f64825d8d7b in __GI___waitpid (pid=<optimized out>, stat_loc=stat_loc
entry=0x7f647efcdba8, options=options
entry=0) at waitpid.c:38
#2  0x00007f64825480e7 in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:172
#3  0x00007f648b0f9b1e in TUnixSystem::StackTrace() () from /home/johannes/software/root_v6.20.04/root_install/lib/libCore.so
#4  0x00007f648b0f683c in TUnixSystem::DispatchSignals(ESignals) () from /home/johannes/software/root_v6.20.04/root_install/lib/libCore.so
#5  <signal handler called>
#6  0x0000000000004161 in ?? ()
#7  0x00007f6487ebf21d in G4RunManager::~G4RunManager() () from /home/johannes/software/geant4_v10.06/geant4_install/lib/libG4run.so
#8  0x00007f6487ec9b9d in G4WorkerRunManager::~G4WorkerRunManager() () from /home/johannes/software/geant4_v10.06/geant4_install/lib/libG4run.so
#9  0x00007f6487ed4ec9 in G4MTRunManagerKernel::StartThread(G4WorkerThread*) () from /home/johannes/software/geant4_v10.06/geant4_install/lib/libG4run.so
#10 0x00007f648294ad84 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#11 0x00007f64826ee609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007f6482615293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f647fbc3d00 (LWP 882680)):
#0  __pthread_clockjoin_ex (threadid=140069605525248, thread_return=0x0, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
#1  0x00007f648294afe7 in std::thread::join() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00007f6487ec4fd9 in G4MTRunManager::TerminateWorkers() () from /home/johannes/software/geant4_v10.06/geant4_install/lib/libG4run.so 
#3  0x00007f6487ec508e in G4MTRunManager::~G4MTRunManager() () from /home/johannes/software/geant4_v10.06/geant4_install/lib/libG4run.so
#4  0x00007f6487ec51dd in G4MTRunManager::~G4MTRunManager() () from /home/johannes/software/geant4_v10.06/geant4_install/lib/libG4run.so
#5  0x0000561e78f7bea3 in main (argc=7, argv=0x7fffc9995858) at /home/johannes/Desktop/POLAR-02/Simulations/POLAR2sim/POLAR2sim.cc:151
===========================================================


The lines below might hint at the cause of the crash.
You may get help by asking at the ROOT forum http://root.cern.ch/forum
Only if you are really convinced it is a bug in ROOT then please submit a
report at http://root.cern.ch/bugs Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#6  0x0000000000004161 in ?? ()
#7  0x00007f6487ebf21d in G4RunManager::~G4RunManager() () from /home/johannes/software/geant4_v10.06/geant4_install/lib/libG4run.so
#8  0x00007f6487ec9b9d in G4WorkerRunManager::~G4WorkerRunManager() () from /home/johannes/software/geant4_v10.06/geant4_install/lib/libG4run.so
#9  0x00007f6487ed4ec9 in G4MTRunManagerKernel::StartThread(G4WorkerThread*) () from /home/johannes/software/geant4_v10.06/geant4_install/lib/libG4run.so
#10 0x00007f648294ad84 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#11 0x00007f64826ee609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007f6482615293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

===========================================================


Aborted (core dumped)

Any ideas or tips on what might have caused this?

Cheers,
Johan

Dear Johan,

I am sorry for the delay in replying.

The analysis manager is designed so that, in MT mode, the same calls to its functions should be performed on the workers and on the master; this internally creates all the helper objects, which then take care of e.g. ntuple merging. The easiest way to achieve this is to perform all calls, except for fills, in your run action class, which is created on both master and worker.
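
As a rough illustration of that pattern, here is a minimal sketch that compiles without Geant4. FakeAnalysisManager is a hypothetical stand-in, not a Geant4 class: it only mimics the one-instance-per-thread behaviour of G4AnalysisManager::Instance() and mirrors the names of the calls used in the post above, plus FinishNtuple(), which the Geant4 examples call after the last column. The RunAction here is also simplified (no constructor arguments), and ColumnsBookedPerThread is a helper invented for this sketch. In real code the constructor body would presumably use G4AnalysisManager::Instance() in the same way.

```cpp
// Stand-in sketch: FakeAnalysisManager is NOT part of Geant4. It imitates a
// per-thread analysis manager so the booking pattern can be shown in isolation.
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

class FakeAnalysisManager {
public:
    // One instance per thread, as with the real analysis manager in MT mode.
    static FakeAnalysisManager* Instance() {
        thread_local FakeAnalysisManager instance;
        return &instance;
    }
    void SetNtupleMerging(bool on) { merging_ = on; }
    int CreateNtuple(const std::string& name, const std::string& /*title*/) {
        ntuples_.push_back({name, {}});
        return static_cast<int>(ntuples_.size()) - 1;  // ids start at 0 here
    }
    int CreateNtupleIColumn(const std::string& column) {
        assert(!ntuples_.empty() && "CreateNtuple must be called first");
        ntuples_.back().columns.push_back(column);
        return static_cast<int>(ntuples_.back().columns.size()) - 1;
    }
    void FinishNtuple() {}
    std::size_t NumberOfNtuples() const { return ntuples_.size(); }
    std::size_t NumberOfColumns(std::size_t id) const {
        return ntuples_.at(id).columns.size();
    }

private:
    struct Ntuple {
        std::string name;
        std::vector<std::string> columns;
    };
    std::vector<Ntuple> ntuples_;
    bool merging_ = false;
};

// Simplified RunAction: all booking lives in the constructor, so whichever
// thread constructs it (master or worker) performs the same sequence of calls.
class RunAction {
public:
    RunAction() {
        auto* analysisManager = FakeAnalysisManager::Instance();
        analysisManager->SetNtupleMerging(true);
        analysisManager->CreateNtuple("Keyword", "Keyword");
        analysisManager->CreateNtupleIColumn("RunID");           // column 0
        analysisManager->CreateNtupleIColumn("NumberOfEvents");  // column 1
        analysisManager->FinishNtuple();
    }
};

// Constructs a RunAction on the calling ("master") thread and on nWorkers
// extra threads, then returns the column count each thread ended up with.
std::vector<std::size_t> ColumnsBookedPerThread(int nWorkers) {
    std::vector<std::size_t> counts(static_cast<std::size_t>(nWorkers) + 1, 0);
    auto book = [&counts](std::size_t slot) {
        RunAction runAction;  // booking happens in the constructor
        counts[slot] = FakeAnalysisManager::Instance()->NumberOfColumns(0);
    };
    book(0);  // master thread
    std::vector<std::thread> workers;
    for (int i = 1; i <= nWorkers; ++i) {
        workers.emplace_back(book, static_cast<std::size_t>(i));
    }
    for (auto& worker : workers) worker.join();
    return counts;  // each thread wrote only to its own slot
}
```

Calling ColumnsBookedPerThread(3) books the same ntuple on the master and on three workers, with no thread-type check anywhere in the booking code. With this layout, Build() and BuildForMaster() would only need to create the RunAction (plus the other user actions in Build()), and the ntuple booking would be removed from ActionInitialization.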

Best regards,