Double delete in MT with ParticleChange and TouchableHandle


Geant4 Version: 10.07.p04, 11.4.1
Operating System: RHEL8
Compiler/Version: GCC 12.2.0
CMake Version: 3.24.3


We have a custom G4CMPParticleChangeForPhonon class, which follows the model in G4ParticleChangeForTransport. It has a G4TouchableHandle object as a data member. During tracking, this handle is initialized to the touchable associated with the track, may be re-filled via a Propose() function, and is then reset to empty (= 0) at the end of UpdateStepForPostStep(). This works properly for tracking, with no memory leaks or other issues.
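The handle lifecycle described above amounts to roughly the following. This is a simplified, self-contained sketch, not the actual G4CMP or Geant4 code: std::shared_ptr stands in for G4TouchableHandle, and the names Touchable, StepPoint, ParticleChangeSketch, ProposeTouchableHandle, HoldsTouchable and RunOneStep are hypothetical. Copying the handle to the post-step point before clearing it is an assumption about what UpdateStepForPostStep() does.

```cpp
// Simplified stand-ins, NOT the Geant4 classes: std::shared_ptr plays the
// role of G4TouchableHandle, and Touchable plays the role of G4VTouchable.
#include <cassert>
#include <memory>
#include <string>

struct Touchable { std::string volumeName; };
using TouchableHandle = std::shared_ptr<Touchable>;

// Hypothetical stand-in for the post-step point of a step.
struct StepPoint { TouchableHandle touchable; };

class ParticleChangeSketch {
public:
  // Start of step: take the touchable currently associated with the track.
  void Initialize(const TouchableHandle& trackTouchable) {
    theTouchableHandle = trackTouchable;
  }

  // A process may replace the touchable for the post-step point.
  void ProposeTouchableHandle(const TouchableHandle& h) {
    theTouchableHandle = h;
  }

  // End of step: hand the touchable to the post-step point (assumed),
  // then drop our own reference so the data member is empty again.
  void UpdateStepForPostStep(StepPoint& postStep) {
    postStep.touchable = theTouchableHandle;
    theTouchableHandle.reset();
  }

  bool HoldsTouchable() const { return static_cast<bool>(theTouchableHandle); }

private:
  TouchableHandle theTouchableHandle;  // empty between steps
};

// Hypothetical usage: one step in which a process proposes a new touchable.
inline StepPoint RunOneStep(ParticleChangeSketch& pc) {
  auto initial  = std::make_shared<Touchable>(Touchable{"crystal"});
  auto proposed = std::make_shared<Touchable>(Touchable{"world"});
  StepPoint postStep;
  pc.Initialize(initial);
  pc.ProposeTouchableHandle(proposed);
  pc.UpdateStepForPostStep(postStep);
  assert(!pc.HoldsTouchable());  // the data member is empty after the step
  return postStep;
}
```

In this model the particle change never holds a reference between steps, which is the behavior we intend for the real class.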

However, at the end of the job (after all the worker threads have already been deleted) we get an occasional, hard-to-reproduce segmentation fault complaining that the Handle’s Release() function is doing something bad. I ran with -fsanitize=thread and get a consistent report about accessing data after it’s been deleted:

==================
WARNING: ThreadSanitizer: heap-use-after-free (pid=195009)
  Read of size 4 at 0x7b61b888a460 by main thread (mutexes: write M18605):
    #0 G4CountedObject<G4VTouchable>::Release() /scratch/group/mitchcomp/eb/x86_64/sw/Geant4/10.7.4-foss-2022b-debug/include/Geant4/G4ReferenceCountedHandle.hh:176 (libG4cmp.so+0xcd10e)
    #1 G4ReferenceCountedHandle<G4VTouchable>::~G4ReferenceCountedHandle() /scratch/group/mitchcomp/eb/x86_64/sw/Geant4/10.7.4-foss-2022b-debug/include/Geant4/G4ReferenceCountedHandle.hh:215 (libG4cmp.so+0xcd232)
    #2 G4CMPParticleChangeForPhonon::~G4CMPParticleChangeForPhonon() include/G4CMPParticleChangeForPhonon.hh:28 (libG4cmp.so+0xcd232)
    #3 G4CMPPhononBoundaryProcess::~G4CMPPhononBoundaryProcess() src/G4CMPPhononBoundaryProcess.cc:101 (libG4cmp.so+0xd488c)
    #4 G4CMPPhononBoundaryProcess::~G4CMPPhononBoundaryProcess() src/G4CMPPhononBoundaryProcess.cc:101 (libG4cmp.so+0xd4948)
    #5 G4ProcessTable::~G4ProcessTable() <null> (libG4processes.so+0x100c8c2)

  Previous write of size 8 at 0x7b61b888a460 by thread T2:
    [failed to restore the stack]

  Mutex M18605 (0x7fe2f3914a68) created at:
    #0 pthread_mutex_lock ../../../../libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4324 (libtsan.so.2+0x5a471)
    #1 G4ThreadLocalSingleton<G4ProcessTable>::Register(G4ProcessTable*) const <null> (libG4processes.so+0x100f7c4)
    #2 SuperSim_Main::SuperSim_Main() /scratch/user/kelsey/software/supersim/CDMSapps/SuperSim_Main.cc:64 (libCDMSapps.so+0x8a72)
    #3 main /scratch/user/kelsey/software/supersim/CDMSapps/CDMS_G4DMC.cc:28 (CDMS_G4DMC+0x416a76)

  Thread T2 (tid=195057, finished) created by main thread at:
    #0 pthread_create ../../../../libsanitizer/tsan/tsan_interceptors_posix.cpp:1001 (libtsan.so.2+0x62b86)
    #1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) /tmp/baum/easybuild/GCCcore/12.2.0/system-system/gcc-12.2.0/stage3_obj/x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu/bits/gthr-default.h:663 (libstdc++.so.6+0xe0ddb)
    #2 G4RunManager::BeamOn(int, char const*, int) <null> (libG4run.so+0x483a3)
    #3 SuperSim_Main::Run(int, char**) /scratch/user/kelsey/software/supersim/CDMSapps/SuperSim_Main.cc:134 (libCDMSapps.so+0x8993)
    #4 main /scratch/user/kelsey/software/supersim/CDMSapps/CDMS_G4DMC.cc:39 (CDMS_G4DMC+0x416b29)

SUMMARY: ThreadSanitizer: heap-use-after-free /scratch/group/mitchcomp/eb/x86_64/sw/Geant4/10.7.4-foss-2022b-debug/include/Geant4/G4ReferenceCountedHandle.hh:176 in G4CountedObject<G4VTouchable>::Release()
==================

The main thing to notice is that the WARNING comes from the main thread, not from one of the worker threads. As I understand things, every worker thread gets its own process instances, so they don’t collide, and the processes created on the main (master) thread are never actually invoked. So the instance of G4CMPPhononBoundaryProcess on the master thread should still be in its initial state.

Next, notice that the “previous write” is reported to have come from thread T2, by way of a mutex. Since T2 has already been deleted, the traceback for that write is gone. But why would T2 have been writing back into a TouchableHandle on the master thread? Shouldn’t each worker thread have its own local objects?

The mutex itself is reported to have come from a thread-local singleton of G4ProcessTable, presumably by thread T2? But why should a G4ThreadLocalSingleton need a mutex to write into itself or a thread-local object it owns?

Other than G4ParticleChangeForTransport, which is what we followed when writing our own PC, are there any other examples we can look at to understand what we’re doing wrong, and what we should be doing differently?

I modified our G4CMPParticleChangeForPhonon constructor, so that it now has an explicit

   theTouchableHandle = 0;

(and nothing else). The destructor is still a no-op. Now the thread sanitizer reports nothing wrong at all: no double delete, no access after delete, no errors or warnings in my whole job.

Even though the problem has gone away, I still don’t understand what’s happening. I’d be grateful if someone could talk through (or point me at a TWiki page that explains) how the G4TouchableHandle works behind the scenes.
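For reference, here is a minimal model of the kind of intrusive reference counting the stack trace suggests, where the handle's destructor calls Release() on a shared counted object. The class names CountedObject and CountedHandle only echo the names in the trace. The bodies are an assumption about how such a scheme generally works, not copied from the Geant4 source, and Payload and LastHandleDestroysPointee are hypothetical helpers.

```cpp
#include <cassert>

// Hypothetical pointee that reports its own destruction via an external flag.
struct Payload {
  explicit Payload(bool* destroyedFlag) : destroyed(destroyedFlag) {}
  ~Payload() { *destroyed = true; }
  bool* destroyed;
};

// Shared bookkeeping block: owns the pointee and the reference count.
template <class T>
class CountedObject {
public:
  explicit CountedObject(T* p) : fCount(0), fRep(p) {}
  ~CountedObject() { delete fRep; }
  void AddRef() { ++fCount; }
  // Dropping the last reference destroys the pointee and this block.
  void Release() { if (--fCount == 0) delete this; }
  unsigned Count() const { return fCount; }
private:
  unsigned fCount;
  T* fRep;
};

// Handle: every copy shares one CountedObject; an empty handle holds nullptr.
template <class T>
class CountedHandle {
public:
  explicit CountedHandle(T* p = nullptr)
    : fObj(p ? new CountedObject<T>(p) : nullptr) { if (fObj) fObj->AddRef(); }
  CountedHandle(const CountedHandle& rhs) : fObj(rhs.fObj) {
    if (fObj) fObj->AddRef();
  }
  CountedHandle& operator=(const CountedHandle& rhs) {
    if (rhs.fObj) rhs.fObj->AddRef();  // add first: safe for self-assignment
    if (fObj) fObj->Release();
    fObj = rhs.fObj;
    return *this;
  }
  // The destructor reads and writes the shared block if the handle is non-empty.
  ~CountedHandle() { if (fObj) fObj->Release(); }
  unsigned Count() const { return fObj ? fObj->Count() : 0; }
private:
  CountedObject<T>* fObj;
};

// Hypothetical usage: the pointee dies exactly when the last handle lets go.
inline bool LastHandleDestroysPointee() {
  bool destroyed = false;
  CountedHandle<Payload> a(new Payload(&destroyed));
  {
    CountedHandle<Payload> b = a;  // two handles share one counted object
    assert(a.Count() == 2);
  }                                // b destroyed, one reference left
  assert(a.Count() == 1 && !destroyed);
  a = CountedHandle<Payload>();    // assigning an empty handle releases it
  return destroyed;
}
```

In this model a handle's destructor touches the shared block only when the handle is non-empty. A use-after-free inside Release() would therefore mean the handle still pointed at a block whose storage had already been freed elsewhere. Whether that is what happens inside Geant4 is the open question here.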

I was wrong about the problem going away (I had run from a different clone which wasn’t built with thread-sanitizing). I continue to get the end-of-job complaint (and other users see intermittent segfaults) no matter how much I try to pre-clear the local TouchableHandle data member.

This continues to be an insoluble problem for us, and it continues to cause end-of-job segmentation faults for many of my collaborators. I’d be very grateful to engage either @gcosmo or @japost to try to understand how we should be writing a ParticleChange subclass which allows changing the TouchableHandle of the post-step point.