std::vector in split classes

I am studying the MT architecture of Geant4, and I cannot understand this note on page 52 of the toolkit developers guide:

Note: A note on content of split classes. Data fields of the split class should have a size that is known at compile time. Thus objects like std::vector cannot be contained in split class data, but pointers to these object can.

I understand the requirement that classes have a known size, but I don’t see why std::vector shouldn’t fulfil it. The size of std::vector actually is known and fixed: in most implementations it is simply 24 bytes, i.e. 3 pointers. So one can effectively create a std::vector<int> *vptr, instantiate an array of, say, 10 vectors with it, and then write e.g. vptr[3].resize(5). So I’d say that a G4Splitter<std::vector<int>>, for example, should work without problems, and likewise any other G4Splitter<A> where A contains a std::vector. The only data types that come to my mind that would create trouble are variable-size, C-style arrays, which however are a bad idea on their own and best avoided altogether.

So I don’t understand the meaning of the note. Is my understanding wrong, or is the note wrong? Thanks.

I suspect you are getting that std::vector is 24 bytes via sizeof(std::vector<int>). This is because std::vector<T> contains three data members: a pointer to the start of the memory, a pointer to the end of the currently used memory, and a pointer to the end of the currently allocated memory. std::vector is essentially just a wrapper around a C-style array which handles dynamic allocation for you. Vectors guarantee that the T* storage is contiguous in memory, so you cannot have G4Splitter<std::vector<int>>, because growing that vector would cause the end of the allocated memory to overlap with the start of the next instance’s memory location. If you have pointers to vectors, those internal pointers can be moved around to accommodate the new sizes.

@jrmadsen I don’t think that your argument applies to this case. The underlying C-style array typically lives somewhere on the heap, at an address stored in one of the three pointers, and not in a memory area contiguous with the std::vector itself (i.e. the three pointers). When the vector grows, that storage area can be moved around to fit the new number of items in contiguous memory cells, but the only effect on the vector itself is a change of the addresses stored in its pointers. You can get an overlap if you grow a C-style array in place, but you never get one when you resize a std::vector. So I still don’t understand the meaning of that note.

Ah yes, I apologise. I didn’t read your question carefully and I made the mistake of assuming it had to do with the memory pool implementation. You are correct that it could in theory work with a vector, but the statement is still correct. I just glanced at the source and the issue lies in the implementation. Whoever wrote it was clearly a C programmer, because it is written with realloc, free, and memcpy. Thus there are no calls to the constructor and destructor of the vector, so the dynamic allocations managed by the vector would be leaked.

The documentation for the MT interface is a bit dated, though. That class isn’t extensively used from what I can tell (and if it is, it shouldn’t be), and the MT layer was rewritten from pthreads to the STL wrappers a couple of years ago. That MT implementation will be phased out in favor of a more dynamic/asynchronous scheduling system via tasking in the near future.

Thanks for your reply. About your last statement: does it mean that the MT interface is going to change, or just that it will stay the same while the implementation changes? I’m just beginning to study it, and it wouldn’t be nice to have to restart from scratch some time from now…

Sorry for the delay. We recently released Geant4 V10.7.beta with the new tasking interface available. The user interface changes are basically non-existent beyond creating G4TaskRunManager instead of G4MTRunManager. The difference internally is that threads in G4MTRunManager had a pre-defined, fixed call path for their work, statically scheduled at the beginning of a run: if one thread took much longer to process its G4 events than all the other threads, those other threads idled until that thread finished. In the new tasking model, work is dynamically scheduled, and when workload imbalance occurs, another thread can/will start processing the work in another thread’s queue.

Thank you very much for the information, it seems quite a nice improvement.