Support more complex N-tuple branch structures?

mkelsey · December 20, 2022, 6:51pm

I am involved in the simulation software for two different experiments. Both of them use ROOT directly for output, and we’ve had to introduce mutexes, split classes, singletons, and all the other paraphernalia needed to protect ROOT against multithreaded applications.

Since G4Analysis supports full G4 multithreading, it seemed natural to try to replace our existing ROOT-based output with G4Analysis. Unfortunately, some features are missing. In one simulation, we have a number of variable-length array branches. In the other, we have one branch which is a vector of vectors of ints. Neither of those constructs is currently supported by G4Analysis.

Do either of those sound like they could be supportable in G4Analysis? Or are they too ROOT-specific?

gybarran · January 21, 2023, 9:56am

Well, for the vector-vector-int case, with what exists now, you can use a vector-int column/branch and store in each entry the sequence of (length, data) records for each sub-vector-int. (This works as long as each sub-vector size fits in an int!) Then, at read time, have your code rebuild the vector-vector-int from the (length, data) records in each vector-int entry.
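A minimal C++ sketch of that (length, data) packing, assuming the column is filled from and read back into a plain `std::vector<int>`. The names `Flatten` and `Unflatten` are hypothetical, and the G4Analysis calls that would bind the vector to a column are deliberately left out.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Pack a vector<vector<int>> into one vector<int> as a sequence of
// (length, data...) records, suitable for a single vector-int column.
std::vector<int> Flatten(const std::vector<std::vector<int>>& nested) {
  std::vector<int> flat;
  for (const auto& sub : nested) {
    // Assumes each sub-vector size fits in an int.
    flat.push_back(static_cast<int>(sub.size()));
    flat.insert(flat.end(), sub.begin(), sub.end());
  }
  return flat;
}

// Rebuild the vector<vector<int>> by reading the records back in order.
std::vector<std::vector<int>> Unflatten(const std::vector<int>& flat) {
  std::vector<std::vector<int>> nested;
  std::size_t pos = 0;
  while (pos < flat.size()) {
    const int len = flat[pos++];
    // Reject negative lengths or records that run past the end of the data.
    if (len < 0 || pos + static_cast<std::size_t>(len) > flat.size()) {
      throw std::runtime_error("corrupt (length, data) record");
    }
    const auto first = flat.begin() + static_cast<std::ptrdiff_t>(pos);
    nested.emplace_back(first, first + len);
    pos += static_cast<std::size_t>(len);
  }
  return nested;
}
```

For example, `{{1, 2}, {}, {3}}` packs to `{2, 1, 2, 0, 1, 3}`; the reader needs no extra metadata, since the first int of each record says how many ints follow.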

Otherwise, for the moment, we do not intend to handle cases more complicated than what we have now (which works in multithreaded mode and for various I/O formats, and that was not so easy to achieve!). But we understand your need, which, as far as I can tell, is probably nothing more than being able to handle an OO event model with a reasonably general and efficient I/O system (in particular now in a multithreaded context)! I am afraid that is out of scope for us… This is a far more general problem that, I think, is still “on the table”.

Just thinking… The g4tools ntuple for the various binary formats (.root, but also .hdf5) can already handle a vector-char column/branch; perhaps we could make this visible in the G4Analysis API. This would let a user who has their own write/read object streaming (into a buffer of chars) store the result in such an ntuple column/branch. (On our side this would amount to handling a “blob” logic.) Without having to introduce complicated things such as storing class descriptions, class versions, etc. (as ROOT I/O does for a general object), this would probably cover, in a simple way, a lot of use cases…
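To illustrate what the user side of such a “blob” might look like, here is a minimal sketch of hand-written streaming into a `std::vector<char>`. It is not a G4Analysis API: `TrackRecord`, `WriteBlob`, and `ReadBlob` are hypothetical names, integers are written as 32-bit values in the writer's native byte order (a reader on a machine with different endianness would need to byte-swap), and the call that would store the buffer in an ntuple column is omitted, since that is exactly what is not yet exposed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical user object: a track and the IDs of its daughter tracks.
struct TrackRecord {
  std::int32_t id;
  std::vector<std::int32_t> daughters;
};

namespace {
// Append the raw bytes of one int32 (native byte order) to the buffer.
void PutInt(std::vector<char>& buf, std::int32_t v) {
  char bytes[sizeof v];
  std::memcpy(bytes, &v, sizeof v);
  buf.insert(buf.end(), bytes, bytes + sizeof v);
}

// Read one int32 at `pos` and advance `pos`; throws on a truncated buffer.
std::int32_t GetInt(const std::vector<char>& buf, std::size_t& pos) {
  std::int32_t v;
  if (pos + sizeof v > buf.size()) throw std::runtime_error("truncated blob");
  std::memcpy(&v, buf.data() + pos, sizeof v);
  pos += sizeof v;
  return v;
}
}  // namespace

// Layout: track count, then per track (id, daughter count, daughter IDs...).
std::vector<char> WriteBlob(const std::vector<TrackRecord>& tracks) {
  std::vector<char> buf;
  PutInt(buf, static_cast<std::int32_t>(tracks.size()));
  for (const auto& t : tracks) {
    PutInt(buf, t.id);
    PutInt(buf, static_cast<std::int32_t>(t.daughters.size()));
    for (std::int32_t d : t.daughters) PutInt(buf, d);
  }
  return buf;
}

// Inverse of WriteBlob.
std::vector<TrackRecord> ReadBlob(const std::vector<char>& buf) {
  std::size_t pos = 0;
  const std::int32_t n = GetInt(buf, pos);
  std::vector<TrackRecord> tracks;
  for (std::int32_t i = 0; i < n; ++i) {
    TrackRecord t;
    t.id = GetInt(buf, pos);
    const std::int32_t nd = GetInt(buf, pos);
    for (std::int32_t j = 0; j < nd; ++j) t.daughters.push_back(GetInt(buf, pos));
    tracks.push_back(t);
  }
  return tracks;
}
```

The format carries no class description or version number; that simplicity is the point of the blob idea, but it also means writer and reader must agree on the layout out of band.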

mkelsey · January 21, 2023, 5:01pm

Thank you for the thoughtful reply, Guy! Your idea of packing the subvectors would likely not work in general – the particular situation was storing a vector of tracks, where the subvector was a list of all the daughter track IDs (i.e., the whole event topology!). I sort of expected that the general case would be difficult at best (how to do a vector of vectors in a CSV file?!?), and certainly not worth the trouble.

If we were designing the output data structure ourselves, we would do it differently, using multiple N-tuples (one for tracks, one for hits, etc.) rather than one giant block. As the original I/O class had been written to use ROOT directly, we decided to keep it, and just make it a global singleton with mutexes in each of the functions that do ROOT calls. That way, we preserved the output that the rest of the collaboration is used to.
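The “global singleton with mutexes” approach described above can be sketched roughly as follows. This is a minimal illustration under assumptions: `EventWriter`, `Fill`, and `NumFilled` are hypothetical names, and the ROOT calls are replaced by an in-memory placeholder so the example stands alone.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a ROOT-based output class. Every public method
// that would touch ROOT takes the same mutex, so worker threads serialize
// their access to the non-thread-safe I/O layer.
class EventWriter {
 public:
  // One shared instance for all threads; C++11 guarantees this local static
  // is initialized exactly once, even under concurrent first calls.
  static EventWriter& Instance() {
    static EventWriter instance;
    return instance;
  }

  EventWriter(const EventWriter&) = delete;
  EventWriter& operator=(const EventWriter&) = delete;

  void Fill(int eventID) {
    std::lock_guard<std::mutex> lock(mutex_);
    // In the real class this is where the ROOT call would go;
    // here we only record the event ID.
    filled_.push_back(eventID);
  }

  std::size_t NumFilled() {
    std::lock_guard<std::mutex> lock(mutex_);
    return filled_.size();
  }

 private:
  EventWriter() = default;
  std::mutex mutex_;
  std::vector<int> filled_;
};
```

The design keeps the existing output format untouched, at the cost that every write is serialized, so worker threads may wait on the lock when output is heavy.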
