
Store number of simulated events #191

Open
gipert opened this issue Dec 9, 2024 · 4 comments
Labels
discussion Further information is requested output Output Schemes

Comments

@gipert
Member

gipert commented Dec 9, 2024

Storing the number of simulated events is crucial for post-processing. At the moment, one could look for unique evtids in the vertices table, but this requires some computation:

import numpy as np
from lgdo import lh5

evtids = lh5.read_as("stp/vertices/evtid", "output.lh5", "np")
n_g4ev = len(np.unique(evtids))

Why not then just store a Scalar with the number of simulated events?
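To make the point concrete, here is a minimal sketch (with a hypothetical, hand-written evtid column) of why the row count of the vertices table cannot substitute for the unique-event count once an event can have more than one primary:

```python
# Hypothetical evtid column for illustration: with multiple primaries
# per event, one simulated event appears several times in the
# vertices table, so the row count alone is not the event count.
evtids = [0, 0, 1, 1, 2, 2]

n_rows = len(evtids)         # number of vertices (rows)
n_events = len(set(evtids))  # number of distinct simulated events

assert n_rows == 6
assert n_events == 3
```

A stored Scalar would make `n_events` available without scanning the whole column.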

@gipert gipert added discussion Further information is requested output Output Schemes labels Dec 9, 2024
@tdixon97
Collaborator

tdixon97 commented Dec 9, 2024

I think it's already helpful for post-processing, but maybe we need to store even a bit more information, i.e. the TCM-like grouping as we discussed?

@EricMEsch
Contributor

EricMEsch commented Dec 9, 2024

Vertices should already contain one unique entry per event (at least if one event is represented by one primary; I guess this will be different for multiple primaries per event). I am not sure about the .lh5 format, but the .hdf5 files already contain an entry that stores the number of entries: f["hit"]["vertices"]["evtid"]["entries"]. That means the number exists and is stored somewhere; I am sure .lh5 has something similar.

In case there are multiple primaries per event, the last entry of evtids should correspond to the number of simulated events (minus 1), because primary vertices are always stored. That should at least be faster than len(np.unique()), I assume.

@ManuelHu
Collaborator

ManuelHu commented Dec 9, 2024

> I am not sure about the .lh5 format, but the .hdf5 files already contain an entry that stores the number of entries: f["hit"]["vertices"]["evtid"]["entries"]. That means the number exists and is stored somewhere; I am sure .lh5 has something similar.

No, those entries are removed when converting to LH5.

> In case there are multiple primaries per event, the last entry of evtids should correspond to the number of simulated events (minus 1), because primary vertices are always stored. That should at least be faster than len(np.unique()), I assume.

Not in the case of multithreading: there the distribution of event ids between threads is, unfortunately, quite complex.
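A toy example (with hypothetical, hand-written evtid sequences) of why the "last entry" shortcut only works when ids are written sequentially:

```python
# Single-threaded: ids are written in order, so the last id (+1,
# for zero-based ids) matches the unique-event count.
single = [0, 0, 1, 2, 2, 3]
assert single[-1] + 1 == len(set(single)) == 4

# Multithreaded: each thread writes its own block of ids, so the
# last stored id says nothing about the total number of events.
multi = [4, 5, 6, 7, 0, 1, 2, 3]
assert multi[-1] + 1 == 4       # misleading shortcut
assert len(set(multi)) == 8     # actual number of simulated events
```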

@gipert
Member Author

gipert commented Dec 9, 2024

> i.e. the TCM-like grouping as we discussed?

Maybe; we need to think about whether it's the right place. But let's discuss this in legend-exp/reboost#16.
