Bug in use of first entry and maximum number of events: too few events #269
Comments
Thanks @IzaakWN. We are checking if we can safely remove this line.
Hi @gouskos, do we already know if this line can be safely removed (in general), or if nanoAOD-tools/python/postprocessing/framework/output.py Lines 175 to 176 in 25a793e
@gouskos, @AndreasAlbert When removing this line, the branches that are dropped in the "keep and drop" file via
but I am not 100% sure this covers all cases?
Hi, apologies for being super late to the party here. I personally do not have a great understanding of this piece of code. When I initially made the changes that introduced this bug, I was mainly copy/pasting the logic from other pieces of the code. It seems to me that just removing the line altogether would undo the goal that I initially had, as @IzaakWN points out: making sure that the keep/drop statements are also used for branches that are created by nanoaod-tools (i.e. not existent in the original inputs yet). It seems to me that the first solution proposed by @IzaakWN is promising: just changing the firstEntry to 0. I never tested or even considered the maxEntries settings when making the original PR, so I did not realize that the tree is already prefiltered. It further seems to me that this change cannot break anything if we verify that the number of entries being processed is consistent after the change (which is what @IzaakWN seems to have done).
Thanks for your input, @AndreasAlbert! @gouskos, as far as I can tell, the output tree is already filtered by this line in all my personal use cases. Do you know what combinations of settings we would need to test to convince ourselves this change is safe for all possible cases? To make it a bit more complicated, there is another slightly related bug that happens in the special case where the user passes no modules ( nanoAOD-tools/python/postprocessing/framework/preskimming.py Lines 81 to 84 in f00eb4e
elist that looks like [100,...,199] minus events not in JSON. Without a JSON or cut, this elist is None. If elist exists, this list is used to filter the input file: nanoAOD-tools/python/postprocessing/framework/postprocessor.py Lines 190 to 196 in f00eb4e
nanoAOD-tools/python/postprocessing/framework/output.py Lines 126 to 130 in f00eb4e
fullClone==True and the input tree was pre-filtered, you will end up with zero events in this example, because firstEntry=100 >= 100=maxEntries.
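To make the special case above concrete, here is a minimal pure-Python sketch. It does not use ROOT or the nanoAOD-tools code: the entry list is modeled as a plain list of entry numbers, the copy step is modeled as a slice, and `build_entry_list`, `copy_entries` and the fake JSON mask are hypothetical names introduced only for illustration.

```python
def build_entry_list(first_entry, max_entries, passes_selection):
    """Entry numbers in [first_entry, first_entry + max_entries) that pass a selection."""
    window = range(first_entry, first_entry + max_entries)
    return [i for i in window if passes_selection(i)]


def copy_entries(entries, n_entries, first_entry):
    """Model of the copy step: take up to n_entries items starting at index first_entry."""
    return entries[first_entry:first_entry + n_entries]


first_entry, max_entries = 100, 100

# Hypothetical stand-in for a JSON mask: reject every tenth entry.
elist = build_entry_list(first_entry, max_entries, lambda i: i % 10 != 0)

# The input tree restricted to elist has at most max_entries entries,
# renumbered from 0, so skipping first_entry of them again leaves nothing.
lost = copy_entries(elist, max_entries, first_entry)

# Starting the copy at 0 keeps everything that passed the pre-filter.
kept = copy_entries(elist, max_entries, 0)

print(len(elist), len(lost), len(kept))
```

Under these assumptions the pre-filtered tree never has more than `maxEntries` entries, so any `firstEntry >= maxEntries` applied a second time selects an empty range.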
One way to solve this special case, is to reset
Hi @IzaakWN @AndreasAlbert - thanks for all the investigations. I also tried all possible variations that I could think of and indeed changing firstEntry to 0 in:
@IzaakWN I suggest you go ahead and push this change. Concerning this:
I suggest we address it later. I did not have the chance to test the proposed change. I will create a separate issue pointing to this discussion.
The tree is already filtered just before writing, so `firstEntry>0` causes loss of events. See cms-nanoAOD#269
Thank you for the help and merge, @gouskos. I'll close this issue and open a separate one for the
Thanks to you, @IzaakWN. Yes, please do so and we will follow up.
When using the postprocessor with `firstEntry` and `maxEntries` that are both larger than zero, I noticed that the output tree has no events if `firstEntry` exceeds `maxEntries`. For example, when you have a nanoAOD file with originally 10000 events, and you split it up into 10 jobs with … and so on, only the first setting will have 1000 events. If you do … the output tree will have 990 events and so on.
After doing some printout, it seems that this reduction happens in this line:
nanoAOD-tools/python/postprocessing/framework/output.py
Lines 175 to 176 in 25a793e
(see commit 9631936 by @AndreasAlbert). Namely, before the copy entry, `self._tree` already has `self.maxEntries` events (see [1-2]). If `self.firstEntry` is larger than 0, you will only copy events `self.firstEntry` up to `self.maxEntries` of this tree, so you lose the first `self.firstEntry` events.
Can this line be removed without breaking other cases? If it is needed for something else (`provenance=True`?), maybe `self.firstEntry` should be replaced with `0` in this line.
[1] `fullClone`
nanoAOD-tools/python/postprocessing/framework/output.py
Lines 126 to 128 in 25a793e
[2] `not fullClone`
nanoAOD-tools/python/postprocessing/framework/eventloop.py
Lines 74 to 76 in 25a793e
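The double selection described above can be reproduced without ROOT. The following is only a sketch under stated assumptions: the tree is modeled as a Python list of event numbers, both the pre-filtering and the copy step are modeled as slices, and `select_entries`, `buggy_output` and `fixed_output` are hypothetical helpers, not functions from nanoAOD-tools.

```python
def select_entries(entries, first_entry, max_entries):
    """Return at most max_entries items, starting at index first_entry."""
    return entries[first_entry:first_entry + max_entries]


def buggy_output(n_input, first_entry, max_entries):
    """Apply (first_entry, max_entries) twice, as in the reported behaviour."""
    events = list(range(n_input))
    # First selection: the tree already holds at most max_entries events.
    prefiltered = select_entries(events, first_entry, max_entries)
    # Second selection skips first_entry events again, this time counted
    # from the start of the already reduced tree.
    return select_entries(prefiltered, first_entry, max_entries)


def fixed_output(n_input, first_entry, max_entries):
    """Same as above, but the final copy starts at 0."""
    events = list(range(n_input))
    prefiltered = select_entries(events, first_entry, max_entries)
    return select_entries(prefiltered, 0, max_entries)


for first in (0, 10, 1000):
    n_buggy = len(buggy_output(10000, first, 1000))
    n_fixed = len(fixed_output(10000, first, 1000))
    print(f"firstEntry={first}: buggy={n_buggy}, fixed={n_fixed}")
```

In this model, replacing the second `first_entry` with 0 restores the full 1000 events per job, which matches the proposed change of using `0` instead of `self.firstEntry`.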