Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in use of first entry and maximum number of events: too few events #269

Closed
IzaakWN opened this issue Feb 8, 2021 · 9 comments
Closed

Comments

@IzaakWN
Copy link
Contributor

IzaakWN commented Feb 8, 2021

When using the postprocessor with firstEntry and maxEntries that are both larger than zero, I noticed that the output tree has no events if firstEntry exceeds maxEntries. For example when you have a nanoAOD file with originally 10000 events, and you split it up into 10 jobs with

firstEntry=0, maxEntries=1000
firstEntry=1000, maxEntries=1000
firstEntry=2000, maxEntries=1000
...

and so on, only the first setting will have 1000 events. If you do

firstEntry=10, maxEntries=1000

the output tree will have 990 events and so on.

After doing some printout, it seems that this reduction happens in this line:

self._tree = self.tree().CopyTree('1', "",
self.maxEntries if self.maxEntries else ROOT.TVirtualTreePlayer.kMaxEntries, self.firstEntry)

(see commit 9631936 by @AndreasAlbert) Namely, Before the copy entry, self._tree already has self.maxEntries events (see [1-2]). If self.firstEntry is larger than 0, you will only copy events self.firstEntry up to self.maxEntries of this tree, so you lose the first self.firstEntry events.

Can this line be removed without breaking other cases? If it is needed for something else (provenance=True?), maybe self.firstEntry should be replace with 0 in this line.

[1] fullClone

if fullClone:
outputTree = inputTree.CopyTree(
'1', "", maxEntries if maxEntries else ROOT.TVirtualTreePlayer.kMaxEntries, firstEntry)

[2] not fullClone

for ie, i in enumerate(range(entries) if eventRange == None else eventRange):
if maxEvents > 0 and ie >= maxEvents:
break

@gouskos
Copy link
Contributor

gouskos commented Feb 9, 2021

Thanks @IzaakWN . We are checking if we can safely remove this line.

@IzaakWN
Copy link
Contributor Author

IzaakWN commented Feb 26, 2021

Hi @gouskos, do we already know if this line can be safely removed (in general), or if self.firstEntry can be replaced with 0? If so, I will go ahead with it in my private branch.

self._tree = self.tree().CopyTree('1', "",
self.maxEntries if self.maxEntries else ROOT.TVirtualTreePlayer.kMaxEntries, self.firstEntry)

@gouskos
Copy link
Contributor

gouskos commented Feb 26, 2021

Hi @IzaakWN from the tests I did it looks safe to me to remove the line.
@aalbert, since you added the line, do you have any comments removing it?

@IzaakWN IzaakWN changed the title Bug in use of first entry and maximum number of events: too little events Bug in use of first entry and maximum number of events: too few events Mar 3, 2021
@IzaakWN
Copy link
Contributor Author

IzaakWN commented Mar 3, 2021

@gouskos, @AndreasAlbert When removing this line, the branches that are dropped in the "keep and drop" file via outputbranchsel are not removed, so CopyTree is actually needed to get rid of the deactivated branches. Maybe a better way to fix the bug reported above, is by changing it to

self._tree = self.tree().CopyTree('1', "", 
                                   self.maxEntries if self.maxEntries else ROOT.TVirtualTreePlayer.kMaxEntries,0)

but I am not 100% this covers all cases?

@AndreasAlbert
Copy link
Contributor

Hi apologies for being super late to the party here. I personally do not have a great understanding of this piece of code. When I initially made the changes that introduced this bug, I was mainly copy/pasting the logic from other pieces in the code. It seems to me that just removing the line altogether would revert the goal that I initially had, as @IzaakWN points out: Making sure that the keep/drop statements are also used for branches that are created by nanoaod-tools (i.e. not existant in the original inputs yet).

It seems to me that the first solution proposed by @IzaakWN is promising: Just changing the firstentry to 0. I never tested or even considered the maxEntries settings when making the original PR, so I did not realize that the tree is already prefiltered. It further seems to me that this change cannot break anything if we verify that the number of entries being processed is consistent after the change (which is what @IzaakWN seems to have done).

@IzaakWN
Copy link
Contributor Author

IzaakWN commented Mar 5, 2021

Thanks for your input, @AndreasAlbert!

@gouskos, as far as I can tell, the output tree is already filtered by this line in all my personal use cases. Do you know what combinations of settings we would need to test to convince ourselves this change is safe for all possible cases? firstEvent=0 by default, but do we know if someone is using this option at the moment?

To make it a bit more complicated, there is another slightly related bug that happens in the special case where the user passes no modules (fullClone==True), but passes a JSON file via jsonInput, and/or a cut via cut.
For sake of example, say the user passes firstEntry=100, maxEntries=100 and a JSON file, then in the preskim step,

tree.Draw('>>elist', cut, "entrylist", maxEntries, firstEntry)
elist = ROOT.gDirectory.Get('elist')
if jsonInput:
elist = jsonFilter.filterEList(tree, elist)
you will have an event list elist that looks like [100,...,199] minus events not in JSON. Without a JSON or cut, this elist is None. If elist exists, this list is used to filter the input file:
if fullClone:
# no need of a reader (no event loop), but set up the elist if available
if elist:
inTree.SetEntryList(elist)
else:
# initialize reader
inTree = InputTree(inTree, elist)
The problem is that this seems to effectively shift the event indices back to 0. Event 100 is now at index 0, event 199 is now at index 99, and so on. In this line then,
if fullClone:
outputTree = inputTree.CopyTree(
'1', "", maxEntries if maxEntries else ROOT.TVirtualTreePlayer.kMaxEntries, firstEntry)
else:
outputTree = inputTree.CloneTree(0)
if fullClone==True and the input tree was pre-filtered, you will end up with zero events in this example, because firstEntry=100>=100=maxEntries.

One way to solve this special case, is to reset firstEntry to 0 if fullClone and elist before passing it to FullOutput , e.g.

firstEntry = 0 if fullClone and elist else self.firstEntry

@gouskos
Copy link
Contributor

gouskos commented Mar 14, 2021

Hi @IzaakWN @AndreasAlbert - thanks for all the investigations. I also tried all possible variations that I could think of and indeed changing firstEntry to 0 in:

self._tree = self.tree().CopyTree('1', "", 
                                   self.maxEntries if self.maxEntries else ROOT.TVirtualTreePlayer.kMaxEntries,firstEntry)

@IzaakWN I suggest you to go ahead and push this change.

Concerning this:

To make it a bit more complicated, there is another slightly related bug that happens in the special case where the user passes no modules (fullClone==True), but passes a JSON file via jsonInput, and/or a cut via cut.

I suggest we address it later. I did not have the chance to test the proposed change. I will create a separate issue pointing to this discussion

IzaakWN added a commit to IzaakWN/nanoAOD-tools that referenced this issue Mar 16, 2021
The tree is already filtered just before writing, so `firstEntry>0` causes loss of events. See cms-nanoAOD#269
@IzaakWN
Copy link
Contributor Author

IzaakWN commented Mar 18, 2021

Thanks you for the help and merge, @gouskos. I'll close this issue and open a separate one for the fullClone and elist case.

@gouskos
Copy link
Contributor

gouskos commented Mar 18, 2021

Thanks to you @IzaakWN . Yes - please do so and we will follow-up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants