Fix sorting index for lost tracks in UnifiedParticleTransformer producer #45689
Conversation
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45689/41251
A new Pull Request was created by @stahlleiton for master. It involves the following packages:
@cmsbuild, @jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks. cms-bot commands are listed here
type bug-fix
please test
Since the unified particle transformer model has already been trained for 2024 pp, I added a commit introducing a flag to enable the bug fix. The flag is set to false by default to avoid affecting central production.
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45689/41255
Pull request #45689 was updated. @cmsbuild, @jfernan2, @mandrenguyen can you please check and sign again.
please test
+1 Size: This PR adds an extra 20KB to the repository.
Comparison Summary:
Would it be possible to have documentation regarding this bugfix? More specifically, it would be very useful to understand this issue and its impact. Also, the same ordering is used for the cpf and npf candidates by most of the b-taggers since DeepJet.
The bug is not present for the charged and neutral PF candidates: in those two cases, the second for loop (where the sorted index lists c_sortedindices and n_sortedindices are used) iterates over the same collection (the jet constituents) as the loop that filled the sorted vector, so the indices are consistent. For lost tracks, however, the second for loop runs over a different collection (the sorted vector) than the one used to fill the sorted index list (the lost-track collection), so a wrong index is used to access lt_sortedindices.

The main consequence of using the wrong index is that the lookup returns either the default value (index 0) or a wrong value. Lost tracks are therefore either assigned to entry 0 of the features vector multiple times (so fewer lost tracks are stored in the features vector than there are elements in the sorted vector) or placed at a position in the features vector where they should not be. Since, as far as I could see, the lost tracks are only used in the unified particle transformer model, the other models are not affected by this issue.
@AlexDeMoor Are you satisfied with this explanation/solution?
+1 |
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will be automatically merged. |
We have followed up offline on some details of this bugfix and its potential impact. For now, everything is ok on the tagger side and we will evaluate further later. This bugfix does not affect UParT for the 2024 data, giving us time to ensure the lost tracks are fully correct.
type btv
PR description:
This PR fixes an issue found in the UnifiedParticleTransformer producer related to the index used to access the sorting list of lost tracks. The indices of the lt_sortedindices vector correspond to the indices of the original collection (i.e. the LTs collection); however, in the second for loop, lt_sortedindices is accessed using the index of the lt_sorted vector, which has been trimmed and is smaller than the LTs collection. To recover the correct index into lt_sortedindices, one should use the "get()" function of the lt_sorted elements, which returns the index in the LTs collection.
@SWuchterl @DickyChant
PR validation:
Tested locally by comparing the features produced by the CMSSW producer to those used by the b-hive inference.
If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for: