Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix sorting index for lost tracks in UnifiedParticleTransformer producer #45689

Merged
merged 2 commits into from
Aug 14, 2024

Conversation

stahlleiton
Copy link
Contributor

@stahlleiton stahlleiton commented Aug 12, 2024

PR description:

This PR fixes an issue found in the UnifiedParticleTransformer producer, that relates to the index used to access the sorting list of lost tracks. The indices of the lt_sortedindices vector correspond to the indices of the original collection (e.g. LTs collection), however in the second for loop, the lt_sortedindices is accessed using the index of the lt_sorted vector (which has been trimmed and is smaller than the LTs collection). To get back the correct index to access the elements of the lt_sortedindices one should use the "get()" function from the lt_sorted vector, which returns the index of the LTs collection.

@SWuchterl @DickyChant

PR validation:

Tested locally by comparing the features produced by the CMSSW producer to those used by the b-hive inference.

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 12, 2024

cms-bot internal usage

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @stahlleiton for master.

It involves the following packages:

  • RecoBTag/FeatureTools (reconstruction)

@cmsbuild, @jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks.
@AlexDeMoor, @Ming-Yan, @Senphy, @andrzejnovak, @castaned, @demuller, @hqucms, @missirol this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@mandrenguyen
Copy link
Contributor

type bug-fix

@mandrenguyen
Copy link
Contributor

please test

@stahlleiton
Copy link
Contributor Author

Since the unified particle transformer model has already been trained for 2024 pp, I made a commit to add a flag to enable the bug fix. This flag is set to false by default to avoid affecting central production.

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Pull request #45689 was updated. @cmsbuild, @jfernan2, @mandrenguyen can you please check and sign again.

@mandrenguyen
Copy link
Contributor

please test

@cmsbuild
Copy link
Contributor

+1

Size: This PR adds an extra 20KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0065ce/40884/summary.html
COMMIT: a5dd705
CMSSW: CMSSW_14_1_X_2024-08-12-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/45689/40884/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 11 differences found in the comparisons
  • DQMHistoTests: Total files compared: 45
  • DQMHistoTests: Total histograms compared: 3422822
  • DQMHistoTests: Total failures: 12
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3422790
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 44 files compared)
  • Checked 196 log files, 165 edm output root files, 45 DQM output files
  • TriggerResults: no differences found

@AlexDeMoor
Copy link
Contributor

Would it be possible to have documentation regarding this bugfix? More specifically, it would be very useful to understand this one and its impact.

Also, the same ordering is used by the cpf and npf candidates from most of the btagger since DeepJet

@stahlleiton
Copy link
Contributor Author

stahlleiton commented Aug 13, 2024

Would it be possible to have documentation regarding this bugfix? More specifically, it would be very useful to understand this one and its impact.

Also, the same ordering is used by the cpf and npf candidates from most of the btagger since DeepJet

The bug is not present for the charged and neutral PF candidates, since in those two cases, the second for loop (where the sorted list of indices (c_sortedindices and n_sortedindices) are used) is looped over the same collection (jet constituents) as the for loop that filled the sorted vector. So the indices are consistent.

However, in the case of lost tracks, the second for loop is done over a different collection (sorted vector) than the one used to fill the sorted vector (lost track collection), which introduces the issue (wrong index used to access the sorted list of indices (lt_sortedindices)).

The main consequence of using the wrong index when accessing the sorted list of indices is that one can either get the default value returned (i.e. index 0) or a wrong value back, so the lost tracks are either assigned to entry 0 of the features vector multiple times (thus it stores less lost tracks in the features vector than elements in the sorted vector) or to another position in the features vector where they should not.

So since the lost tracks are only used in the unified particle transformer model (as far as I could see), the other models are not affected by this issue.

@mandrenguyen
Copy link
Contributor

@AlexDeMoor Are you satisfied with this explanation/solution?

@mandrenguyen
Copy link
Contributor

+1

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will be automatically merged.

@cmsbuild cmsbuild merged commit a961243 into cms-sw:master Aug 14, 2024
11 checks passed
@AlexDeMoor
Copy link
Contributor

We have followed the discussion offline about some details of this bugfix and the potential impact of it. For now, everything is ok on the tagger side and we will have further evaluation later. This bugfix is not affecting UParT for the 2024 data giving us time to ensure the lost tracks are totally ok.

@mandrenguyen
Copy link
Contributor

type btv

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants