-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add GloParT inference facility #46523
Conversation
cms-bot internal usage |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46523/42384 |
A new Pull Request was created by @colizz for master. It involves the following packages:
@cmsbuild, @ftorrresd, @hqucms, @jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
enable nano |
please test |
-1 Failed Tests: UnitTests RelVals RelVals-INPUT RelVals-NANO Unit TestsI found 4 errors in the following unit tests: ---> test test_MC_22_crosscheck had ERRORS ---> test test_MC_23_crosscheck had ERRORS ---> test test_MC_22_setup had ERRORS and more ... RelVals
Expand to see more relval errors ...RelVals-INPUT
Expand to see more relval errors ...RelVals-NANO |
I think it's caused by not testing with cms-data/RecoBTag-Combined#62 |
test parameters:
|
please test |
auto cand = lts_->ptrAt(i); | ||
if (reco::deltaR(*cand, jet) < jet_radius_) { | ||
cpfPtrs.push_back(cand); | ||
isLostTrackMap[cand.key()] = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@colizz Is cand.key()
unique for lost tracks and packed candidates? Or the key()
is just the index in each collection (then different types of objects can have the same keys)? Could you please check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks a lot @hqucms , this is very nice catch. There is an overlap between "keys" in PF candidates and lost tracks, with a prob of ~0.5% (though small, it indicates a logical error in the code).
We should point out that in training GloParT, we used the same code to generate training samples, which means that a small portion of charged PF candidates were mistakenly labelled as lost tracks. For now, it appears that this error is consistent across both, allowing our trained GloParT model to perform the same in CMSSW inference.
It seems our best option is to retain this code to preserve GloParT's performance. Perhaps we could add a comment here for clarity. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@colizz Given the small possibility of the overlap, I would expect fixing this should have a rather negligible effect on the performance? If that is the indeed case I think it's better to fix it now already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, we will check whether the effect is negligible if we change the code here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @hqucms , the code has been fixed to use CandidatePtr
itself as the map key: b813fb6
Additionally, puppi_wgt_cache
had a similar issue; however, testing revealed that it does not produce the same errors as isLostTrackMap
. In this updated code, we validated the scores and found that in <1% of cases, some charged PF cands are marked as isLostTrack=True
, but this has only a minimal impact on the GloParT output scores. We also examined the ROC curve on a large sample set, and found no change to the ROC performance. Thus we conclude that this modification has a negligible impact on GloParT performance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @colizz for the fast turnaround! The fix looks good to me.
@colizz Thanks for making the PR. I think you need to add the GloParT3 branches in |
please abort |
-1 Failed Tests: UnitTests RelVals-INPUT The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
You can see more details here: Unit TestsI found 27 errors in the following unit tests: ---> test runtestPhysicsToolsPatAlgos had ERRORS ---> test test-runTheMatrix-interactive had ERRORS ---> test test_MC_22_crosscheck had ERRORS and more ... RelVals-INPUT
Expand to see more relval errors ...
Comparison SummarySummary:
NANO Comparison SummarySummary:
Nano size comparison Summary:
|
test parameters:
|
please test |
+1 Size: This PR adds an extra 16KB to repository The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic: You can see more details here: Comparison SummarySummary:
NANO Comparison SummarySummary:
Nano size comparison Summary:
|
+1 |
+1 |
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @antoniovilela, @rappoccio, @mandrenguyen (and backports should be raised in the release meeting by the corresponding L2) |
+1 |
@colizz , there is a request to backport RecoBTag-Combined tag V01-25-00 to 14.1.X. Do we need a backport of this PR to also go with cms-sw/cmsdist#9505 |
@smuzaffar No, the changes introduced in V01-25-00 (i.e., cms-data/RecoBTag-Combined#63) is unrelated to this PR. |
what about changes in cms-data/RecoBTag-Combined#62 ? cms-sw/cmsdist#9505 includes changes from both cms-data/RecoBTag-Combined#62 and cms-data/RecoBTag-Combined#63 . So in order to integrate cms-sw/cmsdist#9505 in to 14.1.X, do we need cmssw update too? |
No -- cms-data/RecoBTag-Combined#62 only adds new models. |
PR description:
This PR adds the GloParT 3 model's inference facility into CMSSW, prepared for NanoAODv15, replacing #45830.
Model performance details are provided in these [slides] (accessible via CMS).
Global Particle Transformer (GloParT) 3 is an inclusive tagging model for AK8 jets that covers the entire phase space and enables resonance mass regression for each jet class. It functions as both a global tagger and a mass regression model for AK8 jets and can also be utilized as a pre-trained model. The hidden layer neurons (with dimension 256) are stored in MiniAOD, providing the capability to resume all output scores. Further details can be found in the slides.
In summary, the integration overview (the new AK8 jet features added to MiniAOD and NanoAOD) is illustrated below.
Please test this PR with cms-data/RecoBTag-Combined#62.
PR validation:
test_globalpart_cfg.py
and is also validated under the AODSIM->MINIAOD->NANOAOD workflow.