Skip invalid channels in the GPU HCAL RecHit producer #39738
type bugfix |
assign hcal-dpg |
enable gpu |
please test |
urgent Relatively urgent, as the 12.4.x backport should be deployed online ASAP. |
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39738/32582
Code check has found code style and quality issues which could be resolved by applying following patch(s)
5119223 to 9419cbe
please test |
During the conversion from SoA to legacy format, skip bad channels, identified by a negative chi².
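A minimal sketch of that skip, not the actual CMSSW code: the `RecHitsSoA` and `LegacyRecHit` types and the `toLegacy` function are hypothetical stand-ins for the real collections. The only thing taken from the commit message is that a negative chi² marks a bad channel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical SoA layout: one array per quantity, indexed by channel.
struct RecHitsSoA {
  std::vector<std::uint32_t> detId;
  std::vector<float> energy;
  std::vector<float> chi2;
};

// Hypothetical legacy (array-of-structures) rechit.
struct LegacyRecHit {
  std::uint32_t detId;
  float energy;
  float chi2;
};

// Convert SoA to legacy format, dropping channels flagged with chi2 < 0.
std::vector<LegacyRecHit> toLegacy(const RecHitsSoA& soa) {
  // All SoA columns are expected to have the same length.
  assert(soa.detId.size() == soa.energy.size());
  assert(soa.detId.size() == soa.chi2.size());

  std::vector<LegacyRecHit> out;
  out.reserve(soa.detId.size());
  for (std::size_t i = 0; i < soa.detId.size(); ++i) {
    if (soa.chi2[i] < 0.f) {
      continue;  // bad channel: leave it out of the legacy collection
    }
    out.push_back({soa.detId[i], soa.energy[i], soa.chi2[i]});
  }
  return out;
}
```

With this approach the legacy collection can be shorter than the SoA one, so downstream code should not assume a one-to-one index mapping between the two.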
9419cbe to ceec45f
please test |
in fact, if it works we should probably replace |
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-683156/28285/summary.html
Comparison Summary
@slava77 comparisons for the following workflows were not done due to missing matrix map:
Summary:
GPU Comparison Summary
@slava77 comparisons for the following workflows were not done due to missing matrix map:
Summary:
@cms-sw/hcal-dpg-l2 , what is your take on this PR? It would be good to converge soon, to fix the HLT crashes reported in #39693. |
The GPU vs CPU energy comparisons show some differences in HE at low energy [1], but this could just be an effect of the different number of entries [2]. According to [2], there should also be differences in HB, but they are not visible in [1].
[1] https://tinyurl.com/2ky3szdr
@cms-sw/hcal-dpg-l2, are there any plots that show the rechits reconstructed only on CPU or only on GPU ? Also, did the |
Investigating now.
@fwyzard @clacaputo @missirol I did a (standalone) comparison of the CPU and GPU reco on the MC ttbar events of wf 11634.0 (starting from the same digi). So in the good cases where the SOI and the number of TS are well defined (always true in MC), this PR doesn't introduce any change. From the hardware point of view, a wrong SOI or a wrong number of TS should not happen, and the OPS team at P5 is looking into it.
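For reference, a standalone CPU vs GPU check of this kind could be sketched as below. This is not the code used for the comparison above: the `Hit` and `Comparison` types and the `compareHits` function are hypothetical, and the sketch assumes each collection holds at most one rechit per detector id. It counts the rechits found in both collections, only on CPU, or only on GPU, and records the largest energy difference among the matched ones.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical minimal rechit: detector id and reconstructed energy.
struct Hit {
  std::uint32_t detId;
  float energy;
};

struct Comparison {
  std::size_t matched = 0;       // present in both collections
  std::size_t cpuOnly = 0;       // present only in the CPU collection
  std::size_t gpuOnly = 0;       // present only in the GPU collection
  float maxAbsEnergyDiff = 0.f;  // largest |E_cpu - E_gpu| over matched hits
};

// Match rechits by detId (assumed unique within each collection).
Comparison compareHits(const std::vector<Hit>& cpu, const std::vector<Hit>& gpu) {
  std::unordered_map<std::uint32_t, float> gpuEnergy;
  for (const Hit& h : gpu) {
    gpuEnergy[h.detId] = h.energy;
  }

  Comparison c;
  for (const Hit& h : cpu) {
    auto it = gpuEnergy.find(h.detId);
    if (it == gpuEnergy.end()) {
      ++c.cpuOnly;
      continue;
    }
    ++c.matched;
    c.maxAbsEnergyDiff = std::max(c.maxAbsEnergyDiff, std::fabs(h.energy - it->second));
    gpuEnergy.erase(it);  // whatever remains at the end was never matched
  }
  c.gpuOnly = gpuEnergy.size();
  return c;
}
```

If bad channels are skipped only on one side, they show up in `cpuOnly` or `gpuOnly` rather than as a shift in the energy distributions.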
+reconstruction
@cms-sw/hcal-dpg-l2 , just a reminder that your signature is required in this PR (I guess ORP is waiting for that). |
@mariadalfonso can I ask you a couple of things? I'm sorry but I don't know where to look myself :-(
maybe answering myself: I see that the JR comparison does not run on the |
@mariadalfonso can you explicitly say "+1" so the bot picks up your approval? |
(.. I think the signature needs to come from a member of https://github.com/orgs/cms-sw/teams/hcal-dpg-l2/members .) |
+1 |
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2) |
+1
PR description:
This PR consists of two parts:
See #39693 for a discussion on the underlying issues.
PR validation:
Tested on top of CMSSW_12_4_10 over the events that caused the HLT to crash in runs 357898, 359998, and 360295.
Without these changes the original crash can be reproduced in multiple files:
With these changes the HLT jobs run to completion on almost all those files (barring one that is affected by a different, ECAL-related crash).
If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:
To be backported to CMSSW 12.4.x and 12.5.x for data taking.