HLT crash in run 359998: Unavailable Conditions of type HcalChannelQuality #39693
Comments
A new Issue was created by @trtomei Thiago Tomei. @Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
From the |
assign hcal-dpg |
FYI: @cms-sw/hlt-l2 @silviodonato |
New categories assigned: hcal-dpg @wang-hui,@georgia14,@igv4321 you have been requested to review this Pull request/Issue and eventually sign? Thanks |
For the record, this online crash happened more than once (and it does not seem to be reproducible offline). Affected runs (afaik):
|
@trtomei, please update the title of the issue with something like "HLT crash in run 359998: ...". |
Hi @trtomei, could you please copy the config file to lxplus so that we (HCAL DPG) can try to reproduce the crash? |
Hi @wang-hui The files are available in |
Hello @cms-sw/hcal-dpg-l2 @cms-sw/alca-l2, this crash happened again overnight in run 360295. HLT was using
f3mon_logtable_2022-10-13T07_53_34.976Z.txt List of runs with the crashes:
|
In Run 360330
|
this particular event was investigated by @wang-hui
There is a shift in the SOI. I do not see this condition handled in the hlt-gpu code, which is why it crashes here. Of course, we still need to understand why the electronics thinks this rec-hit is shifted by 25 ns! |
Hi, where can I find these events? |
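For readers following along, the kind of check being discussed can be sketched as below. This is only an illustration under stated assumptions: `Frame`, `findSoi` and `soiMatches` are hypothetical names, not CMSSW classes or functions, and the sketch assumes one time sample per 25 ns, so a one-sample SOI shift corresponds to the 25 ns shift mentioned above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a digitized frame; NOT a CMSSW class.
struct Frame {
  // soiFlag[i] is true if time sample i is marked as the sample of interest (SOI).
  std::vector<bool> soiFlag;
};

// Return the index of the first sample flagged as SOI, or -1 if none is flagged.
int findSoi(const Frame& frame) {
  for (std::size_t i = 0; i < frame.soiFlag.size(); ++i) {
    if (frame.soiFlag[i]) {
      return static_cast<int>(i);
    }
  }
  return -1;
}

// True if the flagged SOI sits at the expected sample index.
// Assumption: one sample per 25 ns, so a difference of 1 means a 25 ns shift.
bool soiMatches(const Frame& frame, int expectedSoi) {
  return findSoi(frame) == expectedSoi;
}
```

A reconstruction loop could call something like `soiMatches` per channel and treat a mismatch as a condition to handle explicitly rather than letting it propagate.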
They are available on the online GPU-development machines, e.g.
For an example of how to rerun HLT directly on
FYI: @cms-sw/heterogeneous-l2 |
Seems to be happening a lot more frequently in recent runs:
|
assign heterogeneous |
Just to add a bit of information on "recent runs": Since the crashes reported in this GH issue pre-date the errors I just described, I think the two things might be unrelated, but I wanted to add the information for completeness. |
One thing that I can reproduce is that the
and
|
Looking at the legacy code in
Now the question is: how do we skip a "bad" channel in the GPU reconstruction? |
By the way, a source of problems is that |
We discussed this issue in today's HCAL DPG meeting. |
OK. In the meantime, I've prepared what I think is a fix to skip the channels affected by this problem, trying to follow the same approach used in the legacy rechit reconstruction: #39738 . |
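As a rough illustration of the "skip the affected channels" idea (not the code in the PR), here is a minimal sketch. All names are hypothetical stand-ins: `Digi`, `RecHit`, `QualityMap`, `findStatus` and `reconstruct` are not CMSSW types or functions, and the "reconstruction" step is a placeholder. The only point is that a channel with no available quality entry is counted and skipped instead of raising an error.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins; none of these are CMSSW classes.
struct Digi {
  std::uint32_t channelId;
  float charge;
};

struct RecHit {
  std::uint32_t channelId;
  float energy;
};

// Assumed layout: channel id -> status word.
using QualityMap = std::unordered_map<std::uint32_t, std::uint32_t>;

// Look up the status word for a channel; empty if the channel has no entry.
std::optional<std::uint32_t> findStatus(const QualityMap& quality, std::uint32_t channelId) {
  auto it = quality.find(channelId);
  if (it == quality.end()) {
    return std::nullopt;
  }
  return it->second;
}

// Build rechits, skipping (and counting) channels without a quality entry
// instead of failing on them.
std::vector<RecHit> reconstruct(const std::vector<Digi>& digis,
                                const QualityMap& quality,
                                std::size_t& nSkipped) {
  std::vector<RecHit> hits;
  nSkipped = 0;
  for (const auto& digi : digis) {
    if (!findStatus(quality, digi.channelId)) {
      ++nSkipped;  // unknown channel: skip rather than throw
      continue;
    }
    hits.push_back({digi.channelId, digi.charge});  // placeholder for the real energy fit
  }
  return hits;
}
```

Returning the number of skipped channels makes it possible to log how often the problem occurs without interrupting the job.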
+heterogeneous |
The same error has been reported in runs 360393 and 360400. Running with the candidate fix from #39740 lets all HLT jobs complete, with some HCAL-related messages:
|
@cms-sw/hcal-dpg-l2 please consider signing this issue. |
@wang-hui then please sign off on this issue. |
+1 |
This issue is fully signed and ready to be closed. |
please close |
Crash in Run 359998
http://cmsonline.cern.ch/cms-elog/1159020
with following message:
Unfortunately not reproducible yet. The file reconverted to ROOT is /nfshome0/hltpro/hilton_c2e36_35_04/hltpro/thiagoScratch/run359998_ls0335.root, and the relevant configurations are:
A copy of the configuration file is available in /nfshome0/hltpro/hilton_c2e36_35_04/hltpro/thiagoScratch/hlt.py