Online-DQM occasionally not picking up the correct IOVs ? #45714
A new Issue was created by @missirol. @Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign dqm, alca, db |
New categories assigned: dqm,alca,db @rvenditti,@syuvivida,@tjavaid,@nothingface0,@antoniovagnerini,@francescobrivio,@saumyaphor4252,@saumyaphor4252,@perrotta,@perrotta,@consuegs,@consuegs you have been requested to review this Pull request/Issue and eventually sign? Thanks |
@cms-sw/dqm-l2 @cms-sw/db-l2 @cms-sw/alca-l2 Will you try to address this issue ? |
@cms-sw/dqm-l2 @cms-sw/db-l2 @cms-sw/alca-l2 Still wondering if there will be some follow-up. Or the issue is not worth investigating ? Or should more info be provided ? |
@missirol if the online-DQM jobs consume conditions from an older IOV, I think it is an issue rooted in the online-DQM jobs. If needed (and if I can) I can help with debugging, but @cms-sw/dqm-l2 should first pinpoint which jobs those are, where the issue could come from, etc. |
How can we be sure that this only affects the online-DQM [*] ? Could it be that the online-DQM is just the first (and only ?) place where such an issue would be spotted ? In the cases given in the description, was anything strange noticed on the DB side and/or in the O2O logs ? (I understood in #45555 (comment) that O2O logs get eventually deleted, so maybe now it's too late to check). @cms-sw/db-l2 [*] From the description
|
There have been cases in recent weeks/months where strange discrepancies were observed in the online-DQM outputs at P5.
L1T prescales. On May-27 (2024), strange data-emulator mismatches in the L1T decisions were seen in online-DQM during run-381286 (link). The checks made at the time suggested that, while 381286 was a collisions run, the emulator plots in the online-DQM were using the L1T prescales of the trigger menu used in the previous run (which was a run with "circulating" beams, not collisions). The relevant tag is L1TGlobalPrescalesVetosFract_Stage2v1_hlt. No issues were seen in the O2O logs at the time, nor warnings or crashes anywhere. Below is the L1T report from the day after.
This slide from L1T Technical Coordination suggests that a similar issue also occurred on Apr-25 (2024), see CMSLITDPG-1257.
Data-emulator mismatches related to ECAL Barrel trigger primitives. On Aug-15 (2024), ECAL uploaded new conditions via O2O with IOV starting from run-384485, but during that run data-emulator mismatches showed up in the ECAL online-DQM outputs. This too may be consistent with the online-DQM jobs consuming conditions from an older IOV.
Links: elog, online DQM of run-384485, relevant CMSTalk post, tag EcalTPGLinearizationConst_v2_hlt.
Copying here text from an ECAL expert.
In both examples, the discrepancies disappeared after a new run was started.
At face value, both examples seem compatible with the cmsRun jobs in the online-DQM nodes not picking up the latest (and correct) IOVs, using instead older ones and thus leading to mismatches between real and emulated data in DQM outputs.
I think it would be helpful if DQM and AlCa-DB could investigate what happened in these cases (O2O logs, etc), with help from framework experts if needed.
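To make the hypothesis above concrete, here is a minimal sketch of how a run-based IOV lookup behaves when a job resolves the run against an IOV list that was read before the O2O upload. This is only an illustration, not the CMSSW conditions API: the function `payload_for_run`, the payload names, and the first IOV starting at run 1 are all assumptions; only the run number 384485 comes from the ECAL example.

```python
from bisect import bisect_right


def payload_for_run(iovs, run):
    """Return the payload whose IOV covers `run`.

    `iovs` is a list of (since_run, payload) pairs sorted by since_run.
    Each IOV is valid from its since_run up to (not including) the
    since_run of the next entry; the last IOV is open-ended.
    """
    sinces = [since for since, _ in iovs]
    idx = bisect_right(sinces, run) - 1  # last IOV with since_run <= run
    if idx < 0:
        raise LookupError(f"no IOV covers run {run}")
    return iovs[idx][1]


# Hypothetical IOV lists for a single tag (payload names are placeholders).
stale_snapshot = [(1, "payload_old")]                        # read before the O2O upload
fresh_snapshot = stale_snapshot + [(384485, "payload_new")]  # read after the upload

run = 384485
print("stale IOV list ->", payload_for_run(stale_snapshot, run))
print("fresh IOV list ->", payload_for_run(fresh_snapshot, run))
```

With the stale list, the open-ended older IOV silently covers run 384485, so the lookup succeeds and returns the old payload without any warning. That would be consistent with the absence of errors in the logs, but whether this is what actually happens in the online-DQM jobs is exactly what needs to be checked.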
If the issue is not specific to online-DQM, but generally related to the access to the conditions database, it could potentially affect the HLT jobs running online as well.
Maybe unrelated, a recent HLT crash possibly caused by a failure in accessing correct conditions (in that case, for the beamspot) is being discussed in #45555.