-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Concurrent Lumis and DQM at HLT #28341
Comments
A new Issue was created by @schneiml Marcel Schneider. @davidlange6, @Dr15Jones, @smuzaffar, @fabiocos, @kpedro88 can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign hlt, core |
New categories assigned: core,hlt @Dr15Jones,@smuzaffar,@Martin-Grunewald,@fwyzard you have been requested to review this Pull request/Issue and eventually sign? Thanks |
Is this issue still relevant? @cms-sw/dqm-l2 (I'd guess "yes") |
My understanding is that yes, it is still relevant. |
I guess the two options are
What I recall is that the only way to get per-Run histograms from the HLT to the DQM was to "slice" them into per-LS updates; if that is not needed/supported any more, or if there is a better way to do it, we should definitely consider the alternative. About the actual DQM modules running at HLT: I don't know what has change in DQM-land since 2019; maybe some of the original issues have already been addressed ? Out of my head, the requirements from the HLT side are
Not sure what are the best means to achieve them. |
I apologize in advance if my answer does not reach the desired levels for this conversation, but unfortunately the real expert on this matter @schneiml is out of CMS since a long time as you probably know. I am afraid that the original problems stated in this issue concerning DQM@HLT have not been addressed. According to the description by Marcel and the PRs that came afterwards, specially #28622 What seems to have changed since this issue was raised, is the change of DQMEDAnalyzer, so that now it is based on edm::stream since #28813 And this links with the last solution proposed by Marcel at the description of this issue, which may be the best option, migrate the DQMGlobalEDAnalyzer to DQMEDAnalyzer, at least for the HLT purpouses. But I'd like to know first the opinion of the real experts who are still alive, @makortel @Dr15Jones |
I would not trust much our memory, but what I recall and see in the code, migrating An alternative could be to add per LS histogram support to I assume the latter two points of @fwyzard (completeness of merging/accumulating from all HLT nodes, and older histograms not being overwritten by newer histograms) are more for the histogram merging infrastructure outside of CMSSW. |
Der HLT and Core Experts,
From recent discussions I hear that HLT is planning to move to concurrent lumisections by Run3. This opens new questions regarding DQM@HLT:
As I understand currently all DQM@HLT uses
DQMGlobalEDAnalyzer
based modules. These modules hold their histograms in a Run Cache.However, the histograms produced at HLT are shipped to online DQM on a per-lumisection basis. To my knowledge, this is the only way that these histograms are saved.
This works thanks to the
edm::Service<DQMStore>
, which can pull the histograms out without help form EDM. Various thread-safety concerns appear (e.g. theedm::global
DQMFileSaver
[1] sounds very dangerous; it does not take any locks! That is definitely racy in harvesting -- unless running any one-module blocks global modules from running), but it is logically sound as long as lumis are processed one by one.However, as soon as we have concurrent lumisections, this becomes incorrect, and irreproducible. We are filling the same histograms from different lumisections, and then take a (supposedly) per-lumi snapshot at some random point.
Now, there are many ways to handle that in the future:
Ignore it. The resulting histograms will be perfectly fine for monitoring, even if the lumi boundaries are a bit blurry. We will also continue to need an
edm::Service
for online DQM in the foreseeable future, and can also keep using that in HLT. [2]Migrate the
DQMGlobalEDAnalyzer
modules to use a lumi cache instead of a run cache, and only save run-based histograms at the end of the run (this most likely makes them completely unusable for DQM@HLT).Implement a mechanism to get Run-based histograms out of HLT. This can be combined with the other solutions.
Migrate the
DQMGlobalEDAnalyzer
toDQMEDAnalyzer
once we get these back toedm::stream
(this should ideally happen before CMSSW11, depending on progress with the other PRs now).I'm looking forward to any additional opinions on this aspect; it might be fine to not do anything, but we should be aware of the implications.
[1] https://cmssdt.cern.ch/dxr/CMSSW/source/DQMServices/Components/plugins/DQMFileSaver.h?from=DQMFileSaver#15
[2] Note that for online DQM we have the same problem, twice: We have modules (e.g. specific
edm::one
users) that can't save per-lumi, but need their histograms shown live in online DQM, and we have modules that can save per-lumi, but online DQM traditionally can update much faster than per-lumisection. So theedm::Service
or similar is unavoidable here, but also less of a problem, since online DQM can happily run single-threaded, no concurrent lumis.The text was updated successfully, but these errors were encountered: