DQM: new DQMStore. #28622
The code-checks are being triggered in jenkins.
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-28622/13150
A new Pull Request was created by @schneiml (Marcel Schneider) for master.
It involves the following packages: CalibTracker/SiStripChannelGain
The following packages do not have a category, yet: DQMServices/Demo
@SiewYan, @andrius-k, @schneiml, @kpedro88, @Martin-Grunewald, @rekovic, @fioriNTU, @tlampen, @alberto-sanchez, @pohsun, @santocch, @peruzzim, @civanch, @cmsbuild, @agrohsje, @fwyzard, @smuzaffar, @Dr15Jones, @efeyazgan, @mdhildreth, @jfernan2, @tocheng, @qliphy, @benkrikler, @mkirsano, @kmaeshima, @christopheralanwest, @franzoni, @fgolf can you please review it and eventually sign? Thanks.
cms-bot commands are listed here
please test
It would be surprising if this actually passed.
The tests are being triggered in jenkins.
@@ -174,6 +173,7 @@ void AlcaBeamMonitor::bookHistograms(DQMStore::IBooker& ibooker, edm::Run const&
}
}
ibooker.setCurrentFolder(monitorName_ + "Service");
auto scope = ibooker.setScope(MonitorElementData::Scope::LUMI);
The better C++ way to get this behavior is RAII (Resource Acquisition Is Initialization; the resource is released in the destructor).
{
DQMScopeContext scope{ ibooker, MonitorElementData::Scope::LUMI };
theValuesContainer_ = ...
}
At the end of the block, the scope object is destroyed and resets the ibooker scope back to what it started at.
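The RAII pattern described in this comment can be sketched as follows. This is a minimal, self-contained illustration and not CMSSW code: `Booker`, `Scope` and `ScopeGuard` are hypothetical stand-ins for `DQMStore::IBooker`, `MonitorElementData::Scope` and the proposed `DQMScopeContext`, and the default scope of `RUN` is taken from the PR description.

```cpp
#include <cassert>

// Stand-ins for illustration only; these are NOT the CMSSW types.
enum class Scope { JOB, RUN, LUMI };

class Booker {
public:
  Scope scope() const { return scope_; }
  void setScope(Scope s) { scope_ = s; }

private:
  Scope scope_ = Scope::RUN;  // assumed default scope
};

// RAII guard: switches the scope on construction, restores it on destruction.
class ScopeGuard {
public:
  ScopeGuard(Booker& booker, Scope newScope) : booker_(booker), previous_(booker.scope()) {
    booker_.setScope(newScope);
  }
  ~ScopeGuard() { booker_.setScope(previous_); }

  // Copying would restore the scope twice, so it is disabled.
  ScopeGuard(const ScopeGuard&) = delete;
  ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
  Booker& booker_;
  Scope previous_;
};

// Returns the scope that is active inside the guarded block.
Scope scopeInsideBlock(Booker& booker) {
  ScopeGuard guard{booker, Scope::LUMI};
  assert(booker.scope() == Scope::LUMI);  // bookings made here would be per-lumi
  return booker.scope();
}  // guard is destroyed here; booker is back at its previous scope
```

Because the restore happens in the destructor, the previous scope also comes back on early returns and exceptions, which a manual reset at the end of the block would miss.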
This was not exactly how I expected this API to be used, but it turned out to be a common pattern for migrating the existing code. I might add a helper like this.
-1
Tested at: b9acc66
CMSSW: CMSSW_11_1_X_2019-12-13-1100
I found the following errors while testing this PR
Failed tests: ClangBuild
I found a compilation warning while trying to compile with clang. Command used:
See details on the summary page.
Comparison not run due to Build errors/Fireworks only changes/No short matrix requested (RelVals and Igprof tests were also skipped)
I've checked the ROOT version in 11_1_0_pre2 and 11_1_0_pre3, by running [...] therefore, I think this issue is not related to the ROOT version
@mtosi which is "the usual script" you are running?
ciao the printout I quoted in my comment comes from that part, playing w/ the [...] thanks for your help ! [1]
Hi Mia, after running the crystal ball for a while (I haven't really tried executing the script yet), maybe this is related: https://github.com/cms-sw/cmssw/pull/28622/files#diff-8e94a26c1f038018d957101ae048c7c5L228-R275 In theory that should make bare ROOT respect the efficiency flag we have had in DQM for a long time.
thanks @schneiml
Try resetting that bit in your Python code. There is a good chance that these plots have the efficiency flag set in CMSSW, that the flag is now saved into the ROOT object, and that this causes the different behaviour. Whether this is actually what we want is not that clear, but as a workaround, un-setting that bit in your plotting code should work (and would prove that this is actually the cause, which I am not sure about).
void setEfficiencyFlag() {
  auto access = this->accessMut();
  if (access.value.object_)
    access.value.object_->SetBit(TH1::kIsAverage);
@mtosi Looks like Github screwed up that link before. This is the relevant line.
thanks, but I'm sorry, I'm still confused
the single histogram used by the stacked plot is an efficiency, right,
but it comes from
https://github.com/cms-sw/cmssw/blob/CMSSW_11_1_0_pre3/Validation/RecoTrack/python/PostProcessorTracker_cfi.py#L21
which is making use of the DQMGenericClient
what am I missing ?
thanks
forget, I missed your message above
and indeed, I'm trying to rollback this bit, but I'm not able to :(
how should this be done ?
thanks
th1.SetBit(ROOT.TH1.kIsAverage, False)
or th1.ResetBit(ROOT.TH1.kIsAverage)
would be my guess in Python. If you have trouble finding the constant, you can just use the value directly (1 << 18).
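To make the "use the value directly" hint concrete, here is a small sketch of what setting and resetting bit 18 does to a status word. It deliberately uses a plain integer model rather than ROOT: `StatusBits` and `kIsAverageBit` are hypothetical names, and only the value `1 << 18` is taken from the comment above.

```cpp
#include <cassert>
#include <cstdint>

// The value quoted above for the kIsAverage flag.
constexpr std::uint32_t kIsAverageBit = 1u << 18;  // 262144

// Plain-integer model of a status word; this is NOT ROOT's TObject.
struct StatusBits {
  std::uint32_t bits = 0;
  void setBit(std::uint32_t f) { bits |= f; }
  void resetBit(std::uint32_t f) { bits &= ~f; }
  bool testBit(std::uint32_t f) const { return (bits & f) != 0; }
};

// Sets the flag, then clears it again, and reports whether it is gone.
bool clearedAfterReset() {
  StatusBits s;
  s.setBit(kIsAverageBit);
  assert(s.testBit(kIsAverageBit));
  s.resetBit(kIsAverageBit);
  return !s.testBit(kIsAverageBit);
}
```

Resetting only clears that one bit; any other flags stored in the same word are left untouched.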
Ok, maybe consider asking in the ROOT forum if this is the expected behaviour. We could still roll back this change and go back to our own definition of the efficiency flag if there are systematic problems, but that should happen pretty quickly (given that we have a sort-of-final 11_1 now and this code is actually used for large-scale productions now).
ciao
why do we move our MEs to make use of that setting is even another point, and you (DQM core) have decided that --I admit my ignorance on the reason :( --
about having this "feature" in 11_1_0, but a rollback in 11_2 and even 11_1_X I do not see a major problem, because as far as I understood it affects the DQM/Validation only, and I think 11_1_0 will be used for the offline reconstruction of the samples for the HLT TDR
I managed to find a --very ugly-- solution to our issue, therefore I'm not feeling to push towards a rollback,
said that thanks a lot for the support ! it was really helpful in finding the source of the issue
Regarding the "why" of that change: In the past, we had the TH1 and the "efficiency flag" stored separately. That meant anything handling DQM MEs had to manually check the efficiency flag and adapt its behaviour accordingly (e.g. [...]). According to the docs, setting this bit should make ROOT do the right thing by default, simplifying things downstream (also the [...]).
Your corner case is odd, because you have an efficiency, but due to how tracking works, summing up efficiencies makes sense in your application. I think it is fair that you have to adjust your code here, since in the common case (e.g. summing up lumisections, or runs, or datasets) the new behaviour is correct: 90% efficiency in 7 runs is 90% efficiency, not 630% efficiency. Only in the special case of tracking, 10% efficiency in each of 7 steps is actually 70% efficiency (and not 0.00001%, which would be another common interpretation...).
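The three interpretations mentioned in this comment can be checked with a few lines of arithmetic. The sketch below is only an illustration of the numbers quoted above: the helper names are hypothetical, and the plain average assumes every run has the same number of denominator entries.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Sum of all entries: the right combination for independent categories
// (e.g. iterative tracking steps that each recover different tracks).
double sumOf(const std::vector<double>& v) { return std::accumulate(v.begin(), v.end(), 0.0); }

// Unweighted average: the right combination for the same efficiency measured
// in several runs, assuming equal statistics per run.
double averageOf(const std::vector<double>& v) {
  assert(!v.empty());
  return sumOf(v) / static_cast<double>(v.size());
}

// Product: the "every step must succeed" interpretation.
double productOf(const std::vector<double>& v) {
  return std::accumulate(v.begin(), v.end(), 1.0, std::multiplies<double>());
}

// 90% efficiency in each of 7 runs.
const std::vector<double> kRuns(7, 0.9);
// 10% efficiency in each of 7 iterative steps.
const std::vector<double> kSteps(7, 0.1);
```

With these inputs, `averageOf(kRuns)` is 0.9 while `sumOf(kRuns)` is 6.3 (the nonsensical 630%), and `sumOf(kSteps)` is 0.7 while `productOf(kSteps)` is 1e-7, i.e. 0.00001%.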
There is nothing special about tracking in these plots - the same behaviour applies whenever you have some kind of "iterative" reconstruction, or the like. For example, we could make the same plots for the HLT muon efficiency (which uses 3 iterations based on outside-in, inside-out, inside-out based on L1), etc.
uhmmm, averaging the efficiency through different LSs is different than summing the efficiency from independent categories
Yes, that is the fundamental issue. But ROOT only offers one "hadd" (to my knowledge?) and that one is predominantly used to assemble complete statistics from partial histograms (to my knowledge, again). IMO we should just make sure that this bit can be easily reset to change the behavior to what you need, and according to the docs that should absolutely work, so this feels like a ROOT issue to me.
can the DQM core handle this, please ?
@pcanal could you please help on this task ?
@mtosi , for what I can understand, this is a buggy behavior in ROOT, and we don't really want to fix a ROOT bug through CMSSW. I think there are multiple ways to solve your issue; the most straightforward I can see is to compute the efficiencies of the various steps summing up the numerators of the steps before, and then doing a simple Draw("same") without any need of a THStack. This would require minor changes to the script you linked.
Could someone summarize a reproducer for the bug? Thx
On Jun 29, 2020, at 2:58 PM, fioriNTU wrote:
@mtosi , for what I can understand, this is a buggy behavior in ROOT, and we don't really want to fix a ROOT bug through CMSSW. I think there are multiple ways to solve your issue; the most straightforward I can see is to compute the efficiencies of the various steps summing up the numerators of the steps before, and then doing a simple Draw("same") without any need of a THStack. This would require minor changes to the script you linked.
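@davidlange6 There is not really a bug, more of a feature. DQM (since ~forever) has a mechanism called "Efficiency Flag", that is used to mark histograms that are not histograms so DQMGUI can handle them correctly. This PR sets kIsAverage in ROOT if the efficiency flag is set so that ROOT also handles them correctly. This now breaks a downstream script that relied on TH1::Add on an efficiency TH1 being handled as if it were a histogram. According to @mtosi it was not that easy to just TH1::ResetBit(kIsAverage), which indicates a bug somewhere on the ROOT side.
@schneiml SORRY, sorry for the noise, I'm going to make the PR adding the handling of it to our script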
Ok, thanks - sounds like there is no bug here after all.
On Jun 29, 2020, at 3:42 PM, Marcel Schneider wrote:
@davidlange6 There is not really a bug, more of a feature.
DQM (since ~forever) has a mechanism called "Efficiency Flag", that is used to mark histograms that are not histograms so DQMGUI can handle them correctly.
This PR sets kIsAverage in ROOT if the efficiency flag is set so that ROOT also handles them correctly.
This now breaks a downstream script that relied on TH1::Add on an efficiency TH1 being handled as if it were a histogram.
According to @mtosi it was not that easy to just TH1::ResetBit(kIsAverage), which indicates a bug somewhere on the ROOT side.
PR description:
This PR depends on many previous PRs and contains them. Integration will take a while. [Edit: All dependencies are in. Should be good to go now.]
This PR completely replaces the Core DQM infrastructure. Some things don't change, but are re-implemented. This state was achieved in previous pull requests:
- DQM Histograms are stored and managed by edm::Service<DQMStore>.
- [...] DQMStore for a snapshot of the current histograms.
- In non-legacy modules, all dependencies across the DQMStore are passed to edm via DQMToken products, which do not contain any data. (As far as possible -- there are no job products, so the job-level dependencies cannot be expressed.)
- [...] DQMGenerationReco, DQMGenerationHarvesting, DQMGenerationQTest, denoted in the instance label.
- DQMGenerationReco is produced by DQMEDAnalyzers, which consume only "normal" (non-DQM) products.
- DQMGenerationHarvesting is produced by DQMEDHarvesters, which consume (by default) all DQMGenerationReco tokens. This allows all old code to work without explicitly declaring dependencies.
- DQMEDHarvesters provide a mechanism to consume more specific tokens, including DQMGenerationHarvesting tokens from other harvesters.
- DQMGenerationQTest is produced only by the QualityTester module (of which we typically have many instances). The QualityTester has fully configurable dependencies to allow the more complicated setups sometimes required in harvesting, but typically consumes DQMGenerationHarvesting.
- [...] DQMGenerationQTest products (the QTest results) and do not specify that, but things still work because they do their work in endJob.

There are six supported types of DQM modules:
- DQMEDAnalyzer, based on edm::one::EDProducer. Used for the majority of histogram filling in RECO jobs. Soon to be based on edm::stream.
- DQMOneEDAnalyzer, based on edm::one::EDProducer. Used when begin/end job transitions are required. Can accept more edm::one specific options. Cannot save per-lumi histograms.
- DQMOneLumiEDAnalyzer, based on edm::one::EDProducer. Used when begin/end lumi transitions are needed. Blocks concurrent lumisections.
- DQMGlobalEDAnalyzer, based on edm::global::EDProducer. Used for DQM@HLT and a few random other things. Cannot save per-lumi histograms (this is a conflict with the fact that HLT typically saves only per lumi histograms, see Concurrent Lumis and DQM at HLT #28341).
- DQMEDHarvester, based on edm::one::EDProducer. Used in harvesting jobs to manipulate histograms in lumi, run, and job transitions.
- edm::EDAnalyzer legacy modules. Can do filling and harvesting. Not safe to use in the presence of concurrent lumisections. Safe for multi-threaded running from the DQM framework side.

There are four supported file formats for DQM histograms:
- TTree based ROOT files. Reading and writing implemented in EDM input and output modules in DQMServices/FwkIO: DQMRootSource and DQMRootOutputModule.
- TDirectory based DQM_*.root ROOT files. Only write support, implemented in DQMServices/Core (see DQM: new DQMFileSaver #28588). Read support dropped in this PR, unneeded since references were removed. Only this format supports saving JOB histograms.
- fastHadd files. Implemented by DQMFileSaverPB/DQMFileSaver (see Replace mark-and-delete at next merge algorithm in DQMStore (75x) #11086) and the edm input module DQMProtobufReader, in "streamer" format for HLT/DAQ.
- MEtoEDM edm based files. Read and written by the Pool I/O modules, after copying DQMStore content to edm products using MEtoEDMConverter.

In this PR, the following features are implemented/modified:
- [...] DQMStore. While there were many different modes before (enableMultiThread, lsBasedMode, forceResetOnLumi, collate, etc.) that could be configured in various ways and changed the behavior of the DQMStore (sometimes fundamentally, making certain features only work in certain modes), the new DQMStore has only a single mode.
- [...] saveByLumi in the DQMStore) and harvesting (reScope in DQMRootSource). Both can be expressed in terms of Scope, see later.
- [...] assertLegacySafe, only adds assertions to make sure no operations that would be unsafe in the presence of legacy modules sneak in. It does not affect the behaviour.
- verbose option should not affect the behavior, though it has in the past (if only due to race conditions).
- IBooker and booking methods cannot run concurrently, but it is easy to change this now (will probably be allowed with the edm::stream modules).
- MonitorElement objects are held in the modules (so their life cycle has to match that of the module) but also represent histograms whose life cycle depends on the data processed (run and lumi transitions). This caused conflicts since the introduction of multi-threading.
- DQMStore resolves this conflict by representing each monitor element using (at least) two objects: a local MonitorElement, that follows the module life cycle but does not own data, and a global MonitorElement that owns histogram data but does not belong to any module. There may be multiple local MEs for one global ME if multiple modules fill the same histogram (edm::stream or even independent modules). There may be multiple global MEs for the same histogram if there are concurrent lumisections.
- [...] enterLumi, leaveLumi). For legacy edm::EDAnalyzers, global begin/end run/lumi hooks are used, which only work as long as there are no concurrent lumisections. The local MEs are kept in a set of containers indexed by the moduleID, with special value 0 for all legacy modules and special values for DQMGlobalEDAnalyzers, where the local MEs need to match the life cycle of the runCache (module id + run number).
- [...] cleanupLumi hook using the edm feature added in Added Service callbacks for Run and Lumi writing #28562, edm::Service callback for lumi/run cleanup #28521. They are kept in a set of containers indexed by run and lumi number. For RUN MEs, the lumi number is 0; for JOB MEs, run and lumi are zero. The special pair (0,0) is also used for prototypes: Global MEs that are not currently associated to any run or lumi, but can (and have to, for the legacy guarantees) be recycled once a run or lumi starts.
- assertLegacySafe (enabled by default) checks for this condition and crashes the job if it is violated.
- [...] DQMStore, the handling of per-job vs. per-run histograms as well as the handling of per-lumi histograms in online, offline, and harvesting is very confusing.
- [...] JOB, RUN, or LUMI.
- [...] RUN.
- [...] IBooker::setScope() to change the scope to e.g. LUMI. This replaces the old setLumiFlag.
- [...] saveByLumi option in the DQMStore, the default scope changes to LUMI for all modules that can support per-lumi saving. It could still be manually overridden in code.
- [...] JOB. This works for single-run as well as multi-run harvesting, and emulates the old behavior. Moving to scope RUN for non-multi-run harvesting would be cleaner, but requires bigger changes to existing code.
- [...] DQMRootSource: reScope. This option sets the finest allowed scope when reading histograms from the input files.
- reScope is set to LUMI. The scope of MEs is not changed, histograms are only merged if a run is split over multiple files.
- reScope is set to RUN. Now, MEs saved with scope LUMI will be switched to scope RUN and merged. The harvesting modules can observe increasing statistics in the histogram as a run is processed (like in online DQM).
- reScope is set to JOB. Now, even RUN histograms are merged. This is the default, since it also works for today's single-run harvesting jobs.
- [...] DQMRootSource.
- [...] MonitorElements.
- [...] reco, harvesting, legacy) providing this ME type, however the reco version does not expose some non-thread-safe APIs.
- MonitorElement can own the MonitorElementData that holds the actual histogram (that is the case for global MEs), or share it with one ME that owns the data and others that don't (this is the case for local MEs).
- MonitorElementData does not provide an API. It is always wrapped in a MonitorElement.
- MonitorElement has no state, apart from the pointer to the data and the is_owned flag.
- [...] DQMNet::CoreObject, which is present in the MonitorElement to remain compatible with DQMNet for online DQM. The values there (dir, name, flags, qtests) are to be considered cached copies of the "real" values in MonitorElementData. The dir/name copy in the CoreObject is also used as a set index for the std::sets in the DQMStore, since it remains available even when the MonitorElementData is detached from local MEs (e.g. between lumisections).
- MonitorElement still allows access to the underlying ROOT object (getTH1()), but this is unsafe and should be avoided whenever possible. However, there is no universal replacement yet, and DQM framework code uses direct access to the ROOT object in many places.
- [...] process.DQMStore.trackME = cms.untracked.string("<ME Name>") option in the config file.
- The DQMStore will then log all life cycle events affecting MEs matching this name. This does not include things done to the ME (like filling -- the DQMStore is not involved there), but it does include creation, reset, recycling, and saving of MEs.
- The matching is a sub-string match on the full path, so it is also possible to watch folders or groups of MEs.
- For the more difficult cases, it can make sense to put a debug breakpoint (std::raise(SIGINT)) inside DQMStore::debugTrackME to inspect the stack when a certain ME is created/modified.
- The previous functionality of logging the caller for all booking calls also still exists and can be enabled by setting process.DQMStore.verbose = 4.
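The sub-string matching used by the trackME option can be pictured with the sketch below. It is an assumption-laden illustration, not the code in DQMStore::debugTrackME: the helper name `matchesTrackME`, the example paths, and the choice to treat an empty pattern as "tracking disabled" are all hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper; the real check inside the DQMStore may differ.
bool matchesTrackME(const std::string& fullPath, const std::string& trackME) {
  // Assumption of this sketch: an empty pattern means nothing is tracked.
  if (trackME.empty())
    return false;
  // Sub-string match on the full path (folder plus ME name).
  return fullPath.find(trackME) != std::string::npos;
}

// Small self-check with made-up paths.
void checkExamples() {
  const std::string path = "Tracking/TrackParameters/NumberOfTracks";
  assert(matchesTrackME(path, "NumberOfTracks"));    // a single ME by name
  assert(matchesTrackME(path, "TrackParameters/"));  // a whole folder
  assert(!matchesTrackME(path, "Muons"));            // unrelated pattern
}
```

Because the match is on the full path, a short pattern can select many MEs at once, so a specific folder prefix keeps the log output manageable.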
PR validation:
Passes the relevant unit tests and also some runTheMatrix workflows.

Known broken:
- DQMStore::load() (should be unused, but there still is code referring to it, related to references. This can be removed, since references are gone for a while now.)
- [...] DQMRootSource related to single/multi run harvesting.

Other flaws:
- [...] MonitorElement implementation can be moved around/removed.