DQM: Fix empty lumis with per-lumi plots in DQMOneEDAnalyzer #29738

Merged: 4 commits merged on May 5, 2020
2 changes: 1 addition & 1 deletion DQMServices/Core/README.md
@@ -137,7 +137,7 @@ When a ME is booked, internally _global_ and _local_ MEs are created. This shou
- In the DQM API, we face the conflict that `MonitorElement` objects are held in the modules (so their life cycle has to match that of the module) but also represent histograms whose life cycle depends on the data processed (run and lumi transitions). This has caused conflicts since the introduction of multi-threading.
- The `DQMStore` resolves this conflict by representing each monitor element using (at least) two objects: A _local_ `MonitorElement`, that follows the module life cycle but does not own data, and a _global_ `MonitorElement` that owns histogram data but does not belong to any module. There may be multiple _local_ MEs for one _global_ ME if multiple modules fill the same histogram (`edm::stream` or even independent modules). There may be multiple _global_ MEs for the same histogram if there are concurrent lumisections.
- The life cycle of _local_ MEs is driven by callbacks from each of the module base classes (`enterLumi`, `leaveLumi`). For legacy `edm::EDAnalyzer`s, global begin/end run/lumi hooks are used, which only work as long as there are no concurrent lumisections. The _local_ MEs are kept in a set of containers indexed by the `moduleID`, with special value `0` for _all_ legacy modules and special values for `DQMGlobalEDAnalyzer`s, where the local MEs need to match the life cycle of the `runCache` (module id + run number), and `DQMEDAnalyzer`s, where the `streamID` is combined with the `moduleID` to get a unique identifier for each stream instance.
- The life cycle of _global_ MEs is driven by the enter/leave lumi calls (indirectly) and the `cleanupLumi` hook called via the edm service interface. They are kept in a set of containers indexed by run and lumi number. For `RUN` MEs, the lumi number is 0; for `JOB` MEs, run and lumi are zero. The special pair `(0,0)` is also used for _prototypes_: Global MEs that are not currently associated to any run or lumi, but can (and _have to_, for the legacy guarantees) be recycled once a run or lumi starts.
- The life cycle of _global_ MEs is driven by the `initLumi/cleanupLumi` hooks called via the edm service interface. They are kept in a set of containers indexed by run and lumi number. For `RUN` MEs, the lumi number is 0; for `JOB` MEs, run and lumi are zero. The special pair `(0,0)` is also used for _prototypes_: Global MEs that are not currently associated to any run or lumi, but can (and _have to_, for the legacy guarantees) be recycled once a run or lumi starts.
- If there are no concurrent lumisections, both _local_ and _global_ MEs live for the entire job and are always connected in the same way, which means all legacy interactions continue to work. `assertLegacySafe` (enabled by default) checks for this condition and crashes the job if it is violated.
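
The local/global split described in the list above can be sketched in a few lines of standalone C++. This is a toy model under stated assumptions, not the `DQMStore` implementation: `ToyStore`, `GlobalME`, `LocalME`, `book`, `init` and `enter` are hypothetical names, the "histogram" is a single running sum, and scope handling (`JOB`/`RUN`/`LUMI`) is left out. It only illustrates how a prototype stored under `(0,0)` gets recycled for the first lumi, how a second concurrent lumi forces a fresh allocation, and how a local ME is re-pointed without owning data.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy stand-ins for illustration only; these are NOT the CMSSW classes.
struct GlobalME {  // owns the histogram data (here: just a running sum)
  std::string path;
  double sum = 0.0;
};

struct LocalME {  // held by a module; owns no data, only points at a global ME
  std::string path;
  GlobalME* target = nullptr;
  void fill(double x) {
    if (target != nullptr) target->sum += x;
  }
};

using RunLumi = std::pair<std::uint32_t, std::uint32_t>;  // (0,0) holds prototypes

struct ToyStore {
  std::map<RunLumi, std::map<std::string, std::unique_ptr<GlobalME>>> globals;

  // Assumption: booking creates a prototype not tied to any run or lumi.
  GlobalME* book(const std::string& path) {
    auto me = std::make_unique<GlobalME>();
    me->path = path;
    GlobalME* raw = me.get();
    globals[RunLumi{0u, 0u}][path] = std::move(me);
    return raw;
  }

  // Make sure a global ME exists for `key`: use an existing one, else
  // recycle a prototype, else allocate a new object (concurrent lumis).
  GlobalME* init(RunLumi key, const std::string& path) {
    auto& targetset = globals[key];
    auto existing = targetset.find(path);
    if (existing != targetset.end()) return existing->second.get();

    auto& prototypes = globals[RunLumi{0u, 0u}];
    std::unique_ptr<GlobalME> me;
    auto proto = prototypes.find(path);
    if (proto != prototypes.end()) {
      me = std::move(proto->second);  // recycle: same object, new run/lumi
      prototypes.erase(proto);
    } else {
      me = std::make_unique<GlobalME>();  // no prototype left: allocate
      me->path = path;
    }
    GlobalME* raw = me.get();
    targetset.emplace(path, std::move(me));
    return raw;
  }

  // Point a module's local ME at the global ME for `key`, if there is one.
  bool enter(LocalME& local, RunLumi key) {
    auto set = globals.find(key);
    if (set == globals.end()) return false;
    auto it = set->second.find(local.path);
    if (it == set->second.end()) return false;
    local.target = it->second.get();
    return true;
  }
};
```

In this sketch, calling `init` for lumi `(1,1)` returns the very object created by `book`, while a following `init` for `(1,2)` returns a different object; a single `LocalME` can then be switched between the two with `enter`, and each `fill` lands in whichever global ME it currently points to.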


6 changes: 6 additions & 0 deletions DQMServices/Core/interface/DQMEDAnalyzer.h
@@ -75,6 +75,11 @@ class DQMEDAnalyzer : public edm::stream::EDProducer<edm::GlobalCache<DQMEDAnaly
}

void beginRun(edm::Run const& run, edm::EventSetup const& setup) final {
// if we run booking multiple times because there are multiple runs in a
// job, this is needed to make sure all existing MEs are in a valid state
// before the booking code runs.
edm::Service<DQMStore>()->initLumi(run.run(), /* lumi */ 0, this->moduleDescription().id());
edm::Service<DQMStore>()->enterLumi(run.run(), /* lumi */ 0, this->moduleDescription().id());
dqmBeginRun(run, setup);
edm::Service<DQMStore>()->bookTransaction(
[this, &run, &setup](DQMStore::IBooker& booker) {
@@ -83,6 +88,7 @@ class DQMEDAnalyzer : public edm::stream::EDProducer<edm::GlobalCache<DQMEDAnaly
},
meId(),
this->getCanSaveByLumi());
edm::Service<DQMStore>()->initLumi(run.run(), /* lumi */ 0, meId());
edm::Service<DQMStore>()->enterLumi(run.run(), /* lumi */ 0, meId());
}

1 change: 1 addition & 0 deletions DQMServices/Core/interface/DQMGlobalEDAnalyzer.h
@@ -45,6 +45,7 @@ class DQMGlobalEDAnalyzer
// local MEs for each run cache.
meId(run),
/* canSaveByLumi */ false);
dqmstore_->initLumi(run.run(), /* lumi */ 0, meId(run));
dqmstore_->enterLumi(run.run(), /* lumi */ 0, meId(run));
return h;
}
5 changes: 5 additions & 0 deletions DQMServices/Core/interface/DQMOneEDAnalyzer.h
@@ -32,6 +32,10 @@ class DQMOneEDAnalyzer
}

void beginRun(edm::Run const& run, edm::EventSetup const& setup) final {
// if we run booking multiple times because there are multiple runs in a
// job, this is needed to make sure all existing MEs are in a valid state
// before the booking code runs.
edm::Service<DQMStore>()->initLumi(run.run(), /* lumi */ 0, this->moduleDescription().id());
edm::Service<DQMStore>()->enterLumi(run.run(), /* lumi */ 0, this->moduleDescription().id());
dqmBeginRun(run, setup);
edm::Service<DQMStore>()->bookTransaction(
@@ -41,6 +45,7 @@ class DQMOneEDAnalyzer
},
this->moduleDescription().id(),
this->getCanSaveByLumi());
edm::Service<DQMStore>()->initLumi(run.run(), /* lumi */ 0, this->moduleDescription().id());
edm::Service<DQMStore>()->enterLumi(run.run(), /* lumi */ 0, this->moduleDescription().id());
}

11 changes: 9 additions & 2 deletions DQMServices/Core/interface/DQMStore.h
@@ -642,9 +642,16 @@ namespace dqm {
// For input modules: trigger recycling without local ME/enterLumi/moduleID.
MonitorElement* findOrRecycle(MonitorElementData::Key const&);

// this creates all needed global MEs for the given run/lumi (and
// module), potentially cloning them if there are concurrent runs/lumis.
// Symmetrical to cleanupLumi, this is called from a framework hook, to
// make sure it also runs when the module does not call anything.
void initLumi(edm::RunNumber_t run, edm::LuminosityBlockNumber_t lumi);
void initLumi(edm::RunNumber_t run, edm::LuminosityBlockNumber_t lumi, uint64_t moduleID);

// modules are expected to call these callbacks when they change run/lumi.
// The DQMStore then updates the module's MEs, potentially cloning them
// if there are concurrent runs/lumis.
// The DQMStore then updates the module's local MEs to point to the
// new run/lumi.
void enterLumi(edm::RunNumber_t run, edm::LuminosityBlockNumber_t lumi, uint64_t moduleID);
void leaveLumi(edm::RunNumber_t run, edm::LuminosityBlockNumber_t lumi, uint64_t moduleID);
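
The two `initLumi` overloads declared above can be illustrated with a small standalone sketch. It is a simplification under stated assumptions, not the code from this PR: `ToyStore`, `book` and `countGlobal` are hypothetical names, MEs are reduced to path strings, and scopes and prototype recycling are ignored. The point it shows is that the global overload walks every module's local MEs, so global MEs for a run/lumi exist even when no module ever calls in (an empty lumisection). Because the global overload calls the per-module one while already holding the lock, the toy uses a `std::recursive_mutex`; that choice is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <set>
#include <string>
#include <utility>

using RunLumi = std::pair<std::uint32_t, std::uint32_t>;

// Toy model for illustration only; this is NOT the CMSSW DQMStore.
class ToyStore {
public:
  // Register a local ME (reduced to its path) for a module.
  void book(std::uint64_t moduleID, const std::string& path) {
    std::scoped_lock lock(mutex_);
    localMEs_[moduleID].insert(path);
  }

  // Global variant: create the global MEs of ALL modules for this run/lumi.
  // Meant to be driven by a framework hook rather than by the modules.
  void initLumi(std::uint32_t run, std::uint32_t lumi) {
    std::scoped_lock lock(mutex_);
    for (auto const& kv : localMEs_) {
      initLumi(run, lumi, kv.first);  // re-locks: needs a recursive mutex
    }
  }

  // Per-module variant: create only the global MEs this module needs.
  void initLumi(std::uint32_t run, std::uint32_t lumi, std::uint64_t moduleID) {
    std::scoped_lock lock(mutex_);
    auto found = localMEs_.find(moduleID);
    if (found == localMEs_.end()) return;  // unknown module: nothing to do
    auto& targetset = globalMEs_[RunLumi{run, lumi}];
    for (auto const& path : found->second) {
      targetset.insert(path);  // no-op if the global ME already exists
    }
  }

  // Number of global MEs that exist for the given run/lumi.
  std::size_t countGlobal(std::uint32_t run, std::uint32_t lumi) const {
    std::scoped_lock lock(mutex_);
    auto it = globalMEs_.find(RunLumi{run, lumi});
    return it == globalMEs_.end() ? 0 : it->second.size();
  }

private:
  mutable std::recursive_mutex mutex_;
  std::map<std::uint64_t, std::set<std::string>> localMEs_;
  std::map<RunLumi, std::set<std::string>> globalMEs_;
};
```

With two modules booked, a single `initLumi(run, lumi)` call populates the global set for that lumi with both modules' MEs, whereas `initLumi(run, lumi, moduleID)` touches only the one module. The sketch needs C++17 for `std::scoped_lock`.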

65 changes: 58 additions & 7 deletions DQMServices/Core/src/DQMStore.cc
@@ -357,15 +357,23 @@ namespace dqm::implementation {
return nullptr;
}

void DQMStore::enterLumi(edm::RunNumber_t run, edm::LuminosityBlockNumber_t lumi, uint64_t moduleID) {
// Make sure global MEs for the run/lumi exist (depending on scope), and
// point the local MEs for this module to these global MEs.
void DQMStore::initLumi(edm::RunNumber_t run, edm::LuminosityBlockNumber_t lumi) {
// Call initLumi for all modules, as a global operation.
auto lock = std::scoped_lock(this->booking_mutex_);
for (auto& kv : this->localMEs_) {
initLumi(run, lumi, kv.first);
}
}

void DQMStore::initLumi(edm::RunNumber_t run, edm::LuminosityBlockNumber_t lumi, uint64_t moduleID) {
// Make sure global MEs for the run/lumi exist (depending on scope)

auto lock = std::scoped_lock(this->booking_mutex_);

// these are the MEs we need to update.
auto& localset = this->localMEs_[moduleID];
// this is where they need to point to.
// This could be a per-run or per-lumi set (depending on lumi == 0)
auto& targetset = this->globalMEs_[edm::LuminosityBlockID(run, lumi)];
// this is where we can get MEs to reuse.
auto& prototypes = this->globalMEs_[edm::LuminosityBlockID()];
@@ -386,7 +394,7 @@ namespace dqm::implementation {
auto target = targetset.find(me); // lookup by path, thanks to MEComparison
if (target != targetset.end()) {
// we already have a ME, just use it!
debugTrackME("enterLumi (existing)", nullptr, *target);
debugTrackME("initLumi (existing)", nullptr, *target);
} else {
// look for a prototype to reuse.
auto proto = prototypes.find(me);
@@ -410,7 +418,7 @@ namespace dqm::implementation {
auto result = targetset.insert(oldme);
assert(result.second); // was new insertion
target = result.first; // iterator to new ME
debugTrackME("enterLumi (reused)", nullptr, *target);
debugTrackME("initLumi (reused)", nullptr, *target);
} else {
// no prototype available. That means we have concurrent Lumis/Runs,
// and need to make a clone now.
@@ -431,9 +439,48 @@ namespace dqm::implementation {
auto result = targetset.insert(newme);
assert(result.second); // was new insertion
target = result.first; // iterator to new ME
debugTrackME("enterLumi (allocated)", nullptr, *target);
debugTrackME("initLumi (allocated)", nullptr, *target);
}
}
}
}

void DQMStore::enterLumi(edm::RunNumber_t run, edm::LuminosityBlockNumber_t lumi, uint64_t moduleID) {
// point the local MEs for this module to these global MEs.

// This needs to happen before we can use the global MEs for this run/lumi here.
// We could do it lazily here, or eagerly globally in global begin lumi.
//initLumi(run, lumi, moduleID);

auto lock = std::scoped_lock(this->booking_mutex_);

// these are the MEs we need to update.
auto& localset = this->localMEs_[moduleID];
// this is where they need to point to.
auto& targetset = this->globalMEs_[edm::LuminosityBlockID(run, lumi)];

// only for a sanity check
auto checkScope = [run, lumi](MonitorElementData::Scope scope) {
if (scope == MonitorElementData::Scope::JOB) {
return (run == 0 && lumi == 0);
} else if (scope == MonitorElementData::Scope::RUN) {
return (run != 0 && lumi == 0);
} else if (scope == MonitorElementData::Scope::LUMI) {
return (lumi != 0);
}
assert(!"Impossible Scope.");
return false;
};

for (MonitorElement* me : localset) {
auto target = targetset.find(me); // lookup by path, thanks to MEComparison
if (target == targetset.end()) {
auto anyme = this->findME(me);
debugTrackME("enterLumi (nothingtodo)", me, nullptr);
assert(anyme && checkScope(anyme->getScope()) == false);
continue;
}
assert(target != targetset.end()); // initLumi should have taken care of this.
// now we have the proper global ME in the right place, point the local there.
// This is only safe if the name is exactly the same -- else it might corrupt
// the tree structure of the set!
@@ -675,15 +722,19 @@ namespace dqm::implementation {
// Set lumi and run for legacy booking.
// This is no more than a guess with concurrent runs/lumis, but should be
// correct for purely sequential legacy stuff.
// These transitions should only affect non-DQM*EDAnalyzer based code.
// Also reset Scope, such that legacy modules can expect it to be JOB.
// initLumi and leaveLumi are needed for all module types: these handle
// creating and deleting global MEs as needed, which has to happen even if
// a module does not see lumi transitions.
ar.watchPreGlobalBeginRun([this](edm::GlobalContext const& gc) {
this->setRunLumi(gc.luminosityBlockID());
this->initLumi(gc.luminosityBlockID().run(), /* lumi */ 0);
this->enterLumi(gc.luminosityBlockID().run(), /* lumi */ 0, /* moduleID */ 0);
this->setScope(MonitorElementData::Scope::JOB);
});
ar.watchPreGlobalBeginLumi([this](edm::GlobalContext const& gc) {
this->setRunLumi(gc.luminosityBlockID());
this->initLumi(gc.luminosityBlockID().run(), gc.luminosityBlockID().luminosityBlock());
this->enterLumi(gc.luminosityBlockID().run(), gc.luminosityBlockID().luminosityBlock(), /* moduleID */ 0);
});
ar.watchPostGlobalEndRun([this](edm::GlobalContext const& gc) {
4 changes: 3 additions & 1 deletion DQMServices/Demo/test/run_analyzers_cfg.py
@@ -37,6 +37,7 @@
parser.register('firstRun', 1, one, int, "See EmptySource.")
parser.register('numberEventsInRun', 100, one, int, "See EmptySource.")
parser.register('numberEventsInLuminosityBlock', 20, one, int, "See EmptySource.")
parser.register('processingMode', 'RunsLumisAndEvents', one, string, "See EmptySource.")
parser.register('nEvents', 100, one, int, "Total number of events.")
parser.register('nThreads', 1, one, int, "Number of threads and streams.")
parser.register('nConcurrent', 1, one, int, "Number of concurrent runs/lumis.")
@@ -50,7 +51,8 @@
numberEventsInLuminosityBlock = cms.untracked.uint32(args.numberEventsInLuminosityBlock),
firstLuminosityBlock = cms.untracked.uint32(args.firstLuminosityBlock),
firstEvent = cms.untracked.uint32(args.firstEvent),
firstRun = cms.untracked.uint32(args.firstRun))
firstRun = cms.untracked.uint32(args.firstRun),
processingMode = cms.untracked.string(args.processingMode))

process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(args.nEvents) )

11 changes: 11 additions & 0 deletions DQMServices/Demo/test/runtests.sh
@@ -155,4 +155,15 @@ cmsRun $LOCAL_TEST_DIR/run_analyzers_cfg.py outfile=empty.root nEvents=0
cmsRun $LOCAL_TEST_DIR/run_analyzers_cfg.py outfile=empty.root howmany=0
cmsRun $LOCAL_TEST_DIR/run_analyzers_cfg.py outfile=empty.root howmany=0 legacyoutput=True
cmsRun $LOCAL_TEST_DIR/run_analyzers_cfg.py outfile=empty.root howmany=0 protobufoutput=True
# Also try empty lumisections. EmptySource does not really support 'no events' mode (it never terminates), so this is a bit of a hack: run in the background, then interrupt.
cmsRun $LOCAL_TEST_DIR/run_analyzers_cfg.py outfile=noevents.root processingMode='RunsAndLumis' &
PID=$!
sleep 5
kill -INT $PID
wait
[ 66 = $(dqmiolistmes.py noevents.root -r 1 | wc -l) ]
[ 66 = $(dqmiolistmes.py noevents.root -r 1 -l 1 | wc -l) ]
[ 66 = $(dqmiolistmes.py noevents.root -r 2 | wc -l) ]
[ 66 = $(dqmiolistmes.py noevents.root -r 2 -l 2 | wc -l) ]