add unit test for HLT online-DQM plugins #40334

Merged
merged 6 commits into from
Jan 16, 2023

missirol
Contributor

PR description:

This PR aims to add a unit test involving most of the DQM modules (and services) running in the HLT and providing inputs to the online DQM. The test includes both DQM and harvesting steps, and it only requires that both run without errors (the unit test does not check the outputs).
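As a rough illustration of the "run both steps, require only that neither fails" criterion, the following Python sketch chains two commands and stops at the first non-zero exit code. The function name `run_steps` and the placeholder commands are hypothetical; the unit test added by this PR is not reproduced here, and in a real setup the commands would be `cmsRun` invocations of the DQM and harvesting configurations.

```python
import subprocess
import sys


def run_steps(steps):
    """Run (label, command) pairs in order.

    Returns the label of the first step that exits with a non-zero code,
    or None if every step succeeds. Outputs are not inspected.
    """
    for label, command in steps:
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"step '{label}' failed with exit code {result.returncode}")
            return label
    return None


# Placeholder commands standing in for the DQM and harvesting jobs (hypothetical).
ok_steps = [
    ("dqm", [sys.executable, "-c", "pass"]),
    ("harvesting", [sys.executable, "-c", "pass"]),
]
failing_steps = [
    ("dqm", [sys.executable, "-c", "import sys; sys.exit(1)"]),
    ("harvesting", [sys.executable, "-c", "pass"]),
]

first_failure_ok = run_steps(ok_steps)
first_failure_bad = run_steps(failing_steps)
```

Because the harvesting step is never started once the DQM step fails, a failure is attributed to the step that caused it.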

Small changes are also made in the cpp implementation of a few plugins:

  • the parameter FolderName of PSMonitor is renamed to folderName, as done for LumiMonitor in #39859 ("give OnlineLuminosityRecord info to HLT's LumiMonitor plugin");
  • default values are added for the parameters of LumiMonitor that don't currently have them;
  • a parameter fillEveryLumiSection is added to ThroughputServiceClient in analogy with FastTimeServiceClient.

There are two changes that require feedback (I'm not sure they are correct):

  • I needed to add a call in ThroughputService to change the 'scope of the DQM outputs' to RUN, otherwise I would not see the HLT/Throughput folder in the harvesting output of the unit test; I copied this scope change from FastTimerService, and I figure that's what was needed. I'm not sure it is the correct change; clearly the ThroughputService outputs were already produced correctly in the online DQM (maybe in that case the default scope is somehow set differently compared to this unit test);
  • I removed the PSMonitorClient; it does not produce outputs, it can only issue a warning, and maybe this is not so useful, but also here feedback would be needed.

This PR requires #40325.

Merely technical. No changes expected.

PR validation:

Manual tests with new unit test.

If this PR is a backport, please specify the original PR and why you need to backport that PR. If this PR will be backported, please specify to which release cycle the backport is meant for:

N/A

@@ -258,5 +269,6 @@ def customizeHLTforCMSSW(process, menuType="GRun"):

process = customizeHLTfor38761(process)
process = customizeHLTfor40264(process)
process = customizeHLTfor40334(process)
Contributor Author

This customisation is not necessary for the menus in CMSSW. It can become necessary for user menus extracted from ConfDB, if those somehow contain a PSMonitor.
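The rename that such a customisation has to perform can be sketched in plain Python. `SimpleNamespace` objects stand in for the PSMonitor modules (an assumption made so the example runs without CMSSW), the helper name `rename_parameter` is hypothetical, and the folder strings are made up.

```python
from types import SimpleNamespace


def rename_parameter(module, old_name, new_name):
    """Move the value of old_name to new_name, then drop old_name.

    If new_name is already set, its value is kept and old_name is simply removed.
    Modules without old_name are left untouched.
    """
    if hasattr(module, old_name):
        if not hasattr(module, new_name):
            setattr(module, new_name, getattr(module, old_name))
        delattr(module, old_name)
    return module


# Stand-ins for PSMonitor modules (hypothetical values).
old_style = SimpleNamespace(FolderName="HLT/Example")
new_style = SimpleNamespace(folderName="HLT/Example")
both_set = SimpleNamespace(FolderName="HLT/Old", folderName="HLT/New")

for module in (old_style, new_style, both_set):
    rename_parameter(module, "FolderName", "folderName")
```

Applying the helper twice gives the same result as applying it once, so it is harmless for menus that already use the new name.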

@@ -85,6 +85,7 @@ void ThroughputService::preGlobalBeginRun(edm::GlobalContext const& gc) {

// define a callback that can book the histograms
auto bookTransactionCallback = [&, this](DQMStore::IBooker& booker, DQMStore::IGetter&) {
auto scope = dqm::reco::DQMStore::IBooker::UseRunScope(booker);
Contributor Author

This is copied from FastTimerService:

auto scope = dqm::reco::DQMStore::IBooker::UseRunScope(booker);

Without this call, the harvesting output of the unit test does not contain a Throughput folder, and I'm not really sure why that is.

Contributor Author

Short version: I think this change is okay. More info below.

I had a look at https://github.com/cms-sw/cmssw/blob/master/DQMServices/Core/README.md, but that didn't really clarify things for me.

If I understand the interface, the DQM scope in ThroughputService before the call to

auto scope = dqm::reco::DQMStore::IBooker::UseRunScope(booker);

corresponds to scope.oldscope. The latter returns 1, which corresponds to JOB (and after the call, the scope becomes RUN). This seems to match the default set here, i.e. JOB.
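The "remember the old scope, switch to RUN" behaviour described here can be modelled with a toy Python context manager. This is only a sketch of the guard pattern, not the DQMStore implementation: `ToyBooker` and `use_run_scope` are hypothetical names, only the value JOB = 1 comes from the observation above, the other enum values are assumptions, and restoring the old scope on exit is assumed as the usual behaviour of such guards.

```python
from contextlib import contextmanager
from enum import IntEnum


class Scope(IntEnum):
    JOB = 1   # value quoted in the comment above
    RUN = 2   # assumed value, for illustration only
    LUMI = 3  # assumed value, for illustration only


class ToyBooker:
    """Toy stand-in for a booker; not the DQMStore::IBooker interface."""

    def __init__(self):
        self.scope = Scope.JOB  # default scope, as observed above
        self.booked = {}

    def book(self, name):
        # Remember which scope was active when the histogram was booked.
        self.booked[name] = self.scope


@contextmanager
def use_run_scope(booker):
    """Switch the booker to RUN scope and restore the previous scope on exit."""
    old_scope = booker.scope
    booker.scope = Scope.RUN
    try:
        yield old_scope
    finally:
        booker.scope = old_scope


booker = ToyBooker()
booker.book("booked_without_guard")
with use_run_scope(booker) as old_scope:
    booker.book("booked_with_guard")
```

In this model only the histogram booked inside the guarded block carries the RUN scope, which mirrors the role of the added line in the booking callback.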

The call to use the RUN scope in FastTimerService was introduced in #28622 by DQM (maybe ThroughputService was simply overlooked in that PR). I don't see a reason why FastTimerService and ThroughputService should differ in this respect.

With this PR, I managed to produce a ROOT output file with the client hlt_dqm_clientPB-live_cfg.py reading .pb files produced by re-running a recent HLT menu on 2022 data; in that ROOT output file, I see the HLT/Throughput folder, and the plots look as expected. This suggests that this PR does not break the workflow to produce these plots online (which somehow was already working).

Based on the above, I would conclude that this change is okay, even though I don't fully understand it; in particular, I don't know why the HLT/Throughput plots were already being produced correctly in the online DQM without this PR.

@cms-sw/dqm-l2, do you have insight on this?

@@ -95,7 +96,7 @@ void ThroughputService::preGlobalBeginRun(edm::GlobalContext const& gc) {
};

// book MonitorElement's for this run
- edm::Service<DQMStore>()->meBookerGetter(bookTransactionCallback);
+ edm::Service<dqm::legacy::DQMStore>()->meBookerGetter(bookTransactionCallback);
Contributor Author

Maybe this change is unimportant; again, it follows what is done in the FastTimerService:

edm::Service<dqm::legacy::DQMStore>()->meBookerGetter(bookTransactionCallback);

In ThroughputService, the following is used:

typedef dqm::reco::DQMStore DQMStore;

Contributor

What difference does it make to use dqm::legacy instead of dqm::reco?

Contributor

Answering my own question: none, one is a typedef of the other.

Contributor Author

@missirol missirol Dec 20, 2022

Thanks for the info. Since there is no difference, I will remove the extra dqm::legacy:: from here.

Edit: done in fda3c14.

@missirol
Contributor Author

test parameters:

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-40334/33420

  • This PR adds an extra 32KB to repository

@cmsbuild
Contributor

A new Pull Request was created by @missirol (Marino Missiroli) for master.

It involves the following packages:

  • DQM/HLTEvF (dqm, hlt)
  • DQM/Integration (dqm)
  • HLTrigger/Configuration (hlt)
  • HLTrigger/Timer (hlt)

@Martin-Grunewald, @emanueleusai, @ahmad3213, @cmsbuild, @missirol, @jfernan2, @syuvivida, @pmandrik, @micsucmed, @rvenditti can you please review it and eventually sign? Thanks.
@batinkov, @battibass, @silviodonato, @mtosi, @Martin-Grunewald, @fwyzard, @threus, @francescobrivio this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@missirol
Contributor Author

@fwyzard, it would be very useful to have your feedback on this PR.

@missirol
Contributor Author

please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c89ae2/29647/summary.html
COMMIT: cb81125
CMSSW: CMSSW_13_0_X_2022-12-15-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/40334/29647/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 14 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3557521
  • DQMHistoTests: Total failures: 163
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3557336
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 211 log files, 162 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@missirol
Contributor Author

test parameters:

@missirol
Contributor Author

please test

Rebased on IB which includes #40325, and updated following the discussion in #40334 (comment).

@missirol
Contributor Author

missirol commented Jan 2, 2023

please test

@cmsbuild
Contributor

cmsbuild commented Jan 3, 2023

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c89ae2/29783/summary.html
COMMIT: 444109c
CMSSW: CMSSW_13_0_X_2023-01-02-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/40334/29783/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 32 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3555748
  • DQMHistoTests: Total failures: 1214
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3554512
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 211 log files, 162 edm output root files, 49 DQM output files
  • TriggerResults: no differences found
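A quick arithmetic cross-check of the two DQMHistoTests summaries in this thread, written as a small Python sketch. It assumes that failures, nulls, successes, skipped, and missing objects are mutually exclusive categories that together make up the total; the dictionary layout and the helper name `failure_fraction` are illustrative only.

```python
# Counts copied from the two comparison summaries in this thread,
# keyed by the commit hash reported with each summary.
summaries = {
    "cb81125": {"total": 3557521, "failures": 163, "nulls": 0,
                "successes": 3557336, "skipped": 22, "missing": 0},
    "444109c": {"total": 3555748, "failures": 1214, "nulls": 0,
                "successes": 3554512, "skipped": 22, "missing": 0},
}

CATEGORIES = ("failures", "nulls", "successes", "skipped", "missing")


def failure_fraction(counts):
    """Fraction of compared histograms that were reported as failures."""
    return counts["failures"] / counts["total"]


consistency = {}
for commit, counts in summaries.items():
    # Check that the per-category counts add up to the reported total.
    consistency[commit] = sum(counts[key] for key in CATEGORIES) == counts["total"]
    print(f"{commit}: categories add up: {consistency[commit]}, "
          f"failure fraction: {failure_fraction(counts):.2e}")
```

In both summaries the categories add up to the reported total, and the failures are well below 0.1% of the compared histograms.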

@missirol
Contributor Author

missirol commented Jan 9, 2023

+hlt

@missirol
Contributor Author

@cms-sw/dqm-l2, could you please review this PR? Thanks!

@emanueleusai
Member

Concerning the removal of the PSMonitorClient, I agree with you. It looks like the process just printed a warning if plots were not present. That's odd.

Concerning the ThroughputService, your logic is correct, and I believe what is done in the FastTimerService is correct and can be safely copied to ThroughputService, but the fact that it worked already correctly online makes me suspicious. We could in principle test this online in playback to be extra safe, but we would need a backport...
What do you think?

@missirol
Contributor Author

> Concerning the ThroughputService, your logic is correct, and I believe what is done in the FastTimerService is correct and can be safely copied to ThroughputService, but the fact that it worked already correctly online makes me suspicious. We could in principle test this online in playback to be extra safe, but we would need a backport...
> What do you think?

Opening a backport PR to do this test is probably easy (for which cycle? 12_4_X?); I assume the idea is just to open the backport PR to do the test, and then it would not be merged.

On the other hand, I don't think the playback test would help checking the change in ThroughputService. If I understand what that test does, it takes existing streamer files and runs the online-DQM clients, so it would not exercise the ThroughputService (which runs 'inside' the HLT, when the streamer files are created).

To try and convince myself that this change is okay, I produced streamer files by rerunning the HLT step with this PR, and then manually ran one of the HLT online-DQM clients on those streamer files [1]. In this 'online-like' test, one gets the expected histograms in the Throughput folder (both with and without the scope change in ThroughputService); in the case of the unit test (which uses DQMIO, not .pb files), the Throughput folder only appears using the 'RUN' scope in ThroughputService (i.e. the change in this PR). So, things work with this PR, but I'm not fully clear on what this 'dqm scope' does, and if it somehow acts differently for DQMIO output vs .pb output (I take the liberty to tag the author of #28622, @schneiml, just in case he can chime in).

If you think the playback test is useful anyway, I'll open the backport PR to the relevant release cycle.

[1] Tested in CMSSW_13_0_X_2023-01-12-2300.

#!/bin/bash

# scram project CMSSW CMSSW_13_0_X_2023-01-12-2300
# cd CMSSW_13_0_X_2023-01-12-2300/src
# eval `scram runtime -sh`
# git cms-merge-topic cms-sw:40334
# scram build

INPUTFILE=root://eoscms.cern.ch//eos/cms/store/group/dpg_trigger/comm_trigger/TriggerStudiesGroup/STORM/RAW/Run2022F_EphemeralHLTPhysics0_run361468/26ce1488-8c46-436b-becd-6b41535dda79.root

HLTMENU=/users/missirol/test/dev/CMSSW_13_0_0/tmp/test01/cmssw40334/HLT/V3

[ -d run361468 ] || (convertToRaw -f 100 -l 100 -r 361468:172 -o . -- "${INPUTFILE}")

if [ ! -f hlt.py ]; then
  tmpfile=$(mktemp)
  hltConfigFromDB --configName "${HLTMENU}" > "${tmpfile}"
  cat <<@EOF >> "${tmpfile}"

process.load('run361468_cff')

process.hltOnlineBeamSpotESProducer.timeThreshold = int(1e6)

from HLTrigger.Configuration.common import producers_by_type
for producer in producers_by_type(process, 'PSMonitor'):
  if hasattr(producer, 'FolderName'):
    if not hasattr(producer, 'folderName'):
      producer.folderName = producer.FolderName
    del producer.FolderName
@EOF

  edmConfigDump "${tmpfile}" > hlt.py
fi

cmsRun hlt.py &> hlt.log

cmsRun DQM/Integration/python/clients/hlt_dqm_clientPB-live_cfg.py \
  runInputDir=. runNumber=361468 runkey=pp_run \
  scanOnce=True datafnPosition=4

# output file: ./upload/DQM_V0001_HLTpb_R000361468.root

@emanueleusai
Member

emanueleusai commented Jan 16, 2023

@missirol thank you very much for the detailed explanation. I now understand your private test better, and I agree this is sufficient for the PR to be approved.
I do not know the underlying structure of the "scopes" well enough to answer your question about how scopes behave for DQMIO output vs .pb output. So any input from the original developers of the infrastructure is welcome, although I believe @schneiml is not with CMS anymore.
Generally speaking, the way I understand it, the "scope" separates MEs that are filled per-lumi, per-run, or per-job. So if you fill your MEs in the endRun, the scope should be set to RUN, and so forth.

@emanueleusai
Member

+1

  • private test works as expected, online test not necessary
  • spurious differences in DQM comparisons
  • understanding of the underlying structure of scopes can continue separately from this PR imho

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Contributor

+1

@cmsbuild cmsbuild merged commit c628e55 into cms-sw:master Jan 16, 2023
@missirol missirol deleted the devel_testTriggerMonitors branch February 3, 2023 17:36