[13.2.X] Add pixelgpu online DQM client and augment DQM/Integration unit tests #42583

Merged


mmusich
Contributor

@mmusich mmusich commented Aug 16, 2023

backport of #42542

PR description:

This PR is a follow-up to my previous PR #41939 and to the TSG ticket https://its.cern.ch/jira/browse/CMSHLT-2846.
The main goal is to add DQM/Integration/python/clients/pixelgpu_dqm_sourceclient-live_cfg.py: a new DQM online client designed to monitor the GPU and CPU Pixel collections from within the streamDQMGPUvsCPU stream.
At the moment only the SiPixelRawDataError collections are monitored, by SiPixelPhase1RawDataErrorComparator; in the future, once more event products can be persisted following the ongoing alpaka migration, other clients that currently run within the HLT menu itself could also be moved here.
I take the opportunity of this PR to include the new GPUVsCPU client (and the two existing ECAL and HCAL ones) in the battery of unit tests.
For the input data, a corresponding PR to cms-data has been created (cms-data/DQM-Integration#4) to supply the necessary input files (although, alas, these files lack the proper pixel collections (SiPixelRawDataErroredmDetSetVector_hltSiPixelDigisFromSoA_*_* and SiPixelRawDataErroredmDetSetVector_hltSiPixelDigisLegacy_*_*), which became available after CMSHLT-2846 was closed).

PR validation:

Run successfully:

scram b runtests_TestDQMOnlineClient-ecalgpu_dqm_sourceclient
scram b runtests_TestDQMOnlineClient-hcalgpu_dqm_sourceclient
scram b runtests_TestDQMOnlineClient-pixelgpu_dqm_sourceclient

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

Verbatim backport of #42542 for deployment in 2023 HI data-taking.

@cmsbuild
Contributor

cmsbuild commented Aug 16, 2023

A new Pull Request was created by @mmusich (Marco Musich) for CMSSW_13_2_X.

It involves the following packages:

  • DQM/Integration (dqm)
  • DQM/SiPixelHeterogeneous (dqm)

@nothingface0, @emanueleusai, @cmsbuild, @pmandrik, @syuvivida, @tjavaid, @micsucmed, @rvenditti can you please review it and eventually sign? Thanks.
@batinkov, @rchatter, @wang0jin, @argiro, @fioriNTU, @thomreis, @jandrea, @idebruyn, @threus, @francescobrivio this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.


@mmusich mmusich changed the title [12.3.X] Add pixelgpu online DQM client and augment DQM/Integration unit tests [13.2.X] Add pixelgpu online DQM client and augment DQM/Integration unit tests Aug 16, 2023
@mmusich
Contributor Author

mmusich commented Aug 16, 2023

@cmsbuild, please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-476549/34316/summary.html
COMMIT: dc4e359
CMSSW: CMSSW_13_2_X_2023-08-16-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/42583/34316/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 3 lines from the logs
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3196338
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3196316
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (47 files compared)
  • Checked 207 log files, 159 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@emanueleusai
Member

+1

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_13_2_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_13_3_X is complete. This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @antoniovilela, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Contributor

please test with cms-sw/cmsdist#8655

@perrotta
Contributor

@mmusich this PR was successfully tested even without cms-data/DQM-Integration#4.
Your new tests aren't run automatically as unit tests, correct?

@perrotta
Contributor

As suggested by the static analyzer, errorCodeToStringMap and errorCodeToTypeMap in DQM/SiPixelHeterogeneous/plugins/SiPixelPhase1RawDataErrorComparator.cc should be made const (not for this PR).

@mmusich
Contributor Author

mmusich commented Aug 17, 2023

@perrotta

this PR was successfully tested even without cms-data/DQM-Integration#4

cms-data/DQM-Integration#4 is already merged, hence I would have presumed that the files needed for the unit tests were already available in the base IB used for the tests?
EDIT: apparently it's not the case. Do we need a backport?

Your new tests aren't run automatically as unit tests, correct?

Wrong: the unit tests are run automatically, and indeed a quick look into the unit test folder shows that they are run.

@mmusich
Contributor Author

mmusich commented Aug 17, 2023

As suggested by the static analyzer, errorCodeToStringMap and errorCodeToTypeMap in DQM/SiPixelHeterogeneous/plugins/SiPixelPhase1RawDataErrorComparator.cc should be made const (not for this PR)

OK, I'll fix this in a follow-up. Please recall that the potential thread-safety issue is less of a concern for a DQM online client (which, AFAIK, runs single-threaded).

@mmusich
Contributor Author

mmusich commented Aug 17, 2023

EDIT: apparently it's not the case. Do we need a backport?

OK, apparently it was done already at cms-sw/cmsdist#8655

@perrotta
Contributor

Wrong: the unit tests are run automatically, and indeed a quick look into the unit test folder shows that they are run.

So, they succeed even if the streamer file is not found: is that the intended behaviour?

@perrotta
Contributor

OK, I'll fix this in a follow-up. Please recall that the potential thread-safety issue is less of a concern for a DQM online client (which, AFAIK, runs single-threaded).

I would say it is not problematic at all here, because they are treated as static const in the code anyway: it is just to silence the static analyzer and get rid of a few easily avoidable false positives.

@mmusich
Contributor Author

mmusich commented Aug 17, 2023

@perrotta

So, they succeed even if the streamer file is not found: is that the intended behaviour?

It's not up to me to judge, but that's how the central core DQM code is set up:

    for (const auto& runPath : runPath_) {
      if (!std::filesystem::exists(runPath)) {
        logFileAction("Directory does not exist: ", runPath);
        continue;
      }
      // ...
    }

Any amendment of this behavior does not pertain to this particular PR.

@mmusich
Contributor Author

mmusich commented Aug 17, 2023

Any amendment of this behavior does not pertain to this particular PR.

Since cmsRun per se will not terminate in an error state, what I can do is check upfront, in the unit test source code, that the input streamer files are available.

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-476549/34334/summary.html
COMMIT: dc4e359
CMSSW: CMSSW_13_2_X_2023-08-16-2300/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/42583/34334/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 11 lines to the logs
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3196338
  • DQMHistoTests: Total failures: 5
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3196311
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (47 files compared)
  • Checked 207 log files, 159 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@perrotta
Contributor

+1

  • Now the streamer file is seen by the unit tests

4 participants