Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix LumiCache in AlcaBeamMonitor and OnlineBeamMonitor #41700

Merged

Conversation

francescobrivio
Copy link
Contributor

@francescobrivio francescobrivio commented May 16, 2023

PR description:

Triggered by this CMSTalk post and as a follow-up of #37163 and #39973, @gennai and I implemented the changes suggested by @makortel in #37163 (comment).
The changes include:

  • In AlcaBeamMonitor:
    • redefine (and use) the LuminosityBlockCache
    • rename the histogram titles from "Scalers" to "Online"
    • Fixed the Event::getByToken() try/catch block as suggested by Matti
    • remove the unused numberOfProcessedLumis_ variable
    • moved the filling of processedLumis_ to globalEndLuminosityBlock() and removed the obsolete mutable declaration
  • In OnlineBeamMonitor:
    • redefine (and use) the LuminosityBlockCache
    • remove the unused numberOfProcessedLumis_ variable
    • moved the filling of processedLumis_ to globalEndLuminosityBlock() and removed the obsolete mutable declaration

@makortel @mmusich please let us know what you think of these changes.

PR validation:

We tested this with wf 138.5 which runs the full ALCA reco sequence in step3, using events from multiple lumisections.
I'm not sure if this is the correct way to do it or if some other trick is needed to properly test this, suggestions are welcome.

See below the discussion about testing this PR properly

Backport:

Not a backport. Backports available at:

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41700/35575

  • This PR adds an extra 16KB to repository

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @francescobrivio for master.

It involves the following packages:

  • DQM/BeamMonitor (dqm, db)

@nothingface0, @emanueleusai, @tvami, @cmsbuild, @saumyaphor4252, @pmandrik, @syuvivida, @tjavaid, @micsucmed, @francescobrivio, @rvenditti can you please review it and eventually sign? Thanks.
@mmusich this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

Copy link
Contributor

@makortel makortel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gennai and @francescobrivio! This is the proper way to use LuminosityBlockCache in edm::one modules.

DQM/BeamMonitor/plugins/AlcaBeamMonitor.cc Outdated Show resolved Hide resolved
DQM/BeamMonitor/plugins/AlcaBeamMonitor.h Outdated Show resolved Hide resolved
@francescobrivio
Copy link
Contributor Author

francescobrivio commented May 16, 2023

In the last push I:

(I'll edit the PR description tomorrow)

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41700/35579

  • This PR adds an extra 24KB to repository

@cmsbuild
Copy link
Contributor

Pull request #41700 was updated. @nothingface0, @emanueleusai, @tvami, @cmsbuild, @saumyaphor4252, @pmandrik, @syuvivida, @tjavaid, @micsucmed, @francescobrivio, @rvenditti can you please check and sign again.

@francescobrivio
Copy link
Contributor Author

francescobrivio commented May 16, 2023

I also tested this with:

runTheMatrix.py -l 138.5 -j 8 --nStreams 2 --nEvents 1000 --numberEventsInLuminosityBlock 200

which gives, e.g.:
Screenshot 2023-05-16 alle 23 33 25

To be honest, given the tests parameters, I was expecting to see 5 lumisections (1000 evts / 200 evts per LS), but well...better than nothing!
I'm not sure how to instruct the bot to use the same options for testing...any suggestion?

EDIT:
Looking at the local logs I still see all events and LSs were processed by stream 0, so clearly the options I used didn't fully work...any suggestion also here?

@francescobrivio
Copy link
Contributor Author

test parameters:

  • enable = threading

@francescobrivio
Copy link
Contributor Author

@cmsbuild please test

  • while waiting for further feedback on the commits and on the correct way to test them, we can still see if the standard tests work in multithreading

@mmusich
Copy link
Contributor

mmusich commented May 17, 2023

also tested this with:
runTheMatrix.py -l 138.5 -j 8 --nStreams 2 --nEvents 1000 --numberEventsInLuminosityBlock 200

as far I can tell, when you run in multi-threaded mode (trading -nStreams 2 with -t 4) the plot you show here is changed with respect to the one obtained in a vanilla CMSSW_13_2_X_2023-05-16-2300.
Screenshot from 2023-05-17 12-51-31

On the other hand none of the two is consistent with the observations in the production enviroment (see CMSTalk post, in which apparently only one LS is plotted, https://tinyurl.com/2gtx6s9q). I gather we're not capturing with this command some feature that is running in the Tier-0 jobs.

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b8119c/32682/summary.html
COMMIT: 39a997a
CMSSW: CMSSW_13_2_X_2023-05-16-2300/el8_amd64_gcc11
Additional Tests: THREADING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/41700/32682/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 9 lines from the logs
  • Reco comparison results: 13 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3222071
  • DQMHistoTests: Total failures: 15
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3222034
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.7560000000000004 KiB( 47 files compared)
  • DQMHistoSizes: changed ( 1000.0,... ): -0.020 KiB AlcaBeamMonitor/Debug
  • DQMHistoSizes: changed ( 1000.0,... ): -0.009 KiB AlcaBeamMonitor/Validation
  • Checked 207 log files, 159 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@tvami
Copy link
Contributor

tvami commented May 17, 2023

test parameters:

  • workflow = 138.5
  • workflow_options = --nStreams 2 --nEvents 1000 --numberEventsInLuminosityBlock 200

@francescobrivio francescobrivio changed the title [RFC] Fix LumiCache in AlcaBeamMonitor Fix LumiCache in AlcaBeamMonitor May 18, 2023
@francescobrivio francescobrivio changed the title Fix LumiCache in AlcaBeamMonitor Fix LumiCache in AlcaBeamMonitor and OnlineBeamMonitor May 18, 2023
@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b8119c/32705/summary.html
COMMIT: d351d24
CMSSW: CMSSW_13_2_X_2023-05-17-2300/el8_amd64_gcc11
Additional Tests: THREADING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/41700/32705/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 1829 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 5 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3221927
  • DQMHistoTests: Total failures: 40809
  • DQMHistoTests: Total nulls: 19
  • DQMHistoTests: Total successes: 3181077
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.25499999999999956 KiB( 47 files compared)
  • DQMHistoSizes: changed ( 1000.0,... ): -0.020 KiB AlcaBeamMonitor/Debug
  • DQMHistoSizes: changed ( 1000.0,... ): -0.009 KiB AlcaBeamMonitor/Validation
  • DQMHistoSizes: changed ( 138.5 ): 0.891 KiB Hcal/DigiRunHarvesting
  • DQMHistoSizes: changed ( 138.5 ): 0.098 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 138.5 ): 0.023 KiB RPC/DCSInfo
  • Checked 207 log files, 159 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@tvami
Copy link
Contributor

tvami commented May 18, 2023

You potentially added 1829 lines to the logs

was this expected?

@mmusich
Copy link
Contributor

mmusich commented May 18, 2023

was this expected?

Given that in the backport tests there is no such increase (#41722 (comment)) I tend to conclude it's coming from the non standard test parameters used here.

@perrotta
Copy link
Contributor

The extra lines in the log are due to the fact the wf 138.5_ExpressCollisions2021 runs with 100 events in the baseline and 1000 events in these tests: is it because of the multithreading enabled?
(Not an issue for the content of this PR, in any case)

@francescobrivio
Copy link
Contributor Author

is it because of the multithreading enabled?

The multithreading and 100 events were part of the non-standard parameters set for the tests (my fault).

@tvami in case you prefer, we can restart the tests after resetting the parameters to the standard values before you sign the PR. Let me know what you prefer, or feel free to do it :)

@tvami
Copy link
Contributor

tvami commented May 18, 2023

we can restart the tests

I dont think we need that, we can just use the backport tests to convince ourselves

@tvami
Copy link
Contributor

tvami commented May 18, 2023

+db

  • PR according to desc
  • tests pass, diffs are understood

@emanueleusai
Copy link
Member

+1

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Copy link
Contributor

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants