Remove noisy missing collection warnings coming from EcalDQMonitorTask #42848

Merged (4 commits), Sep 27, 2023


@alejands (Contributor) commented Sep 23, 2023

PR description:

Addresses issue #42720, which reports multiple missing-collection warnings per event from the EcalDQMonitorTask module when it runs with the ecalMonitorTaskEcalOnly config in offline RelVal workflows. The missing collections fall into three groups, each with its own reason (or set of reasons) for the warnings.


The GpuRecHit collections do not exist (as far as I know, this is still the case). RecHit monitoring was switched off for online GPU validation in #39393, but this was not propagated to offline because RecHits were enabled by default in the ECAL GPU validation module. With this PR, RecHit monitoring is switched off by default. The expected RecHit input tags remain in place for when the collections are implemented.

In addition, this PR corrects an oversight I made in #39371 when implementing the collection-level switches intended for exactly this kind of situation: GPU monitoring collections that are switched off are no longer consumed.
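A minimal sketch of the idea behind collection-level switches, written in plain Python rather than the actual CMSSW config or C++ code: each group of CPU/GPU collections has an on/off switch, RecHits default to off, and only collections from enabled groups are consumed. The group and collection names below are hypothetical and are not the real ECAL GPU monitoring parameters.

```python
# Hypothetical sketch: group and collection names are illustrative only,
# not the actual ECAL GPU monitoring parameters.
GPU_COLLECTION_GROUPS = {
    "digis": ["EBCpuDigi", "EECpuDigi", "EBGpuDigi", "EEGpuDigi"],
    "uncalibRecHits": ["EBCpuUncalibRecHit", "EECpuUncalibRecHit",
                       "EBGpuUncalibRecHit", "EEGpuUncalibRecHit"],
    "recHits": ["EBCpuRecHit", "EECpuRecHit", "EBGpuRecHit", "EEGpuRecHit"],
}

# RecHits default to off because the GPU RecHit collections are not produced yet.
DEFAULT_SWITCHES = {"digis": True, "uncalibRecHits": True, "recHits": False}


def collections_to_consume(overrides=None):
    """Return the collections to consume, skipping every switched-off group."""
    switches = dict(DEFAULT_SWITCHES)
    if overrides:
        switches.update(overrides)
    consumed = []
    for group, names in GPU_COLLECTION_GROUPS.items():
        if switches.get(group, False):
            consumed.extend(names)
    return consumed
```

With the defaults, `collections_to_consume()` returns only the digi and uncalibrated RecHit collections; passing `{"recHits": True}` would re-enable RecHits once the GPU collections exist.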


The TrigPrimEmulDigi and ReducedRecHit collections are straightforward to explain. They are not produced in offline DQM EcalOnly workflows, and flags to switch off running over these collections already exist. The emulated digis were already switched off in the general offline config, but not in the EcalOnly config, because the switch-off is applied only after the EcalOnly config has been cloned from the original. This PR switches these collections off in the offline EcalOnly config.
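The ordering problem can be illustrated with a toy model in plain Python, using `copy.deepcopy` as a stand-in for cloning a CMSSW module config: a change made to the original after the clone is taken does not reach the clone, so the EcalOnly variant has to be switched off explicitly. The dictionary keys are hypothetical and are not the real EcalDQMonitorTask parameter names.

```python
import copy


def build_configs(apply_fix):
    """Toy model of deriving the EcalOnly task config from the general one."""
    # Hypothetical stand-in for the general offline task:
    # True means "monitor this collection".
    offline_task = {"TrigPrimEmulDigi": True, "ReducedRecHit": True}

    # The EcalOnly variant is cloned first...
    ecal_only_task = copy.deepcopy(offline_task)

    # ...and the emulated digis are switched off on the original afterwards,
    # so the clone still has them enabled.
    offline_task["TrigPrimEmulDigi"] = False

    if apply_fix:
        # Switch off the collections that EcalOnly workflows do not produce,
        # directly on the clone.
        ecal_only_task["TrigPrimEmulDigi"] = False
        ecal_only_task["ReducedRecHit"] = False

    return offline_task, ecal_only_task
```

Without the fix, the EcalOnly clone still requests the emulated digis even though the general config no longer does.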


EcalRawData is the hardest collection to switch off on the ECAL DQM side, since several modules use it. This issue is now resolved by #42844.

That said, we also implemented a band-aid solution on the ECAL DQM side: a new EcalDQMonitorTask parameter, skipCollections, which takes the names of collections to be explicitly removed from the list of collections processed by the ECAL DQM module.

The skipping is kept as bare-bones as possible: we deliberately do not remove any dependencies that might cause a problem when a collection is absent. This prevents potentially undefined behavior from slipping through the cracks and leaving little or no trace that something is wrong.

If there is a further issue, skipping a collection leads to the same behavior as a missing collection. The difference is that a skipped collection is not consumed, and the ECAL DQM module does not attempt to access its data. This could be useful in the future, and not only in the context of a missing collection.
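A minimal sketch of the skipCollections behavior described above, in plain Python rather than the module's C++: skipped names are simply dropped from the list of collections to consume, nothing else is adjusted, and any collection still in the list that is absent from the event is reported as missing. The collection names are illustrative, and raising an error for names the module does not know about is an assumption, not necessarily what the actual exception path checks.

```python
def apply_skip_collections(collections, skip_collections):
    """Drop explicitly skipped collections from the list the module consumes.

    Dependencies on the skipped collections are deliberately left untouched.
    """
    unknown = [name for name in skip_collections if name not in collections]
    if unknown:
        # Assumption: unknown names are treated as a configuration error.
        raise ValueError(f"skipCollections contains unknown names: {unknown}")
    skip = set(skip_collections)
    return [name for name in collections if name not in skip]


def run_on_event(event, collections):
    """Access each consumed collection; return the ones missing from the event."""
    missing = []
    for name in collections:
        if name not in event:
            # The real module would log a "does not exist" warning here.
            missing.append(name)
            continue
        # Per-collection monitoring would run here.
    return missing
```

Skipping EcalRawData removes it from the consumed list, so an event without it no longer produces a missing-collection report, while an event processed with the unfiltered list still does.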

#42844 is the proper solution to the current issue, so skipCollections will not be used for now, but it doesn't hurt to have this extra customization option in our toolkit.


PR validation:

Tested with both workflows reported in issue #42720. Both ran successfully with no warnings.

runTheMatrix.py --what gpu -l 141.008583
runTheMatrix.py --what gpu -l 11634.511

Adding EcalRawData to the new skipCollections parameter works as intended, and the newly added warning and exception paths are triggered as expected in their respective scenarios.


If this PR is a backport, please specify the original PR and why you need to backport it. If this PR will be backported, please specify the release cycle the backport is meant for:

A backport can be made if needed.

@cmsbuild (Contributor)

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-42848/36978

  • This PR adds an extra 32KB to repository

@cmsbuild (Contributor)

A new Pull Request was created by @alejands (Alejandro Sanchez) for master.

It involves the following packages:

  • DQM/EcalMonitorTasks (dqm)
  • DQMOffline/Ecal (dqm)

@nothingface0, @emanueleusai, @cmsbuild, @pmandrik, @syuvivida, @tjavaid, @micsucmed, @rvenditti can you please review it and eventually sign? Thanks.
@rchatter, @rociovilar, @wang0jin, @thomreis, @argiro this is something you requested to watch as well.
@rappoccio, @antoniovilela, @sextonkennedy you are the release manager for this.


@mmusich (Contributor) commented Sep 23, 2023

test parameters:

  • enable = gpu
  • workflows_gpu = 141.008583, 160.03502
  • addpkg = DQM/Integration

@mmusich (Contributor) commented Sep 23, 2023

@cmsbuild, please test

@cmsbuild (Contributor)

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-e6c161/34874/summary.html
COMMIT: 6ec2b12
CMSSW: CMSSW_13_3_X_2023-09-22-2300/el8_amd64_gcc11
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/42848/34874/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 4 lines to the logs
  • Reco comparison results: 8 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3358044
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3358019
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 214 log files, 167 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

GPU Comparison Summary

Summary:

  • You potentially removed 13 lines from the logs
  • Reco comparison results: 31 differences found in the comparisons
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 44865
  • DQMHistoTests: Total failures: 705
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 44160
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -2470.007 KiB( 3 files compared)
  • DQMHistoSizes: changed ( 12434.586,... ): -415.428 KiB EcalEndcap/EETriggerTowerTask
  • DQMHistoSizes: changed ( 12434.586,... ): -183.389 KiB EcalBarrel/EBTriggerTowerTask
  • DQMHistoSizes: changed ( 12434.586,... ): -0.116 KiB EcalBarrel/EBRecoSummary
  • DQMHistoSizes: changed ( 12434.586,... ): -0.116 KiB EcalEndcap/EERecoSummary
  • DQMHistoSizes: changed ( 12434.587,... ): -168.215 KiB EcalBarrel/EBGpuTask
  • DQMHistoSizes: changed ( 12434.587,... ): -168.215 KiB EcalEndcap/EEGpuTask
  • Checked 16 log files, 18 edm output root files, 4 DQM output files
  • TriggerResults: no differences found

@alejands (Contributor, Author)

The remaining EcalRawData warnings (shown below) for WF 141.008583 in step2_Run3-2023_JetMET2023B_GPUValidation.log should be resolved by #42844.

Begin processing the 1st record. Run 366727, Event 132255498, LumiSection 89 on stream 0 at 24-Sep-2023 15:27:02.391 CEST
%MSG-w EcalDQM:  EcalDQMonitorTask:ecalMonitorTaskEcalOnly  24-Sep-2023 15:27:04 CEST Run: 366727 Event: 132255498
EcalRawDataCollection does not exist. No event-type filtering will be applied
%MSG
 on gpu found: 3258 on cpu found: 105
%MSG-w EcalDQM:  EcalDQMonitorTask:ecalMonitorTaskEcalOnly  24-Sep-2023 15:27:04 CEST Run: 366727 Event: 132255498
Ecal Monitor Source::runOnCollection: EcalRawData does not exist
%MSG

@mmusich (Contributor) commented Sep 25, 2023

@cmsbuild, please test with #42844

@thomreis (Contributor)

please test with #42844

@cmsbuild (Contributor)

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-e6c161/34894/summary.html
COMMIT: 6ec2b12
CMSSW: CMSSW_13_3_X_2023-09-25-1100/el8_amd64_gcc11
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/42848/34894/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 3 lines to the logs
  • Reco comparison results: 10 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3358044
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3358016
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 214 log files, 167 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

GPU Comparison Summary

Summary:

  • You potentially removed 142 lines from the logs
  • Reco comparison results: 45 differences found in the comparisons
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 44865
  • DQMHistoTests: Total failures: 4164
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 40701
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -2470.007 KiB( 3 files compared)
  • DQMHistoSizes: changed ( 12434.586,... ): -415.428 KiB EcalEndcap/EETriggerTowerTask
  • DQMHistoSizes: changed ( 12434.586,... ): -183.389 KiB EcalBarrel/EBTriggerTowerTask
  • DQMHistoSizes: changed ( 12434.586,... ): -0.116 KiB EcalBarrel/EBRecoSummary
  • DQMHistoSizes: changed ( 12434.586,... ): -0.116 KiB EcalEndcap/EERecoSummary
  • DQMHistoSizes: changed ( 12434.587,... ): -168.215 KiB EcalBarrel/EBGpuTask
  • DQMHistoSizes: changed ( 12434.587,... ): -168.215 KiB EcalEndcap/EEGpuTask
  • Checked 16 log files, 18 edm output root files, 4 DQM output files
  • TriggerResults: no differences found

@alejands (Contributor, Author) commented Sep 26, 2023

The remaining warnings are gone now (log).

@tjavaid commented Sep 27, 2023

+1

@cmsbuild (Contributor)

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @rappoccio, @antoniovilela, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2)

@rappoccio (Contributor)

+1


6 participants