Adding the Ecal Endcaps to the ML based online ECAL DQM #41175

Merged
merged 1 commit into from
Apr 18, 2023


abhih1
Contributor

@abhih1 abhih1 commented Mar 24, 2023

PR description:

This PR adds the ECAL endcaps to the autoencoder-based online ECAL DQM feature, which was implemented for EB in #35990.
Alongside the existing EB model, separate autoencoder (AE) models with a ResNet architecture are trained for EE+ and EE- on certified good data (digi occupancy) from 2018 runs.

Given an input occupancy map, the encoder part of the AE compresses it into a latent representation, and the decoder reconstructs the map from that latent space to match the input as closely as possible. The reconstruction loss, a mean squared error (MSE) between the input and output images, is then calculated at tower level. Because the AE has learnt only the features of good data, it reconstructs an anomalous tower poorly and gives a higher loss there than on good towers. A quality threshold applied to this loss map then marks each tower as Good or Bad, and the result is stored as an ML quality summary plot.
New correction factors, derived from 2022 collision data, are used in the pre-processing and inference, which follow the same steps as for EB.
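
As a rough illustration of the loss-and-threshold step described above (not the code in this PR), the sketch below computes a per-tower squared error between an input map and its reconstruction, then applies a threshold. It assumes each map entry already corresponds to one tower; the function names, the toy maps, and the threshold value are all hypothetical.

```python
from typing import List

Map = List[List[float]]  # one value per tower, indexed [row][col]


def loss_map(inp: Map, reco: Map) -> Map:
    """Per-tower squared error between the input and the AE reconstruction."""
    return [[(x - y) ** 2 for x, y in zip(row_in, row_out)]
            for row_in, row_out in zip(inp, reco)]


def quality_map(loss: Map, threshold: float) -> List[List[bool]]:
    """True marks a tower as Bad (loss above threshold), False as Good."""
    return [[value > threshold for value in row] for row in loss]


# Toy 2x3 occupancy map: the tower at [1][2] is empty in the input,
# while the stand-in "reconstruction" still predicts normal occupancy there.
occupancy = [[1.0, 1.1, 0.9],
             [1.0, 1.0, 0.0]]
reconstructed = [[1.0, 1.0, 1.0],
                 [1.0, 1.0, 1.0]]

HYPOTHETICAL_THRESHOLD = 0.5  # illustrative value only, not taken from the PR

loss = loss_map(occupancy, reconstructed)
bad = quality_map(loss, HYPOTHETICAL_THRESHOLD)
```

In this toy case only the empty tower exceeds the threshold; small fluctuations on the good towers produce a loss well below it.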

This PR therefore adds ML quality summary plots for EE- and EE+, along with the loss maps and the reconstructed occupancy maps from the AE.
It also adds a trend plot of the number of bad towers flagged by the AE per lumisection in a run, and a map of these bad towers in an occupancy-like plot. These help monitor the per-lumisection behaviour of bad towers and channels.
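
The bookkeeping behind such a trend plot and bad-tower map can be sketched as follows. This is only an illustration under assumed inputs (one boolean Good/Bad map per lumisection, keyed by lumisection number); the names `count_bad` and `accumulate` are hypothetical and do not come from the PR.

```python
from typing import Dict, List, Tuple

BadMap = List[List[bool]]  # True where the AE flagged a tower as Bad


def count_bad(bad: BadMap) -> int:
    """Number of towers flagged as Bad in one lumisection."""
    return sum(1 for row in bad for flag in row if flag)


def accumulate(per_ls: Dict[int, BadMap]) -> Tuple[Dict[int, int], List[List[int]]]:
    """Return (trend, summed_map).

    trend maps lumisection number -> count of bad towers;
    summed_map counts, per tower, how many lumisections flagged it.
    """
    if not per_ls:
        return {}, []
    trend = {ls: count_bad(m) for ls, m in sorted(per_ls.items())}
    first = next(iter(per_ls.values()))
    summed = [[0] * len(first[0]) for _ in first]
    for bad in per_ls.values():
        for i, row in enumerate(bad):
            for j, flag in enumerate(row):
                summed[i][j] += int(flag)
    return trend, summed


# Toy input: three lumisections of a 2x2 tower map.
per_ls = {
    1: [[False, False], [False, True]],
    2: [[False, False], [False, True]],
    3: [[True, False], [False, True]],
}
trend, summed = accumulate(per_ls)
```

Here the tower at `[1][1]` is flagged in every lumisection, while `[0][0]` appears only in the third, which is the kind of distinction the per-lumisection plots are meant to expose.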

Please note that this PR should be tested together with the files added in cms-data/DQM-EcalMonitorClient#3.

PR validation:

The code was validated by running the online ECAL DQM configuration and uploading the output file to a DQM test GUI to examine the resulting plots.
The new plots are present and look reasonable.

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

This is the master PR.
The backport to 13_0_X, which is used in production, is #41195.

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41175/34851

  • This PR adds an extra 28KB to repository

  • There are other open Pull requests which might conflict with changes you have proposed:

@cmsbuild
Contributor

A new Pull Request was created by @abhih1 (Abhirami Harilal) for master.

It involves the following packages:

  • DQM/EcalMonitorClient (dqm)
  • DQM/EcalMonitorTasks (dqm)

@emanueleusai, @cmsbuild, @syuvivida, @rvenditti, @micsucmed, @pmandrik can you please review it and eventually sign? Thanks.
@rchatter, @simonepigazzini, @thomreis, @argiro this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@emanueleusai
Member

type ecal

@emanueleusai
Member

please test

@emanueleusai
Member

please test with cms-data/DQM-EcalMonitorClient#3

@cmsbuild
Contributor

-1

Failed Tests: RelVals RelVals-INPUT AddOn
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-6334a6/31609/summary.html
COMMIT: 98b18c4
CMSSW: CMSSW_13_1_X_2023-03-26-2300/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/41175/31609/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals

RelVals-INPUT

AddOn Tests

----- Begin Fatal Exception 27-Mar-2023 11:08:38 CEST-----------------------
An exception of category 'ConfigFileReadError' occurred while
   [0] Processing the python configuration file named /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-26-2300/src/Utilities/ReleaseScripts/scripts/read312RV_cfg.py
Exception Message:
 unknown python problem occurred.
ModuleNotFoundError: No module named 'past'

At:
  /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-26-2300/src/FWCore/ParameterSet/python/Types.py(6): <module>
  <frozen importlib._bootstrap>(228): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(850): exec_module
  <frozen importlib._bootstrap>(695): _load_unlocked
  <frozen importlib._bootstrap>(986): _find_and_load_unlocked
  <frozen importlib._bootstrap>(1007): _find_and_load
  /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-26-2300/src/FWCore/ParameterSet/python/Config.py(15): <module>
  <frozen importlib._bootstrap>(228): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(850): exec_module
  <frozen importlib._bootstrap>(695): _load_unlocked
  <frozen importlib._bootstrap>(986): _find_and_load_unlocked
  <frozen importlib._bootstrap>(1007): _find_and_load
  /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-26-2300/src/Utilities/ReleaseScripts/scripts/read312RV_cfg.py(2): <module>

----- End Fatal Exception -------------------------------------------------
[fastsim:1] cmsDriver.py TTbar_8TeV_TuneCUETP8M1_cfi  --conditions auto:run1_mc --fast  -n 100 --eventcontent AODSIM,DQM --relval 100000,1000 -s GEN,SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,VALIDATION  --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --datatier GEN-SIM-DIGI-RECO,DQMIO --beamspot Realistic8TeVCollision : FAILED - elapsed time: 0 sec (ended on Mon Mar 27 11:08:39 2023) - exit: 256
[fastsim1:1] cmsDriver.py TTbar_13TeV_TuneCUETP8M1_cfi --conditions auto:run2_mc_l1stage1 --fast  -n 100 --eventcontent AODSIM,DQM --relval 100000,1000 -s GEN,SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,VALIDATION  --customise=HLTrigger/Configuration/CustomConfigs.L1THLT --datatier GEN-SIM-DIGI-RECO,DQMIO --beamspot NominalCollision2015 --era Run2_25ns : FAILED - elapsed time: 1 sec (ended on Mon Mar 27 11:08:42 2023) - exit: 256

@abhih1
Contributor Author

abhih1 commented Mar 27, 2023

The errors seem to be unrelated to the changes in the PR.

@perrotta
Contributor

please test with cms-data/DQM-EcalMonitorClient#3

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-6334a6/31614/summary.html
COMMIT: 98b18c4
CMSSW: CMSSW_13_1_X_2023-03-27-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/41175/31614/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-6334a6/31614/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-6334a6/31614/git-merge-result

Comparison Summary

Summary:

  • You potentially removed 6 lines from the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 143 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3554236
  • DQMHistoTests: Total failures: 11
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3554203
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 1125.06 KiB( 48 files compared)
  • DQMHistoSizes: changed ( 20834.76,... ): 20.921 KiB EcalBarrel/EBOccupancyTask
  • DQMHistoSizes: changed ( 20834.76,... ): 7.799 KiB EcalEndcap/EEOccupancyTask
  • DQMHistoSizes: changed ( 11834.0,... ): 3.246 KiB Ecal/Trends
  • DQMHistoSizes: changed ( 11834.0,... ): -0.360 KiB Physics/NanoAODDQM
  • DQMHistoSizes: changed ( 13234.0,... ): -0.234 KiB Physics/NanoAODDQM
  • Checked 213 log files, 164 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@cmsbuild
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41175/35201

  • This PR adds an extra 16KB to repository

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41175/35203

  • This PR adds an extra 16KB to repository

@cmsbuild
Contributor

Pull request #41175 was updated. @emanueleusai, @cmsbuild, @syuvivida, @rvenditti, @micsucmed, @pmandrik can you please check and sign again.

@perrotta
Contributor

please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-6334a6/32005/summary.html
COMMIT: f6a03f3
CMSSW: CMSSW_13_1_X_2023-04-17-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/41175/32005/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 22 lines from the logs
  • Reco comparison results: 10 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3459609
  • DQMHistoTests: Total failures: 12
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3459575
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 1099.334 KiB( 47 files compared)
  • DQMHistoSizes: changed ( 1000.0,... ): 20.921 KiB EcalBarrel/EBOccupancyTask
  • DQMHistoSizes: changed ( 1000.0,... ): 7.799 KiB EcalEndcap/EEOccupancyTask
  • DQMHistoSizes: changed ( 1000.0,... ): 3.246 KiB Ecal/Trends
  • Checked 207 log files, 159 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@emanueleusai
Member

+1

  • resign

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Contributor

+1
