Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic Reduction Network for photon energy regression in miniAOD using the SonicTriton service #37134

Merged
merged 42 commits into from
Apr 29, 2022

Conversation

ssrothman
Copy link
Contributor

@ssrothman ssrothman commented Mar 3, 2022

PR description:

This PR builds on our previous PR #35839 and contains the code to apply our Dynamic Reduction Network (DRN) energy regression for electron and photon objects. The target for this round of code is to apply our new energy corrections to photons in the AOD -> miniAOD step run on Tier 2. This PR also contains the identical code to do the same for electrons, but the electron regression has not been enabled pending the addition of the necessary ECAL/tracker energy combination. This will the the subject of a separate future PR.

The regression is computed with the SonicTriton service, requiring the top-level producer to use the ExternalWork interface and inherit from TritonEDProducer rather than EDProducer. We therefore decided to split the computation and application of the regression in order to avoid having to make modifications to the existing photon/electron producers. This PR consists of two main c++ objects:

  • DRNCorrectionProducertT<> is a template class that can produce the DRN energy corrections for reco:: or pat:: electrons or photons. This producers a ValueMap of std::pair<float, float> with the per-object predictions of the energy and the energy resolution.
  • EGRegressionModifierDRN extends ModifyObjectValueBase and consumes the ValueMap produced by the DRNCorrectionProducer to apply the regression. This modifier should work as a drop-in replacement for the existing EGRegressionModifierVX objects.

The regression can be enabled in a miniAOD workflow with the modifier in PhysicsTools/PatAlgos/python/slimming/enableDRN.py

The regression config file and weights are intended to be hosted on git cms-data, and will be the subject of an upcoming PR there. Until that time, they may be found at https://github.com/ssrothman/RecoEgamma.EgammaPhotonProducers.TritonModels

A summary of our new energy regression and the novel ML architecture underlying it can be found here.

PR validation:

Validation of this PR has been performed by enabling the DRN with the included modifier on top of a standard 2018UL AOD->miniAOD config, as can be found in PhysicsTools/PatAlgos/test/DRNtest_Hgg.py. We have performed testing on two samples: a ggH->gamma gamma sample (enabled in the included test config) and a ttH sample (/TTJets_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18RECO-106X_upgrade2018_realistic_v11_L1v1-v1/AODSIM). On both samples we have run our regression on 10,000 events and examined the computational and physics performance of our PR.

Physics Performance:

We have validated that our producer yields identical output to the standalone python code used for training and validation of our model (e.g. as seen in the slides linked above). In addition, we have checked this performance on the two samples mentioned above. Histograms of the predicted energy divided by the gen energy for photons in these samples for both our regression and the existing 2018UL BDT regression are here:

Hgg
ttH

Computational Performance:

On both samples we have profiled the computational performance of our model by enabling the TimerService and by monitoring resource usage in top. The overall performance picture is identical in both samples. These tests were performed on interactive nodes at LPC and therefore there was substantial variance in the absolute speed of the test workflow, but in all cases we observed that the DRN regression accounted for about 8% of the total event time in this AOD->miniAOD workflow. For a typical run, this is something like 40ms out of a total event time of 600ms. This is within expected parameters, and it is our understanding that ParticleNet (a similar architecture) takes a similar amount of time to run. While our test workflow was running, we observed a typical CPU utilization in top of ~100% by the cmsRun process and an additional utilization of ~20% by the SonicTriton process.

We have also included a unit test in PhysicsTools/PatAlgos/test

if this PR is a backport please specify the original PR and why you need to backport that PR:

NA

Before submitting your pull requests, make sure you followed this checklist:

@Sam-Harper @lgray @kpedro88 @rchatter @lfinco @swagata87 @jainshilpi @SohamBhattacharya @simonepigazzini @violatingcp

@cmsbuild
Copy link
Contributor

cmsbuild commented Mar 3, 2022

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37134/28667

  • This PR adds an extra 28KB to repository

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@kpedro88
Copy link
Contributor

kpedro88 commented Mar 3, 2022

@ssrothman while you're code-formatting, typo: DRNCorrectionProducertT.cc -> DRNCorrectionProducerT.cc

@cmsbuild
Copy link
Contributor

cmsbuild commented Mar 3, 2022

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37134/28671

  • This PR adds an extra 28KB to repository

  • Found files with invalid states:

    • RecoEgamma/EgammaTools/plugins/DRNCorrectionProducertT.cc:

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild
Copy link
Contributor

cmsbuild commented Mar 3, 2022

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37134/28672

  • This PR adds an extra 28KB to repository

  • Found files with invalid states:

    • RecoEgamma/EgammaTools/plugins/DRNCorrectionProducertT.cc:

@cmsbuild
Copy link
Contributor

cmsbuild commented Mar 3, 2022

A new Pull Request was created by @ssrothman for master.

It involves the following packages:

  • PhysicsTools/PatAlgos (reconstruction)
  • RecoEgamma/EgammaTools (reconstruction)

@jpata, @cmsbuild, @clacaputo, @slava77 can you please review it and eventually sign? Thanks.
@rappoccio, @gouskos, @jainshilpi, @schoef, @emilbols, @mbluj, @demuller, @varuns23, @seemasharmafnal, @mmarionncern, @ahinzmann, @lgray, @jdolen, @azotz, @Sam-Harper, @jdamgov, @nhanvtran, @gkasieczka, @hatakeyamak, @andrzejnovak, @AlexDeMoor, @valsdav, @JyothsnaKomaragiri, @sobhatta, @afiqaize, @gpetruc, @wrtabb, @mariadalfonso, @ram1123 this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@clacaputo
Copy link
Contributor

Hi @ssrothman , do I understand correctly that the DRN is turned on for photons and that there isn't a PR in cmsdata for the models? How can we test it on Jenkins?

@swagata87
Copy link
Contributor

We have reviewed this PR mostly from physics point of view, and as long as it is not enabled as default right now, it is okay for us. We understand that the plan is to integrate the PR to facilitate further checks/validation, based on which we can decide when to enable it as default egamma regression. We would still like to stress a few points, most of them need to be addressed before DRN can be default:

  1. Slide 29 of the talk linked in PR description, EE-EE category shows degradation. This needs to be fixed. Otherwise the physics case is not strong enough to justify BDT->DRN change and a CMS-wide usage of this, across all PAGs.

Screen Shot 2022-03-04 at 12 14 14

  1. Show performance of DRN on Run3 samples also, as this is intended for Run3.

  2. Dedicated regression for high-pT, saturated egamma is needed, similar to what we have now in BDT-based regression. Show the performance in high pT.

  3. It is preferable if the corrections go to GT instead of cms-data, because in case of a rereco one only need to change the GT in cmsDriver as opposed to any code changes.

  4. The last point here: https://indico.cern.ch/event/1086689/?note=176398, discussed in egamma meeting, needs to be addressed.

Thanks,
Linda and Swagata.

@ssrothman
Copy link
Contributor Author

Hi @ssrothman , do I understand correctly that the DRN is turned on for photons and that there isn't a PR in cmsdata for the models? How can we test it on Jenkins?

Indeed this is the case at the moment. I am putting together the cms-data PR this afternoon. In the meantime one can manually clone the repo from my personal git with the necessary configs.

@AdrianoDee
Copy link
Contributor

AdrianoDee commented Apr 21, 2022

+upgrade

  • wf 10804.31 and 10805.31 added;
  • new matrix suffix .31 for photonDRN

@jpata
Copy link
Contributor

jpata commented Apr 26, 2022

type egamma

@perrotta
Copy link
Contributor

please test
(to refresh some 13 days old tests)

@perrotta
Copy link
Contributor

@cms-sw/pdmv-l2 this PR only misses the PDMV signature. Could you please have a look at your earliest convenience, and either approve or comment otherwise? Thank you.

@kskovpen
Copy link
Contributor

+pdmv

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-4827ab/24296/summary.html
COMMIT: c8484f9
CMSSW: CMSSW_12_4_X_2022-04-27-2300/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37134/24296/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-4827ab/24296/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-4827ab/24296/git-merge-result

Comparison Summary

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-4827ab/10804.31_SingleGammaPt10+2018_photonDRN+SingleGammaPt10_pythia8_GenSim+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-4827ab/10805.31_SingleGammaPt35+2018_photonDRN+SingleGammaPt35_pythia8_GenSim+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3695434
  • DQMHistoTests: Total failures: 7
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3695404
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.004 KiB( 48 files compared)
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • Checked 205 log files, 45 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@perrotta
Copy link
Contributor

+1

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will be automatically merged.

@cmsbuild cmsbuild merged commit ed06b7d into cms-sw:master Apr 29, 2022
@perrotta
Copy link
Contributor

@cms-sw/egamma-pog-l2, quite likely unrelated to this PR, but in the log outputs of the newly added wfs the following warning messages annoyingly show up at every event:

%MSG-w NoModule:   ByClusterSummaryMultiplicityPairEventFilter:toomanystripclus53X  28-Apr-2022 20:02:03 CEST Run: 1 Event: 1
No information for requested module 5. Please check in the Provinence Infomation for proper modules.
%MSG
%MSG-w NoModule:  ByClusterSummaryMultiplicityPairEventFilter:manystripclus53X  28-Apr-2022 20:02:03 CEST Run: 1 Event: 1
No information for requested module 5. Please check in the Provinence Infomation for proper modules.
%MSG

This is probably something that you want to look at, and fix

@mmusich
Copy link
Contributor

mmusich commented Apr 29, 2022

but in the log outputs of the newly added wfs the following warning messages annoyingly show up at every event:

@perrotta this looks like there's something wrong with how the "tracking POG" filters in:

manystripclus53X = cms.EDFilter('ByClusterSummaryMultiplicityPairEventFilter',
multiplicityConfig = cms.PSet(
firstMultiplicityConfig = cms.PSet(
clusterSummaryCollection = cms.InputTag("clusterSummaryProducer"),
subDetEnum = cms.int32(5),
varEnum = cms.int32(0)
),
secondMultiplicityConfig = cms.PSet(
clusterSummaryCollection = cms.InputTag("clusterSummaryProducer"),
subDetEnum = cms.int32(0),
varEnum = cms.int32(0)
),
),
cut = cms.string("( mult2 > 20000+7*mult1)")
)

are configured. Would you mind opening an issue so that it doesn't get lost?
I can have a look but not immediately.

@perrotta
Copy link
Contributor

A github issue has been opened in #37738, thank you @mmusich

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.