Adapt to new range for BDT output for Run3 Saturated regression #37100

swagata87 · 2022-03-01T14:29:27Z

PR description:

This PR changes the range of BDT output of saturated e/gamma regression.

When the saturated e/gamma regression was last trained (several years ago, during Run2), the BDT output range for the correction factors was -1 to 3. We have now retrained the regression for Run3 using the range 0.2 to 2.0, for two reasons:

- The normal (i.e. non-saturated) e/gamma regression currently uses the 0.2-2.0 range, so this way everything is aligned.
- It is not clear why the old range started from negative values of the correction factor; the current range makes more sense.

Since the range was changed in training, a matching change in CMSSW is also needed. It is done using an era modifier, so that Run2 UL keeps the old range [-1,3], compatible with the old training, and Run3 gets the new range [0.2,2], compatible with the new training. The GT update for the regression tags should go into the same release as well; that is progressing in parallel.
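To illustrate why the configured range has to match the range used in training, here is a minimal sketch in plain Python. It assumes the raw BDT output is squashed into [rangeMin, rangeMax] with a sine-based transform; that functional form, and the function name `transform_bdt_output`, are assumptions made for illustration and are not stated anywhere in this thread.

```python
import math


def transform_bdt_output(raw, range_min, range_max):
    """Map an unbounded raw BDT output into [range_min, range_max].

    ASSUMPTION: a sine-based squashing, offset + scale * sin(raw).
    The thread itself does not spell out the transform.
    """
    offset = range_min + 0.5 * (range_max - range_min)
    scale = 0.5 * (range_max - range_min)
    return offset + scale * math.sin(raw)


# Ranges quoted in the PR description.
RUN2_RANGE = (-1.0, 3.0)  # old saturated-regression training
RUN3_RANGE = (0.2, 2.0)   # new Run3 training

raw = 0.3  # arbitrary raw BDT output, for illustration only
run2 = transform_bdt_output(raw, *RUN2_RANGE)
run3 = transform_bdt_output(raw, *RUN3_RANGE)
print(f"raw={raw}: Run2 range -> {run2:.3f}, Run3 range -> {run3:.3f}")
```

Under this assumption the same raw value yields a different correction factor for each range, which is why a BDT trained with one range cannot be read back with the other.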

Related github issue: cms-sw/cmssw#36886
Talk describing the issues that were fixed + the outstanding issues: https://indico.cern.ch/event/1127851/contributions/4733937/attachments/2390919/4087216/EGM_RegressionTags_Changes.pdf

PR validation:

Starting from CMSSW_12_3_0_pre5, I merged this branch, ran the AOD step [1] on this dataset [2] using the updated conditions [3], and then ran the miniAOD step [4] on the output AOD. I then made response plots using gen-matched slimmedElectrons and compared them with the Run2 regression available in old releases and old GTs. The response looks reasonable and as expected.

[1] cmsDriver.py --filein file:/eos/cms/blah.root --fileout file:AOD.root --mc --eventcontent AODSIM --runUnscheduled --customise Configuration/DataProcessing/Utils.addMonitoring --datatier AODSIM --conditions 123X_mcRun3_2021_realistic_Candidate_2022_03_01_08_47_07 --step RAW2DIGI,RECO --geometry DB:Extended --era Run3 --python_filename aod_cfg.py --beamspot Run3RoundOptics25ns13TeVLowSigmaZ --no_exec -n -1

[2] /RelValZpToEE_m6000_14TeV/CMSSW_12_3_0_pre5-123X_mcRun3_2021_realistic_v6-v1/GEN-SIM-DIGI-RAW

[3] https://cms-conddb.cern.ch/cmsDbBrowser/diff/Prod/gts/123X_mcRun3_2021_realistic_Candidate_2022_03_01_08_47_07/123X_mcRun3_2021_realistic_v9

[4] cmsDriver.py --filein file:/eos/cms/blah.root --fileout file:Mini.root --mc --eventcontent MINIAODSIM --runUnscheduled --customise Configuration/DataProcessing/Utils.addMonitoring --datatier MINIAODSIM --conditions 123X_mcRun3_2021_realistic_Candidate_2022_03_01_08_47_07 --step PAT --geometry DB:Extended --era Run3 --python_filename miniaod_cfg.py --beamspot Run3RoundOptics25ns13TeVLowSigmaZ --no_exec -n -1

This PR is not a backport.
A backport to 12_2 is not meaningful anymore, as 12_2 MC production for 0.5B POG samples has already started. So most probably we won't do any backport of this PR.

cmsbuild · 2022-03-01T14:39:08Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37100/28595

This PR adds an extra 16KB to repository

cmsbuild · 2022-03-01T14:39:31Z

A new Pull Request was created by @swagata87 (Swagata Mukherjee) for master.

It involves the following packages:

RecoEgamma/EgammaTools (reconstruction)

@jpata, @cmsbuild, @clacaputo, @slava77 can you please review it and eventually sign? Thanks.
@Sam-Harper, @jainshilpi, @valsdav, @lgray, @sobhatta, @afiqaize, @wrtabb, @varuns23, @ram1123 this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

tvami · 2022-03-01T14:39:44Z

urgent

jpata · 2022-03-01T14:57:49Z

Per my understanding, it's better if the default is changed directly for Run3 and onwards, rather than applying it with a modifier.

swagata87 · 2022-03-01T15:50:20Z

I was thinking the same, but this comment [1] inside the config suggested there was some reason to do it like this, so I used a modifier for now. I hope it's fine for this urgent purpose, and we can improve it in a later release. I wanted to better understand why it's done this way, but that might need some time.

[1]

cmssw/RecoEgamma/EgammaTools/python/regressionModifier_cfi.py

Lines 238 to 240 in 6558221

    #by default we use the regression inappropriate to the main purpose of this release
    #life is simplier that way
    regressionModifier = regressionModifier94X.clone()

jpata · 2022-03-01T16:19:16Z

the release is next week, so you have some time still to address it. I'm not able to understand what the comment you referenced above means - can you unpack it?

jpata · 2022-03-01T17:06:22Z

nevermind my comment above, it was clarified in the ORP mattermost. sorry for the confusion.

Generally in reco, we try to keep the configurations for different eras as consistent as possible; see e.g. [MiniAOD] Lower AK4 Puppi jets pT cut for Run 3 #36890 (comment). Explicitly, there is no commitment that a new release will reproduce the old behaviour / cuts of e.g. Run2 by default.
In this particular case, since the new egamma model was never trained for Run2, one clearly cannot apply the new ranges by default in Run2 as well.
As was noted, and I concur, the logic run3_common.toModify(regressionModifier106XUL seems rather confusing, and the code comment "by default we use the regression inappropriate to the main purpose of this release, life is simplier that way" does not help.

jpata · 2022-03-01T17:08:28Z

@cmsbuild please test

jpata · 2022-03-01T17:16:38Z

@tvami for the "urgent", can you clarify the reasoning, and if it differs from the 12_3_0_pre6 timescale of next week?

tvami · 2022-03-01T17:24:32Z

Hi @jpata

> can you clarify the reasoning, and if it differs from the 12_3_0_pre6 timescale of next week?

this all should actually come with a GT change as well, I'm working on it and will submit the PR soon (condition request came in during ORP).

Further urgency comes from the fact that the JEC is to be updated as well (although this has not yet been requested by the JetMet group), i.e. we need another GT change after the one that I'm working on. If this PR is merged only next Tuesday, we'll be late with the follow-up PR.

I think testing as it is might not have the outcome we are looking for, since all the private testing was done with the new conditions.

swagata87 · 2022-03-01T17:27:22Z

thanks for clarifying, Joosep.
After the Jenkins tests finish, I will try to improve this part, run3_common.toModify(regressionModifier106XUL, which is confusing.
One option is to add another block for Run3, like this:

regressionModifierRun3 = cms.PSet(
    modifierName = cms.string('EGRegressionModifierV3'),  
    .....

This would be cleaner and less confusing, but it adds more lines than the current version. I guess that's okay for the sake of clarity.

tvami · 2022-03-01T18:09:11Z

> I'm working on it and will submit the PR soon (condition request came in during ORP).

Here is the PR
#37102

cmsbuild · 2022-03-01T21:06:28Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-033012/22743/summary.html
COMMIT: 8f685bd
CMSSW: CMSSW_12_3_X_2022-03-01-1100/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37100/22743/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

The workflows 1001.0, 1000.0, 136.88811, 136.874, 136.8311, 136.793, 136.7611, 136.731, 4.22 have different files in step1_dasquery.log than the ones found in the baseline. You may want to check and retrigger the tests if necessary. You can check it in the "files" directory in the results of the comparisons

Summary:

No significant changes to the logs found
Reco comparison results: 518 differences found in the comparisons
DQMHistoTests: Total files compared: 49
DQMHistoTests: Total histograms compared: 4000857
DQMHistoTests: Total failures: 95
DQMHistoTests: Total nulls: 1
DQMHistoTests: Total successes: 4000739
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: -0.004 KiB( 48 files compared)
DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
Checked 204 log files, 45 edm output root files, 49 DQM output files
TriggerResults: no differences found

perrotta · 2022-03-01T22:20:56Z

RecoEgamma/EgammaTools/python/regressionModifier_cfi.py

 (run2_egamma_2016 | run2_egamma_2017 | run2_egamma_2018).toReplaceWith(regressionModifier,regressionModifier106XUL)

+from Configuration.Eras.Modifier_run3_common_cff import run3_common
+run3_common.toModify(regressionModifier106XUL.eleRegs.ecalOnlyMean,
+      rangeMinHighEt = 0.2,
+      rangeMaxHighEt = 2.0
+)
+
+run3_common.toModify(regressionModifier106XUL.phoRegs.ecalOnlyMean,
+      rangeMinHighEt = 0.2,
+      rangeMaxHighEt = 2.0
+)
+


Could something similar also work? (Please check before implementing, if you decide so)

Suggested change

-(run2_egamma_2016 | run2_egamma_2017 | run2_egamma_2018).toReplaceWith(regressionModifier,regressionModifier106XUL)
-from Configuration.Eras.Modifier_run3_common_cff import run3_common
-run3_common.toModify(regressionModifier106XUL.eleRegs.ecalOnlyMean,
-      rangeMinHighEt = 0.2,
-      rangeMaxHighEt = 2.0
-)
-run3_common.toModify(regressionModifier106XUL.phoRegs.ecalOnlyMean,
-      rangeMinHighEt = 0.2,
-      rangeMaxHighEt = 2.0
-)
+regressionModifierRun2 = regressionModifier106XUL.clone()
+from Configuration.Eras.Modifier_run2_egamma_2016_cff import run2_egamma_2016
+from Configuration.Eras.Modifier_run2_egamma_2017_cff import run2_egamma_2017
+from Configuration.Eras.Modifier_run2_egamma_2018_cff import run2_egamma_2018
+(run2_egamma_2016 | run2_egamma_2017 | run2_egamma_2018).toReplaceWith(regressionModifier,regressionModifierRun2)
+regressionModifierRun3 = regressionModifierRun2.clone(
+    eleRegs = dict(
+        ecalOnlyMean = dict(
+            rangeMinHighEt = 0.2,
+            rangeMaxHighEt = 2.0
+        )
+    ),
+    phoRegs = dict(
+        ecalOnlyMean = dict(
+            rangeMinHighEt = 0.2,
+            rangeMaxHighEt = 2.0
+        )
+    )
+)
+from Configuration.Eras.Modifier_run3_common_cff import run3_common
+run3_common.toReplaceWith(regressionModifier,regressionModifierRun3)
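The idea behind the suggestion above, cloning the Run2 block and overriding only a few nested parameters on the copy, can be sketched in plain Python without CMSSW. This is only an analogy: nested dicts stand in for the PSets, and `clone_with_overrides` is a hypothetical helper written for this illustration, not the framework's `clone`.

```python
import copy


def _apply(target, overrides):
    """Recursively apply nested overrides to target, in place."""
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            _apply(target[key], value)
        else:
            target[key] = value


def clone_with_overrides(pset, **overrides):
    """Deep-copy a nested dict, then apply the overrides to the copy only."""
    result = copy.deepcopy(pset)
    _apply(result, overrides)
    return result


# Stand-in for the Run2 block; only the two parameters discussed in the PR.
old_range = {"rangeMinHighEt": -1.0, "rangeMaxHighEt": 3.0}
run2 = {
    "eleRegs": {"ecalOnlyMean": dict(old_range)},
    "phoRegs": {"ecalOnlyMean": dict(old_range)},
}

new_range = dict(rangeMinHighEt=0.2, rangeMaxHighEt=2.0)
run3 = clone_with_overrides(
    run2,
    eleRegs=dict(ecalOnlyMean=new_range),
    phoRegs=dict(ecalOnlyMean=new_range),
)

print("Run2:", run2["eleRegs"]["ecalOnlyMean"])
print("Run3:", run3["eleRegs"]["ecalOnlyMean"])
```

Because the overrides are applied to a deep copy, the Run2 block keeps its old range and nothing else that refers to it is affected.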

slava77 · Mar 1, 2022

what's wrong with the original? except for the fact that there we should just have run3_common.toModify(regressionModifier.phoRegs.ecalOnlyMean

swagata87 · Mar 1, 2022

The original one, i.e. regressionModifier, clones regressionModifier94X,

cmssw/RecoEgamma/EgammaTools/python/regressionModifier_cfi.py

Line 240 in 6558221

    regressionModifier = regressionModifier94X.clone()

and regressionModifier94X is incompatible with Run3 as it uses EGRegressionModifierV2, while we need EGRegressionModifierV3.

cmssw/RecoEgamma/EgammaTools/python/regressionModifier_cfi.py

Lines 183 to 185 in 6558221

    regressionModifier94X = \
    cms.PSet( modifierName = cms.string('EGRegressionModifierV2'),

slava77 · Mar 2, 2022

but it's already replaced at that point in the line above
(run2_egamma_2016 | run2_egamma_2017 | run2_egamma_2018).toReplaceWith(regressionModifier,regressionModifier106XUL)

(run2_egamma_2018 is active in the Run3 era)

slava77 · Mar 2, 2022

> but it's already replaced at that point in the line above
> (run2_egamma_2016 | run2_egamma_2017 | run2_egamma_2018).toReplaceWith(regressionModifier,regressionModifier106XUL)

to be more obvious that line could be updated to be
(run2_egamma_2016 | run2_egamma_2017 | run2_egamma_2018 | run3_common).toReplaceWith ...

(my other comment about introducing a run3_egamma instead still applies; although that one is probably more for convenience of finding egamma-related changes)
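The point about OR-ed modifiers can be made concrete with a toy model in plain Python. Everything here is a hypothetical stand-in written for illustration: `Modifier`, `AnyOf` and `to_replace_with` only mimic the general idea of "replace when any listed era is active" and are not the CMSSW classes. The `EGRegressionModifierV3` name for the 106XUL block is what the thread implies, not something checked against the config.

```python
import copy


class Modifier:
    """Toy era modifier: a named flag that may or may not be active."""

    def __init__(self, name, active=False):
        self.name = name
        self.active = active

    def is_active(self):
        return self.active

    def __or__(self, other):
        return AnyOf(self, other)

    def to_replace_with(self, target, replacement):
        # Overwrite the target in place, but only if this modifier is active.
        if self.is_active():
            target.clear()
            target.update(copy.deepcopy(replacement))


class AnyOf(Modifier):
    """Active when at least one of the combined modifiers is active."""

    def __init__(self, left, right):
        super().__init__(f"({left.name} | {right.name})")
        self.parts = (left, right)

    def is_active(self):
        return any(part.is_active() for part in self.parts)


run2_egamma_2016 = Modifier("run2_egamma_2016")
run2_egamma_2017 = Modifier("run2_egamma_2017")
# Per the thread, run2_egamma_2018 is active in the Run3 era.
run2_egamma_2018 = Modifier("run2_egamma_2018", active=True)
run3_common = Modifier("run3_common", active=True)

regression_modifier = {"modifierName": "EGRegressionModifierV2"}
regression_modifier_106xul = {"modifierName": "EGRegressionModifierV3"}

implicit = run2_egamma_2016 | run2_egamma_2017 | run2_egamma_2018
explicit = implicit | run3_common

implicit.to_replace_with(regression_modifier, regression_modifier_106xul)
print(implicit.name, "->", regression_modifier["modifierName"])
```

In this toy model both chains are active for Run3, so adding `run3_common` does not change the outcome; it only makes the intent visible to the reader, which is the argument made in the comment above.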

swagata87 · Mar 2, 2022

Thanks for explaining, Slava. I gave it a try [1].
So the way I originally implemented it and the way you suggested both have an unintended effect (also discussed in #37102 (comment)): the new parameter values get propagated to the low pT electron regression too. (I checked this by running edmConfigDump after implementing [1].)

So now I will try the way Andrea suggested. I will also try to add run3_egamma, as per your suggestion.

[1]

    (run2_egamma_2016 | run2_egamma_2017 | run2_egamma_2018 | run3_common).toReplaceWith(regressionModifier,regressionModifier106XUL)
    run3_common.toModify(regressionModifier.eleRegs.ecalOnlyMean,
          rangeMinHighEt = 0.2,
          rangeMaxHighEt = 2.0
    )
    run3_common.toModify(regressionModifier.phoRegs.ecalOnlyMean,
          rangeMinHighEt = 0.2,
          rangeMaxHighEt = 2.0
    )
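One plausible way for such a leak to happen is that two configuration objects end up sharing the same nested parameter object, so an in-place change made through one is visible through the other. That mechanism is an assumption here, not something the thread confirms. The plain-Python sketch below uses nested dicts in place of PSets, and `low_pt_regression` is a hypothetical stand-in for the low pT electron regression config.

```python
import copy

# Old high-Et range from the Run2 training, as quoted in the PR description.
regression_modifier_106xul = {
    "eleRegs": {"ecalOnlyMean": {"rangeMinHighEt": -1.0, "rangeMaxHighEt": 3.0}},
}

# ASSUMPTION: both names refer to the same underlying object.
regression_modifier = regression_modifier_106xul
low_pt_regression = regression_modifier_106xul  # hypothetical second consumer

# An in-place change intended only for the Run3 default...
regression_modifier["eleRegs"]["ecalOnlyMean"]["rangeMinHighEt"] = 0.2
# ...is also seen through the other name.
leaked_min = low_pt_regression["eleRegs"]["ecalOnlyMean"]["rangeMinHighEt"]

# Cloning before modifying keeps the change local to the copy.
base = {"eleRegs": {"ecalOnlyMean": {"rangeMinHighEt": -1.0, "rangeMaxHighEt": 3.0}}}
run3_copy = copy.deepcopy(base)
run3_copy["eleRegs"]["ecalOnlyMean"]["rangeMinHighEt"] = 0.2
isolated_min = base["eleRegs"]["ecalOnlyMean"]["rangeMinHighEt"]

print("shared object:", leaked_min, "| cloned object:", isolated_min)
```

Dumping the fully expanded configuration and inspecting the affected parameters, as was done with edmConfigDump, is the kind of check that exposes this class of side effect.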

slava77 · Mar 2, 2022

> both have an unintended effect (discussed here also #37102 (comment)), that new parameter values gets propagated to low pT electron regression too.

uhm, what about
(run2_egamma_2016 | run2_egamma_2017 | run2_egamma_2018 | run3_common).toReplaceWith(regressionModifier,regressionModifier106XUL.clone()) ?

perrotta · Mar 2, 2022

@slava77 I don't think it can work.
For Run3 the regressionModifier has to be further modified: it cannot be the same as for Run2.

slava77 · 2022-03-01T23:37:16Z

RecoEgamma/EgammaTools/python/regressionModifier_cfi.py

 (run2_egamma_2016 | run2_egamma_2017 | run2_egamma_2018).toReplaceWith(regressionModifier,regressionModifier106XUL)

+from Configuration.Eras.Modifier_run3_common_cff import run3_common
+run3_common.toModify(regressionModifier106XUL.eleRegs.ecalOnlyMean,


should there be a run3_egamma instead, just to follow the old patterns?

swagata87 · 2022-03-02T14:01:24Z

Added the run3_egamma era modifier and fixed the issue discussed in #37102 (comment).
Tested on 2000 Z'->EE events from AOD using the gedGsfElectrons collection.
The response looks as expected.

cmsbuild · 2022-03-02T14:04:17Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37100/28630

This PR adds an extra 20KB to repository

cmsbuild · 2022-03-02T14:04:43Z

Pull request #37100 was updated. @perrotta, @clacaputo, @cmsbuild, @slava77, @jpata, @qliphy, @fabiocos, @davidlange6 can you please check and sign again.

tvami · 2022-03-02T14:14:09Z

@cmsbuild , please test with #37102

cmsbuild · 2022-03-02T19:01:19Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-033012/22771/summary.html
COMMIT: 10660de
CMSSW: CMSSW_12_3_X_2022-03-01-2300/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37100/22771/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 8 differences found in the comparisons
DQMHistoTests: Total files compared: 49
DQMHistoTests: Total histograms compared: 4000857
DQMHistoTests: Total failures: 13
DQMHistoTests: Total nulls: 1
DQMHistoTests: Total successes: 4000821
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: -0.004 KiB( 48 files compared)
DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
Checked 204 log files, 45 edm output root files, 49 DQM output files
TriggerResults: no differences found

jpata · 2022-03-03T12:32:59Z

+reconstruction

francescobrivio · 2022-03-03T12:48:53Z

@cms-sw/orp-l2 do you agree with the latest proposed solution? Both tests here and in #37102 are clean now.

(Sorry for this ping, I just wanted to point out that this should be merged together with #37102).

perrotta · 2022-03-03T14:19:23Z

+1

cmsbuild · 2022-03-03T14:19:49Z

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will be automatically merged.

change range for Run3 Saturated regression

8f685bd

cmsbuild added this to the CMSSW_12_3_X milestone Mar 1, 2022

cmsbuild added code-checks-pending orp-pending pending-signatures reconstruction-pending tests-pending labels Mar 1, 2022

cmsbuild added code-checks-approved and removed code-checks-pending labels Mar 1, 2022

cmsbuild added the urgent label Mar 1, 2022

cmsbuild added tests-started and removed tests-pending labels Mar 1, 2022

tvami mentioned this pull request Mar 1, 2022

Add saturated EGM regression tags to Run-3 MC GTs #37102

Merged

cmsbuild added tests-approved and removed tests-started labels Mar 1, 2022

perrotta reviewed Mar 1, 2022

slava77 reviewed Mar 1, 2022

cmsbuild added code-checks-pending operations-pending tests-pending and removed code-checks-approved labels Mar 2, 2022

cmsbuild added code-checks-approved and removed code-checks-pending labels Mar 2, 2022

cmsbuild added requires-external tests-started and removed tests-pending labels Mar 2, 2022

cmsbuild added tests-approved and removed tests-started labels Mar 2, 2022

cmsbuild added reconstruction-approved and removed reconstruction-pending labels Mar 3, 2022

cmsbuild added fully-signed operations-approved orp-approved and removed operations-pending pending-signatures orp-pending labels Mar 3, 2022

cmsbuild merged commit 238b35f into cms-sw:master Mar 3, 2022

cmsbuild mentioned this pull request Mar 3, 2022

Add AlcaPCCIntegrator modules to the AlCa no ConcurrentLumis list #37130

Merged
