LowPtElectrons: fix for UL FastSim mini v2 and nano v9 workflows (back port) #35341

Merged

bainbrid
Contributor

PR description:

  • UL mini v2 workflows for FastSim are broken due to an exception thrown by the LowPtGsfElectronIDProducer when attempting to access an edm::Ref to the missing generalTracksBeforeMixing collection in AOD.
  • All details can be found in this issue: Mini-v2 can not be produced in FastSim UL scenario #34774.
  • For re-miniaod (and re-nanoaod) workflows, this PR also provides an acceptable workaround that allows the ID producer to evaluate the BDT model without throwing an exception, while maintaining the expected physics performance. This is achieved by extracting features from the KF track used in the ElectronSeed step (obtained from the available generalTracks collection in AOD), instead of from the KF track embedded in the GsfElectronCore object (which is chosen based on number of shared hits with the GsfTrack and obtained from the missing generalTracksBeforeMixing collection in AOD).
  • All development was performed in the 10_6_X cycle (see the compare here), tested with the example recipes provided in the issue #34774, and then ported to master (this PR).
  • Also tested is the behaviour for the nano v9 workflow, which is similarly affected.
  • This work was presented to RECO/AT here: https://indico.cern.ch/event/1071803/. The final slide compares physics performance between the 10_6_X and 12_1_X cycles (essentially identical).
  • Physics and computing performance are essentially unaffected.
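The track-source switch described in the list above can be illustrated with a toy model. This is only a sketch in plain Python, not CMSSW code: `ToyTrack`, `ToyElectron`, `select_track`, and the `use_gsf_to_track` flag are hypothetical names, and a plain dictionary stands in for the association between a GsfTrack and the KF track used in the ElectronSeed step.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ToyTrack:
    pt: float
    source: str  # name of the collection the track was read from


@dataclass
class ToyElectron:
    gsf_id: int                      # hypothetical key identifying the GsfTrack
    core_track: Optional[ToyTrack]   # None models a track whose collection is missing


def select_track(ele: ToyElectron,
                 gsf_to_track: Dict[int, ToyTrack],
                 use_gsf_to_track: bool) -> Optional[ToyTrack]:
    """Pick the KF track that the ID features would be computed from."""
    if use_gsf_to_track:
        # Workaround path: look up the seed-step track, which lives in a
        # collection that is still present in the input.
        return gsf_to_track.get(ele.gsf_id)
    # Default path: use the track embedded in the electron core.
    return ele.core_track


seed_track = ToyTrack(pt=1.8, source="generalTracks")
ele = ToyElectron(gsf_id=7, core_track=None)
gsf_to_track = {7: seed_track}

chosen = select_track(ele, gsf_to_track, use_gsf_to_track=True)
print(chosen.source)
```

With the flag off, the same electron yields no usable track in this toy, which mirrors the situation the workaround is meant to avoid.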

PR validation:

This PR was validated using cmsDriver commands that performed the GEN->AOD and mini v2 steps, based on those provided here, for tests with the 10_6_X cycle, and modified versions for tests with the 12_1_X release (this PR).

If this PR is a backport, please specify the original PR and why you need to backport that PR:

This PR is a back port of #35181.

@slava77 @jordan-martins @crovelli

@cmsbuild cmsbuild added this to the CMSSW_10_6_X milestone Sep 20, 2021
@cmsbuild
Contributor

A new Pull Request was created by @bainbrid for CMSSW_10_6_X.

It involves the following packages:

  • RecoEgamma/EgammaElectronProducers (reconstruction)

@jpata, @cmsbuild, @slava77 can you please review it and eventually sign? Thanks.
@Sam-Harper, @jainshilpi, @rovere, @lgray, @sobhatta, @lecriste, @afiqaize, @wrtabb, @varuns23, @ram1123 this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@slava77
Contributor

slava77 commented Sep 20, 2021

@cmsbuild please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-449f17/18757/summary.html
COMMIT: 6437a79
CMSSW: CMSSW_10_6_X_2021-09-19-0000/slc7_amd64_gcc700
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/35341/18757/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 119 differences found in the comparisons
  • DQMHistoTests: Total files compared: 35
  • DQMHistoTests: Total histograms compared: 3215686
  • DQMHistoTests: Total failures: 29
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3215323
  • DQMHistoTests: Total skipped: 334
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 34 files compared)
  • Checked 143 log files, 29 edm output root files, 35 DQM output files
  • TriggerResults: no differences found

@slava77
Contributor

slava77 commented Sep 20, 2021

Reco comparison results: 119 differences found in the comparisons

I was expecting no visible differences in 10_6_X in the workflows tested by the bot.

@@ -28,3 +28,9 @@
    lowPtGsfElectronID,
    ModelWeights = ["RecoEgamma/ElectronIdentification/data/LowPtElectrons/LowPtElectrons_ID_2021May17.root"],
)

from Configuration.Eras.Modifier_fastSim_cff import fastSim
fastSim.toModify(
Contributor

I think this should be coupled with run2_miniAOD_UL here.

Contributor Author

Yes. Also coupled with run2_nanoAOD_106Xv2, as done here
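The coupling discussed in this thread can be pictured with a toy model of era modifiers. This is a sketch under assumptions, not the CMSSW implementation: `ToyModifier` is a hypothetical stand-in class, the parameter name `useGsfToTrack` is inferred from the `useGsfToTrack_` member shown in the C++ excerpt, and the exact boolean combination (fastSim AND either of the two UL modifiers) is an assumed reading of the comments above.

```python
class ToyModifier:
    """Toy stand-in for an era modifier: a named switch that can be combined."""

    def __init__(self, name, predicate=None):
        self.name = name
        # The predicate receives the set of active era names and returns a bool.
        self._predicate = predicate or (lambda active: name in active)

    def is_chosen(self, active):
        return self._predicate(active)

    def __and__(self, other):
        return ToyModifier(
            f"({self.name} & {other.name})",
            lambda active: self.is_chosen(active) and other.is_chosen(active),
        )

    def __or__(self, other):
        return ToyModifier(
            f"({self.name} | {other.name})",
            lambda active: self.is_chosen(active) or other.is_chosen(active),
        )

    def to_modify(self, config, active, **changes):
        """Apply `changes` to `config` only when this modifier is chosen."""
        if self.is_chosen(active):
            config.update(changes)
        return config


fast_sim = ToyModifier("fastSim")
mini_ul = ToyModifier("run2_miniAOD_UL")
nano_v2 = ToyModifier("run2_nanoAOD_106Xv2")

# Assumed combination: the FastSim fix is active only in the UL re-mini/re-nano setups.
coupled = fast_sim & (mini_ul | nano_v2)


def configure(active):
    config = {"useGsfToTrack": False}
    return coupled.to_modify(config, active, useGsfToTrack=True)


print(configure({"fastSim", "run2_miniAOD_UL"}))
print(configure({"fastSim"}))
```

In this toy, a configuration that enables only fastSim is left untouched, which is the point of the coupling: workflows outside the UL re-mini and re-nano setups keep their existing behaviour.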

@cmsbuild
Contributor

Pull request #35341 was updated. @jpata, @cmsbuild, @slava77 can you please check and sign again.

@slava77
Contributor

slava77 commented Sep 21, 2021

@cmsbuild please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-449f17/18787/summary.html
COMMIT: 4f8e226
CMSSW: CMSSW_10_6_X_2021-09-19-0000/slc7_amd64_gcc700
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/35341/18787/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

The workflows 140.53 have different files in step1_dasquery.log than the ones found in the baseline. You may want to check and retrigger the tests if necessary. You can check it in the "files" directory in the results of the comparisons

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 1365 differences found in the comparisons
  • DQMHistoTests: Total files compared: 35
  • DQMHistoTests: Total histograms compared: 3215686
  • DQMHistoTests: Total failures: 2069
  • DQMHistoTests: Total nulls: 22
  • DQMHistoTests: Total successes: 3213261
  • DQMHistoTests: Total skipped: 334
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -56.705 KiB( 34 files compared)
  • DQMHistoSizes: changed ( 140.53 ): -44.531 KiB Hcal/DigiRunHarvesting
  • DQMHistoSizes: changed ( 140.53 ): -10.938 KiB Info/EventInfo
  • DQMHistoSizes: changed ( 140.53 ): -1.172 KiB RPC/DCSInfo
  • DQMHistoSizes: changed ( 140.53 ): -0.064 KiB SiStrip/MechanicalView
  • Checked 143 log files, 29 edm output root files, 35 DQM output files
  • TriggerResults: no differences found

@slava77
Contributor

slava77 commented Sep 21, 2021

Reco comparison results: 1365 differences found in the comparisons

Ignoring the glitch in 140.53, which accounts for 1240, the remaining 125 differences are still not expected, in particular the UL reminiAOD wf 136.88811 with 103.

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-449f17/18833/summary.html
COMMIT: 0052747
CMSSW: CMSSW_10_6_X_2021-09-19-0000/slc7_amd64_gcc700
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/35341/18833/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

The workflows 140.53 have different files in step1_dasquery.log than the ones found in the baseline. You may want to check and retrigger the tests if necessary. You can check it in the "files" directory in the results of the comparisons

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 1246 differences found in the comparisons
  • DQMHistoTests: Total files compared: 35
  • DQMHistoTests: Total histograms compared: 3215686
  • DQMHistoTests: Total failures: 2042
  • DQMHistoTests: Total nulls: 22
  • DQMHistoTests: Total successes: 3213288
  • DQMHistoTests: Total skipped: 334
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -56.705 KiB( 34 files compared)
  • DQMHistoSizes: changed ( 140.53 ): -44.531 KiB Hcal/DigiRunHarvesting
  • DQMHistoSizes: changed ( 140.53 ): -10.938 KiB Info/EventInfo
  • DQMHistoSizes: changed ( 140.53 ): -1.172 KiB RPC/DCSInfo
  • DQMHistoSizes: changed ( 140.53 ): -0.064 KiB SiStrip/MechanicalView
  • Checked 143 log files, 29 edm output root files, 35 DQM output files
  • TriggerResults: no differences found

@slava77
Contributor

slava77 commented Sep 22, 2021

Reco comparison results: 1246 differences found in the comparisons

OK now; these are all false-positives (from 140.53, mainly, which again glitched in DAS)

// Extract Track
const reco::Track* trk = nullptr;
if (useGsfToTrack_) {
  trk = (*gsf2trk)[gsf].get();
Contributor

The master version has more checks before calling .get; in particular, isAvailable is present.
@makortel does the Ref::get throw if the product is not available or will it return a nullptr?

Contributor

It throws an exception if the product is not available.

Contributor

@bainbrid
please update this with isAvailable to match the master version so that we don't potentially get more crashes here than would be expected in 12X.

Contributor Author

Done. I added the check within here and (on purpose) did not touch the lines here.
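The guard requested in this thread follows a common pattern: test availability before dereferencing, because the dereference throws when the product is missing. Below is a toy Python model of that pattern. It is not the CMSSW API: `ToyRef`, `ProductNotAvailable`, and `safe_get` are hypothetical names, and the real check in the PR is the C++ `isAvailable` call mentioned above.

```python
class ProductNotAvailable(Exception):
    """Raised when a reference points into a collection that is not present."""


class ToyRef:
    """Toy reference: an index into a collection that may be missing."""

    def __init__(self, collection, index):
        self._collection = collection  # None models a collection absent from the input
        self._index = index

    def is_available(self):
        return self._collection is not None

    def get(self):
        # Mirrors the behaviour stated in the thread: dereferencing throws
        # when the referenced product is not available.
        if self._collection is None:
            raise ProductNotAvailable("referenced collection is not in the event")
        return self._collection[self._index]


def safe_get(ref):
    """Return the referenced object, or None if the reference cannot be resolved."""
    if ref is not None and ref.is_available():
        return ref.get()
    return None


tracks = ["trkA", "trkB"]
print(safe_get(ToyRef(tracks, 1)))   # resolves normally
print(safe_get(ToyRef(None, 0)))     # missing collection, no exception raised
```

Callers then have to handle the `None` case explicitly, which trades a crash for a code path that must decide what to do when no track is found.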

@cmsbuild
Contributor

Pull request #35341 was updated. @jpata, @cmsbuild, @slava77 can you please check and sign again.

@slava77
Contributor

slava77 commented Sep 23, 2021

@cmsbuild please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-449f17/18864/summary.html
COMMIT: 0f64359
CMSSW: CMSSW_10_6_X_2021-09-19-0000/slc7_amd64_gcc700
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/35341/18864/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

The workflows 140.53 have different files in step1_dasquery.log than the ones found in the baseline. You may want to check and retrigger the tests if necessary. You can check it in the "files" directory in the results of the comparisons

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 1242 differences found in the comparisons
  • DQMHistoTests: Total files compared: 35
  • DQMHistoTests: Total histograms compared: 3215686
  • DQMHistoTests: Total failures: 2041
  • DQMHistoTests: Total nulls: 22
  • DQMHistoTests: Total successes: 3213289
  • DQMHistoTests: Total skipped: 334
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -56.705 KiB( 34 files compared)
  • DQMHistoSizes: changed ( 140.53 ): -44.531 KiB Hcal/DigiRunHarvesting
  • DQMHistoSizes: changed ( 140.53 ): -10.938 KiB Info/EventInfo
  • DQMHistoSizes: changed ( 140.53 ): -1.172 KiB RPC/DCSInfo
  • DQMHistoSizes: changed ( 140.53 ): -0.064 KiB SiStrip/MechanicalView
  • Checked 143 log files, 29 edm output root files, 35 DQM output files
  • TriggerResults: no differences found

@slava77
Contributor

slava77 commented Sep 23, 2021

+reconstruction

for #35341 0f64359

  • code changes are consistent with LowPtElectrons: fix for UL FastSim mini v2 and nano v9 workflows #35181 (merged a week ago)
  • jenkins tests pass and comparisons with the baseline show no relevant differences
    • the reported differences are false-positives; primarily from 140.53 which had different files in input between the baseline and this PR tests
  • I x-checked locally that the fastsim miniAOD with UL setup runs OK

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_10_6_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_12_1_X is complete. This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@slava77
Contributor

slava77 commented Sep 23, 2021

@qliphy @perrotta
After this PR is merged, please consider building a new 10_6_X release at your earliest convenience so that the fastsim campaigns with mini-v2 can (re)start.
Thank you.

@qliphy
Contributor

qliphy commented Sep 24, 2021

@qliphy @perrotta
After this PR is merged, please consider building a new 10_6_X release at your earliest convenience so that the fastsim campaigns with mini-v2 can (re)start.
Thank you.

@slava77 Sure we will.

@perrotta
Contributor

+1

  • The new features that allow the fix for fastsim are correctly included and steered by an era modifier: no changes are expected for the configurations of productions that have already started

@cmsbuild cmsbuild merged commit 39546b2 into cms-sw:CMSSW_10_6_X Sep 24, 2021
@sbein
Contributor

sbein commented Oct 8, 2021

+1
