Add GEN/LHE weight validation subdirectories for GEN Relval #36994

SanghyunKo · 2022-02-17T17:03:30Z

PR description:

This PR adds the Relval subdirectories GenWeight and LHEWeight to detect anomalies in the GEN/LHE weight contents, so that issues like those we experienced before can be caught early, such as #36705, #27918, cms-sw/cmsdist#6688, hypernews, and many others.

Each GEN/LHE subdirectory contains the number of weights, the distribution of weights (normalized to the nominal), and the leading lepton/jet Pt and η. In addition, the GEN directory contains the ISR/FSR up/down (2 or 1/2) variations and their ratio to the nominal, while the LHE directory contains the envelope of the scale variations and the PDF uncertainty (± RMS), together with their ratio to the nominal. The code assumes 9 scale variations and 103 PDF variations, which should hold in most cases.
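To make the two LHE quantities concrete, here is a minimal standalone sketch of an envelope and an RMS computed from weight ratios. It is not the code in this PR: the helper names `scaleEnvelope` and `pdfRms` are hypothetical, and taking the RMS as the root-mean-square deviation of the ratios from 1 is an assumption about the definition.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical helper: smallest and largest ratio of a scale-variation
// weight to the nominal weight. Returns {1, 1} when there is nothing to vary.
std::pair<double, double> scaleEnvelope(const std::vector<double>& scaleWeights, double nominal) {
  if (scaleWeights.empty() || nominal == 0.)
    return {1., 1.};
  double lo = std::numeric_limits<double>::infinity();
  double hi = -std::numeric_limits<double>::infinity();
  for (double w : scaleWeights) {
    const double ratio = w / nominal;  // ratios stay ordered even for a negative nominal
    lo = std::min(lo, ratio);
    hi = std::max(hi, ratio);
  }
  return {lo, hi};
}

// Hypothetical helper: RMS deviation of the PDF-variation ratios from 1
// (assumed definition, see the note above).
double pdfRms(const std::vector<double>& pdfWeights, double nominal) {
  if (pdfWeights.empty() || nominal == 0.)
    return 0.;
  double sumSq = 0.;
  for (double w : pdfWeights) {
    const double dev = w / nominal - 1.;
    sumSq += dev * dev;
  }
  return std::sqrt(sumSq / static_cast<double>(pdfWeights.size()));
}
```

With 9 scale and 103 PDF variations, the first helper would be called on the 9 scale weights and the second on the 103 PDF weights of one event. Which positions in the weight vector hold which variation is not specified in this description and is left to the caller.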

Demo of Relval plots with workflow 556 (TTbar Powheg+Pythia8) would look like this.

PR validation:

Tested with following GEN workflows:

504 (QCD Pt-30 Pythia8) - no LHE or GEN weight
555 (DY+jets aMCatNLO+Pythia8) - has LHE but no GEN (PS) weight
556 (TTbar Powheg+Pythia8) - has both LHE & GEN weights

As the routine runs for all GEN workflows, it should not throw an exception whether or not LHE products are present.

cmsbuild · 2022-02-17T17:11:18Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-36994/28387

This PR adds an extra 64KB to repository
Found files with invalid states:
- Validation/EventGenerator/interface/GenPtcValidationHelper.h:
  - Added: 53773e2
  - Deleted: 2c10079
- Validation/EventGenerator/src/GenPtcValidationHelper.cc:
  - Added: 53773e2
  - Deleted: 2c10079

cmsbuild · 2022-02-17T17:11:40Z

A new Pull Request was created by @SanghyunKo (Sanghyun Ko) for master.

It involves the following packages:

Configuration/Generator (generators)
Validation/EventGenerator (dqm, generators)

@SiewYan, @mkirsano, @emanueleusai, @ahmad3213, @cmsbuild, @GurpreetSinghChahal, @jfernan2, @Saptaparna, @alberto-sanchez, @pmandrik, @pbo0, @rvenditti can you please review it and eventually sign? Thanks.
@Martin-Grunewald, @missirol, @fabiocos this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

SanghyunKo · 2022-02-17T17:18:00Z

FYI @agrohsje @Dongwoon77 @shimashimarin

jfernan2 · 2022-02-17T18:43:31Z

@SanghyunKo please add yourself along with your github username in comments in the DQM GEN Validators e-group:
https://e-groups.cern.ch/e-groups/Egroup.do?egroupName=cms-dqm-validation-developers-gen&tab=3
to keep track of the developers
Thanks

SiewYan · 2022-02-18T05:39:17Z

please test workflow 504, 555, 556

SanghyunKo · 2022-02-18T06:25:55Z

@jfernan2 Thanks for letting me know, I've added myself to the e-group.

cmsbuild · 2022-02-18T09:44:59Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2f4d7b/22493/summary.html
COMMIT: eaab8d7
CMSSW: CMSSW_12_3_X_2022-02-17-1100/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/36994/22493/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

The workflows 1001.0, 1000.0, 136.88811, 136.874, 136.8311, 136.793, 136.7611, 136.731, 4.22 have different files in step1_dasquery.log than the ones found in the baseline. You may want to check and retrigger the tests if necessary. You can check it in the "files" directory in the results of the comparisons

@slava77 comparisons for the following workflows were not done due to missing matrix map:

/data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-2f4d7b/138.4_PromptCollisions+RunMinimumBias2021+ALCARECOPROMPTR3+HARVESTDPROMPTR3
/data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-2f4d7b/138.5_ExpressCollisions+RunMinimumBias2021+TIER0EXPRUN3+ALCARECOEXPR3+HARVESTDEXPR3
/data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-2f4d7b/139.001_RunMinimumBias2021+RunMinimumBias2021+HLTDR3_2021+RECODR3_MinBiasOffline+HARVESTD2021MB
/data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-2f4d7b/504.0_QCD_Pt-30_13TeV_pythia8+QCD_Pt-30_13TeV_pythia8+HARVESTGEN
/data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-2f4d7b/555.0_DYTollJets_NLO_Mad_13TeV_py8+DYToll012Jets_5f_NLO_FXFX_Madgraph_LHE_13TeV+Hadronizer_TuneCP5_13TeV_aMCatNLO_FXFX_5f_max2j_max0p_LHE_pythia8+HARVESTGEN2
/data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-2f4d7b/556.0_TTbar_NLO_Pow_13TeV_py8+TTbar_Pow_LHE_13TeV+Hadronizer_TuneCP5_13TeV_powhegEmissionVeto2p_pythia8+HARVESTGEN2

Summary:

No significant changes to the logs found
Reco comparison results: 12 differences found in the comparisons
DQMHistoTests: Total files compared: 49
DQMHistoTests: Total histograms compared: 3965143
DQMHistoTests: Total failures: 19
DQMHistoTests: Total nulls: 1
DQMHistoTests: Total successes: 3965101
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 455.331 KiB( 48 files compared)
DQMHistoSizes: changed ( 10024.0,... ): 12.002 KiB Generator/LHEWeight
DQMHistoSizes: changed ( 10024.0,... ): 11.963 KiB Generator/GenWeight
DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
Checked 204 log files, 45 edm output root files, 49 DQM output files
TriggerResults: no differences found

jfernan2 · 2022-02-18T11:31:46Z

@SanghyunKo I understand all the histograms added:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_12_3_X_2022-02-17-1100+2f4d7b/48440/dqm-histo-comparison-summary.html
are empty since the MC WFs tested:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2f4d7b/22493/runTheMatrix-results/
are Pythia based, hence LHE weights are null or constant.

I wonder if you could add some switch in the code to only produce them when the WF is based on an external LHE generator

cmsbuild · 2022-03-07T15:00:34Z

Pull request #36994 was updated. @SiewYan, @mkirsano, @emanueleusai, @ahmad3213, @cmsbuild, @GurpreetSinghChahal, @jfernan2, @Saptaparna, @alberto-sanchez, @pmandrik, @pbo0, @rvenditti can you please check and sign again.

perrotta · 2022-03-07T15:01:35Z

please test workflow 504, 555, 556

cmsbuild · 2022-03-07T19:06:03Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2f4d7b/22898/summary.html
COMMIT: eaec248
CMSSW: CMSSW_12_3_X_2022-03-06-2300/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/36994/22898/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

@slava77 comparisons for the following workflows were not done due to missing matrix map:

/data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-2f4d7b/504.0_QCD_Pt-30_13TeV_pythia8+QCD_Pt-30_13TeV_pythia8+HARVESTGEN
/data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-2f4d7b/555.0_DYTollJets_NLO_Mad_13TeV_py8+DYToll012Jets_5f_NLO_FXFX_Madgraph_LHE_13TeV+Hadronizer_TuneCP5_13TeV_aMCatNLO_FXFX_5f_max2j_max0p_LHE_pythia8+HARVESTGEN2
/data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-2f4d7b/556.0_TTbar_NLO_Pow_13TeV_py8+TTbar_Pow_LHE_13TeV+Hadronizer_TuneCP5_13TeV_powhegEmissionVeto2p_pythia8+HARVESTGEN2

Summary:

No significant changes to the logs found
Reco comparison results: 9 differences found in the comparisons
DQMHistoTests: Total files compared: 49
DQMHistoTests: Total histograms compared: 3987741
DQMHistoTests: Total failures: 14
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3987705
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 227.297 KiB( 48 files compared)
DQMHistoSizes: changed ( 10024.0,... ): 11.963 KiB Generator/GenWeight
Checked 204 log files, 45 edm output root files, 49 DQM output files
TriggerResults: no differences found

jfernan2 · 2022-03-07T19:10:09Z

+1

Saptaparna · 2022-03-07T20:03:47Z

+1
from generators (sorry for the delay)

cmsbuild · 2022-03-07T20:25:15Z

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

perrotta · 2022-03-07T20:25:36Z

+1

perrotta

@SanghyunKo I was investigating the possible origin of the comparison errors that show up for wf 135.4 in the recent PR tests for the histogram filled by this wgtVal_, which suggest a non-reproducibility issue, possibly due to some uninitialized values.
In this line of the code I found a possible culprit: if I am not wrong, this should have been normalized to the first weight in the vector, i.e. index 0. If you normalize to the second element in the vector (index 1), you can get an undefined result when the vector has fewer than two elements.
Could you please check at your earliest convenience, and either apply this fix (if you think it is correct), or find and implement the appropriate one? Thank you.

perrotta · 2022-03-09T08:34:52Z

Validation/EventGenerator/plugins/GenWeightValidation.cc

+  nlogWgt_->Fill(std::log10(weights_.at(idxGenEvtInfo_).size()), weight_);
+
+  for (unsigned idx = 0; idx < weights_.at(idxGenEvtInfo_).size(); idx++)
+    wgtVal_->Fill(weights_.at(idxGenEvtInfo_)[idx] / weights_.at(idxGenEvtInfo_)[1], weight_);


Shouldn't this be

wgtVal_->Fill(weights_.at(idxGenEvtInfo_)[idx] / weights_.at(idxGenEvtInfo_)[0], weight_);

instead?

SanghyunKo

@perrotta this was intentional: the proper normalization of the PS weights is done by dividing the weights by the baseline weight, which is located at idx 1 (Twiki). But indeed I can add a guard for the weights_.at(idxGenEvtInfo_).size()=1 case, between the existing guards for the size()=0 case and the size()<=idxMax_ case.

(Filling weights_.at(idxGenEvtInfo_)[0]/weights_.at(idxGenEvtInfo_)[0] for the size()=1 case would be redundant.)

if (weights_.at(idxGenEvtInfo_).size() < 2)
  return;  // no baseline weight in GenEventInfo

for (unsigned idx = 0; idx < weights_.at(idxGenEvtInfo_).size(); idx++)
  wgtVal_->Fill(weights_.at(idxGenEvtInfo_)[idx] / weights_.at(idxGenEvtInfo_)[1], weight_);
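For readers who want to try the guard outside CMSSW, the same idea can be sketched as a standalone function. This is an illustration only: `normalizeToBaseline` and `kBaselineIdx` are hypothetical names, and the extra check for a zero baseline weight is an assumption that is not part of the proposal above.

```cpp
#include <cstddef>
#include <vector>

// Position of the baseline PS weight, as stated in this thread (idx 1).
constexpr std::size_t kBaselineIdx = 1;

// Hypothetical helper: each weight divided by the baseline weight.
// Returns an empty vector when the baseline entry is missing (size < 2)
// or zero, so the caller never reads past the end of the input.
std::vector<double> normalizeToBaseline(const std::vector<double>& weights) {
  std::vector<double> ratios;
  if (weights.size() <= kBaselineIdx || weights[kBaselineIdx] == 0.)
    return ratios;
  ratios.reserve(weights.size());
  for (double w : weights)
    ratios.push_back(w / weights[kBaselineIdx]);
  return ratios;
}
```

A caller would fill the histogram once per returned ratio, so an event with only the nominal weight contributes nothing.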

perrotta · 2022-03-09T12:44:12Z

Thank you @SanghyunKo

Is it possible that there is no such baseline weight, only the nominal one (i.e. weights_.at(idxGenEvtInfo_).size() = 1)?

Did you check whether this was really the origin of the apparently nonsensical entries in the wgtVal_ histo?

I think we can submit a fix PR with your suggestion, which is reasonable even if it does not fix the issue seen in the PR comparisons. But it would be good to run some simple check to verify that it actually fixes it.

SanghyunKo · 2022-03-09T13:09:36Z

@perrotta would you mind providing some snippet or link to the failing comparison you mentioned? It's not clear to me what you're referring to... If I get it then I can definitely run a quick test for it.

As for the number of weights, both the nominal and the baseline weight should be present when we have PS weights, but I realized that this isn't always true for Relvals (though it is mostly true in official samples).

qliphy · 2022-03-09T13:14:51Z

@SanghyunKo For the failing comparison, you can check several recent PR tests:

https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_12_3_X_2022-03-08-2300+cdca02/48809/135.4_ZEE_13+ZEEFS_13+HARVESTUP15FS+MINIAODMCUP15FS/Generator_GenWeight.html

or
#37170

https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_12_3_X_2022-03-08-2300+f3a7af/48817/135.4_ZEE_13+ZEEFS_13+HARVESTUP15FS+MINIAODMCUP15FS/Generator_GenWeight.html

Dr15Jones · 2022-03-09T14:31:02Z

The ASAN build is reporting an out-of-bounds memory read coming from GenWeightValidation::analyze

=================================================================
==17313==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000164e98 at pc 0x2aaf37f96ed5 bp 0x2aaf3dc2f990 sp 0x2aaf3dc2f988
READ of size 8 at 0x602000164e98 thread T3
    #0 0x2aaf37f96ed4 in GenWeightValidation::analyze(edm::Event const&, edm::EventSetup const&) (/cvmfs/cms-ib.cern.ch/nweek-02723/slc7_amd64_gcc11/cms/cmssw/CMSSW_12_3_ASAN_X_2022-03-09-1100/lib/slc7_amd64_gcc11/pluginValidationEventGenerator_plugins.so+0xffed4)
    #1 0x2aaef9168d47 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) (/cvmfs/cms-ib.cern.ch/nweek-02723/slc7_amd64_gcc11/cms/cmssw/CMSSW_12_3_ASAN_X_2022-03-09-1100/lib/slc7_amd64_gcc11/libFWCoreFramework.so+0x8cfd47)
[cut]

0x602000164e98 is located 0 bytes to the right of 8-byte region [0x602000164e90,0x602000164e98)
allocated by thread T3 here:
    #0 0x2aaef7f77d07 in operator new(unsigned long) ../../../../libsanitizer/asan/asan_new_delete.cpp:99
    #1 0x2aaef9fdf1ac in void std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > >::_M_realloc_insert<std::vector<double, std::allocator<double> > const&>(__gnu_cxx::__normal_iterator<std::vector<double, std::allocator<double> >*, std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > > >, std::vector<double, std::allocator<double> > const&) (/cvmfs/cms-ib.cern.ch/nweek-02723/slc7_amd64_gcc11/cms/cmssw/CMSSW_12_3_ASAN_X_2022-03-09-1100/external/slc7_amd64_gcc11/lib/libMathCore.so+0xbc1ac)

SUMMARY: AddressSanitizer: heap-buffer-overflow (/cvmfs/cms-ib.cern.ch/nweek-02723/slc7_amd64_gcc11/cms/cmssw/CMSSW_12_3_ASAN_X_2022-03-09-1100/lib/slc7_amd64_gcc11/pluginValidationEventGenerator_plugins.so+0xffed4) in GenWeightValidation::analyze(edm::Event const&, edm::EventSetup const&)
Shadow bytes around the buggy address:
  0x0c0480024980: fa fa 00 fa fa fa fd fa fa fa fa fa fa fa fd fa
  0x0c0480024990: fa fa 00 00 fa fa 00 00 fa fa fd fd fa fa fd fd
  0x0c04800249a0: fa fa fd fa fa fa 00 fa fa fa 00 fa fa fa fd fa
  0x0c04800249b0: fa fa fd fd fa fa fd fa fa fa fd fd fa fa 00 00
  0x0c04800249c0: fa fa fd fd fa fa fd fa fa fa fd fa fa fa fd fd
=>0x0c04800249d0: fa fa 00[fa]fa fa fd fa fa fa fd fd fa fa fd fd
  0x0c04800249e0: fa fa fd fa fa fa fd fd fa fa fd fa fa fa fd fd
  0x0c04800249f0: fa fa fd fd fa fa 00 fa fa fa 00 00 fa fa 00 00
  0x0c0480024a00: fa fa fd fd fa fa fd fd fa fa fa fa fa fa fd fd
  0x0c0480024a10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0480024a20: fa fa 00 fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==17313==ABORTING
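The report says the read lands immediately to the right of an 8-byte heap region. That is the size of one double, so it would be consistent with reading index 1 of a weight vector that holds a single entry. As a minimal sketch of the difference between the two access styles, the hypothetical helper below shows that the bounds-checked `at()` throws `std::out_of_range` for such an index, whereas `operator[]` performs no check and the read is undefined behaviour.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns true if a bounds-checked read of v at idx throws.
bool atThrows(const std::vector<double>& v, std::size_t idx) {
  try {
    (void)v.at(idx);  // at() validates idx against v.size()
    return false;
  } catch (const std::out_of_range&) {
    return true;
  }
}
```

For a one-element vector, `atThrows(v, 1)` returns true, while `v[1]` would compile and silently read past the buffer.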

perrotta · 2022-03-09T14:48:24Z

Thank you @Dr15Jones : your observation is perfectly in line with what was discussed earlier on in this thread.

Since @SanghyunKo already proposed a fix, but such a fix was not submitted yet, I shamelessly copied the very same solution proposed by @SanghyunKo in a new PR, #37185, in order to speed up the possible integration in CMSSW in time for 12_3_0_pre6.

Of course, if @SanghyunKo or anyone else finds a more appropriate solution, that PR can be closed and we can move to the new one.

SanghyunKo · 2022-03-09T14:56:58Z

Thanks @perrotta, and no problem at all, it's my bad. I was trying to reproduce the failing DQM comparison but couldn't; in any case, the address sanitizer shows that the fix is needed.

SanghyunKo and others added 7 commits November 11, 2019 19:45

DQM GEN/LHE weights validation

53773e2

fix merge conflict

7cf1a43

remove process dependent plots

2c10079

protect from no LHE product or lack of number of weights

4107ff5

add PS weights for aMCatNLO TTbar Relval

4e5c860

increase nJet bins

3553ce2

scram code-format

eaab8d7

cmsbuild added this to the CMSSW_12_3_X milestone Feb 17, 2022

cmsbuild added code-checks-pending dqm-pending generators-pending orp-pending pending-signatures tests-pending labels Feb 17, 2022

cmsbuild added code-checks-approved and removed code-checks-pending labels Feb 17, 2022

cmsbuild added tests-started and removed tests-pending labels Feb 18, 2022

cmsbuild added tests-approved and removed tests-started labels Feb 18, 2022

SanghyunKo added 2 commits February 19, 2022 06:37

prevent running LHE validation for samples that have no LHE product

1a5d32c

improve code formats for private members

0c8775a

cmsbuild added tests-started and removed tests-pending labels Mar 7, 2022

cmsbuild added tests-approved and removed tests-started labels Mar 7, 2022

cmsbuild added dqm-approved and removed dqm-pending labels Mar 7, 2022

cmsbuild added fully-signed generators-approved and removed generators-pending pending-signatures labels Mar 7, 2022

cmsbuild added orp-approved and removed orp-pending labels Mar 7, 2022

cmsbuild merged commit 996a3b2 into cms-sw:master Mar 7, 2022

perrotta mentioned this pull request Mar 8, 2022

SIGABRT in HepMCValidationHelper::removeIsolatedLeptons #37169

Open

perrotta reviewed Mar 9, 2022

perrotta mentioned this pull request Mar 9, 2022

Tighten check for the number of weights in GenWeightValidation #37185

Merged

SanghyunKo mentioned this pull request May 12, 2022

134 PS weights instead of 46 in Pythia 8.3 #36705

Closed
