Adding PAIReDJet Table to NanoAOD #45207

Open
wants to merge 4 commits into
base: master

tajrussell

@tajrussell tajrussell commented Jun 12, 2024

PR description:

This PR adds PAIReD jets (https://indico.cern.ch/event/1372046/contributions/5993835/attachments/2874191/5033020/BTV_PAIReD.pdf), currently intended mainly for use in the resolved regime of heavy-particle decays. The jets are added as a new flat table in NanoAOD that stores the indices of the seed jets as well as the bb, cc, and ll tagging scores from a version of parT that has been retrained on PAIReD jets. For simulated events (i.e. not real data), the number of b and c hadrons within the PAIReD ellipse is also stored.

The main file is RecoBTag/ONNXRuntime/plugins/PAIReDONNXJetTagsProducer.cc which loops over all pairs of AK4 jets in an event and finds the candidates in the ellipse defined by the two jets. This makes use of the inEllipse() function in the newly defined file RecoBTag/FeatureTools/interface/paired_helper.h. The Producer then organizes the inputs for parT and runs on a new ONNX model that needs to be simultaneously added to RecoBTag/Combined/data to produce the bb, cc, and ll tagging scores (cms-data/RecoBTag-Combined#58).
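
For illustration, the pair loop and ellipse test described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the code in paired_helper.h: it assumes the ellipse lives in the (eta, phi) plane, is centred on the midpoint of the two jet axes, and has its major axis along the line joining them. The `margin` parameter, the function signatures, and the tuple-based inputs are hypothetical and chosen only to keep the example self-contained.

```python
import itertools
import math


def delta_phi(phi_a, phi_b):
    """Return phi_a - phi_b wrapped into [-pi, pi)."""
    return (phi_a - phi_b + math.pi) % (2.0 * math.pi) - math.pi


def in_ellipse(cand, jet1, jet2, margin=1.0):
    """Test whether cand = (eta, phi) lies inside the ellipse spanned by two jets.

    ASSUMPTION: semi-major axis = half the jet separation + margin,
    semi-minor axis = margin. The real inEllipse() may use other values.
    """
    (eta1, phi1), (eta2, phi2) = jet1, jet2
    d_eta = eta2 - eta1
    d_phi = delta_phi(phi2, phi1)
    sep = math.hypot(d_eta, d_phi)

    # Candidate position relative to the midpoint, with phi wrap-around handled.
    x = cand[0] - (eta1 + 0.5 * d_eta)
    y = delta_phi(cand[1], phi1 + 0.5 * d_phi)

    # Rotate so that the first coordinate runs along the jet-jet axis.
    cos_t, sin_t = (d_eta / sep, d_phi / sep) if sep > 0.0 else (1.0, 0.0)
    along = x * cos_t + y * sin_t
    across = -x * sin_t + y * cos_t

    semi_major = 0.5 * sep + margin
    semi_minor = margin
    return (along / semi_major) ** 2 + (across / semi_minor) ** 2 < 1.0


def candidates_per_pair(jets, cands, margin=1.0):
    """Map each jet-index pair (i, j), i < j, to the candidate indices in its ellipse."""
    out = {}
    for (i, jet_i), (j, jet_j) in itertools.combinations(enumerate(jets), 2):
        out[(i, j)] = [k for k, c in enumerate(cands) if in_ellipse(c, jet_i, jet_j, margin)]
    return out
```

Because every unordered pair of jets is visited, the number of ellipse tests grows quadratically with the jet multiplicity, and each selected pair then requires one model evaluation.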

The RecoBTag/ONNXRuntime/plugins BuildFile is updated to include a number of dependencies required for the new EDProducer.

PhysicsTools/NanoAOD/python/jetsPAIReD.py defines tasks for running the new producer on data and MC, and PhysicsTools/NanoAOD/python/nano_cff.py is edited to include these tasks in NanoAOD production.

PR validation:

We have validated that the scores produced by the producer added to CMSSW in this PR match those produced by the training framework, which uses PFNano samples. This can be seen on slide 26 of this presentation (https://indico.cern.ch/event/1347445/contributions/5857902/attachments/2866917/5018410/Higgs-Charm%20Workshop.pdf).

@cmsbuild
Contributor

cmsbuild commented Jun 12, 2024

cms-bot internal usage

@cmsbuild
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45207/40557

  • This PR adds an extra 28KB to repository

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45207/40558

  • This PR adds an extra 32KB to repository

@cmsbuild
Contributor

A new Pull Request was created by @tajrussell for master.

It involves the following packages:

  • PhysicsTools/NanoAOD (xpog)
  • RecoBTag/FeatureTools (reconstruction)
  • RecoBTag/ONNXRuntime (reconstruction)

@vlimant, @hqucms, @ftorrresd, @jfernan2, @mandrenguyen, @cmsbuild can you please review it and eventually sign? Thanks.
@demuller, @andrzejnovak, @AlexDeMoor, @Ming-Yan, @Senphy, @hqucms, @AnnikaStein, @castaned, @missirol, @gpetruc this is something you requested to watch as well.
@rappoccio, @sextonkennedy, @antoniovilela you are the release manager for this.

cms-bot commands are listed here

@vlimant
Contributor

vlimant commented Jun 13, 2024

enable nano

desc.add<edm::InputTag>("vertices", edm::InputTag("offlineSlimmedPrimaryVertices"));
desc.add<edm::InputTag>("secondary_vertices", edm::InputTag("slimmedSecondaryVertices"));
desc.add<edm::FileInPath>("model_path", edm::FileInPath("RecoBTag/Combined/data/PAIReD/model3.onnx"));
descriptions.add("PAIReDJetTable", desc);
Contributor

would be nice to use descriptions.addWithDefaultLabel(desc) to generate the default cfi

import FWCore.ParameterSet.Config as cms
from PhysicsTools.NanoAOD.common_cff import *

pairedJetTable = cms.EDProducer("PAIReDONNXJetTagsProducer")
Contributor

Looks like you are using all the default values from fillDescriptions, which is actually good to have. But so that the actual parameters are visible (and modifiable) in the configuration, I'd rather you import from the default cfi generated from fillDescriptions (see below).

Author

Do you mean adding a file similar to this one, https://cmssdt.cern.ch/lxr/source/cfipython/RecoBTag/ONNXRuntime/UnifiedParticleTransformerAK4ONNXJetTagsProducer.py, that specifies the descriptions outside of the producer? Or, if not, is there by any chance a different piece of code you could point me to that accomplishes something similar to what you would like to see here?

Contributor

Adding https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideConfigurationValidationAndHelp for documentation. You do not have to add the file in cfipython yourself; the framework will do this for you upon "scram build". You should be able to do something like

from RecoBTag.ONNXRuntime.pAIRrDONNXJetTagsProducer_cfi import pAIRrDONNXJetTagsProducer
pairedJetTable = pAIRrDONNXJetTagsProducer.clone()

Contributor

from RecoBTag.ONNXRuntime.pAIRrDONNXJetTagsProducer_cfi import pAIRrDONNXJetTagsProducer is what descriptions.addWithDefaultLabel(desc); provides
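
The name of the generated cfi follows from the plugin name, so the exact capitalization matters when writing the import. The sketch below mirrors, in plain Python, the default-label rule as I understand it from the framework's `defaultModuleLabel` helper: lowercase the first letter, or all but the last letter of a leading run of capitals. Treat the rule itself as an assumption and check it against the file actually generated under cfipython after "scram build"; `default_module_label` and `default_cfi_module` are hypothetical helper names used only here.

```python
def default_module_label(plugin_name: str) -> str:
    """Derive a default module label from a plugin name (assumed rule, see text)."""
    # Drop namespace colons and anything after an '@' (module alternatives).
    label = plugin_name.replace(":", "").split("@", 1)[0]

    # Count the leading run of uppercase characters.
    ups = 0
    for ch in label:
        if not ch.isupper():
            break
        ups += 1

    # With several leading capitals (but not an all-caps name), keep the last
    # one uppercase because it starts the next word, e.g. "HLTFoo" -> "hltFoo".
    if ups > 1 and ups != len(label):
        ups -= 1

    return label[:ups].lower() + label[ups:]


def default_cfi_module(plugin_name: str) -> str:
    """Name of the auto-generated cfi module for a plugin, under the same assumption."""
    return default_module_label(plugin_name) + "_cfi"


print(default_cfi_module("PAIReDONNXJetTagsProducer"))
# -> paiReDONNXJetTagsProducer_cfi
```

Under this assumed rule the leading capitals "PAIR" become "paiR", so the label spelled out in the comment above should be read as indicative rather than exact.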

gen_particle_token_(consumes<edm::View<reco::GenParticle>>(iConfig.getParameter<edm::InputTag>("gen_particles"))),
vtx_token_(consumes<VertexCollection>(iConfig.getParameter<edm::InputTag>("vertices"))),
sv_token_(consumes<SVCollection>(iConfig.getParameter<edm::InputTag>("secondary_vertices"))) {
produces<nanoaod::FlatTable>(name_);
Contributor

Instead of a monolithic producer that evaluates the tagger AND creates the table, can we separate the two: produce the tags in the EDM (as a value map?) and pick them up with a standard table producer (preferably an existing module type, if possible)?

@vlimant
Contributor

vlimant commented Jun 13, 2024

type btv

@cmsbuild cmsbuild added the btv label Jun 13, 2024
@vlimant
Contributor

vlimant commented Jun 24, 2024

please test

@cmsbuild
Contributor

-1

Failed Tests: RelVals RelVals-INPUT RelVals-NANO
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c1ad35/40025/summary.html
COMMIT: 75b75e0
CMSSW: CMSSW_14_1_X_2024-06-23-2300/el8_amd64_gcc12
Additional Tests: NANO
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45207/40025/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals

----- Begin Fatal Exception 24-Jun-2024 10:37:42 CEST-----------------------
An exception of category 'FileInPathError' occurred while
   [0] Constructing the EventProcessor
Exception Message:
edm::FileInPath unable to find file RecoBTag/Combined/data/PAIReD/model3.onnx anywhere in the search path.
The search path is defined by: CMSSW_SEARCH_PATH
${CMSSW_SEARCH_PATH} is: /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45207/40025/CMSSW_14_1_X_2024-06-23-2300/poison:/cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45207/40025/CMSSW_14_1_X_2024-06-23-2300/src:/cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45207/40025/CMSSW_14_1_X_2024-06-23-2300/external/el8_amd64_gcc12/data:/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02843/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_1_X_2024-06-23-2300/poison:/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02843/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_1_X_2024-06-23-2300/src:/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02843/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_1_X_2024-06-23-2300/external/el8_amd64_gcc12/data
Current directory is: /data/cmsbld/jenkins/workspace/ib-run-pr-relvals/matrix-results/140.023_RunZeroBias2022B
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 24-Jun-2024 10:37:56 CEST-----------------------
An exception of category 'FileInPathError' occurred while
   [0] Constructing the EventProcessor
Exception Message:
edm::FileInPath unable to find file RecoBTag/Combined/data/PAIReD/model3.onnx anywhere in the search path.
The search path is defined by: CMSSW_SEARCH_PATH
${CMSSW_SEARCH_PATH} is: /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45207/40025/CMSSW_14_1_X_2024-06-23-2300/poison:/cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45207/40025/CMSSW_14_1_X_2024-06-23-2300/src:/cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45207/40025/CMSSW_14_1_X_2024-06-23-2300/external/el8_amd64_gcc12/data:/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02843/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_1_X_2024-06-23-2300/poison:/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02843/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_1_X_2024-06-23-2300/src:/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02843/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_1_X_2024-06-23-2300/external/el8_amd64_gcc12/data
Current directory is: /data/cmsbld/jenkins/workspace/ib-run-pr-relvals/matrix-results/135.4_ZEEFS_13
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 24-Jun-2024 10:40:43 CEST-----------------------
An exception of category 'FileInPathError' occurred while
   [0] Constructing the EventProcessor
Exception Message:
edm::FileInPath unable to find file RecoBTag/Combined/data/PAIReD/model3.onnx anywhere in the search path.
The search path is defined by: CMSSW_SEARCH_PATH
${CMSSW_SEARCH_PATH} is: /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45207/40025/CMSSW_14_1_X_2024-06-23-2300/poison:/cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45207/40025/CMSSW_14_1_X_2024-06-23-2300/src:/cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45207/40025/CMSSW_14_1_X_2024-06-23-2300/external/el8_amd64_gcc12/data:/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02843/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_1_X_2024-06-23-2300/poison:/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02843/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_1_X_2024-06-23-2300/src:/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02843/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_1_X_2024-06-23-2300/external/el8_amd64_gcc12/data
Current directory is: /data/cmsbld/jenkins/workspace/ib-run-pr-relvals/matrix-results/140.043_RunZeroBias2022C
----- End Fatal Exception -------------------------------------------------
Expand to see more relval errors ...
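
The exceptions above all report the same condition: the relative path RecoBTag/Combined/data/PAIReD/model3.onnx does not exist under any directory listed in CMSSW_SEARCH_PATH. That is what one would expect if the model file from the companion data PR (cms-data/RecoBTag-Combined#58) was not part of the tested area. As a simplified illustration of this kind of lookup (not the framework's implementation; `resolve_in_search_path` is a hypothetical helper), the same check can be reproduced locally:

```python
import os


def resolve_in_search_path(relative_path, search_path):
    """Return the first existing file for relative_path in a colon-separated search path.

    Directories are tried in order; None is returned when no directory contains the file.
    """
    for base in search_path.split(":"):
        if not base:
            continue  # skip empty entries such as those from a leading or trailing ':'
        candidate = os.path.join(base, relative_path)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    model = "RecoBTag/Combined/data/PAIReD/model3.onnx"
    found = resolve_in_search_path(model, os.environ.get("CMSSW_SEARCH_PATH", ""))
    print(found if found else f"{model} not found in CMSSW_SEARCH_PATH")
```

Running this inside the test area from the "User test area" line shows quickly whether the data file is visible before launching a full workflow.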

RelVals-INPUT

  • 136.72412 136.72412_RunJetHT2016B_reminiaodUL/step2_RunJetHT2016B_reminiaodUL.log
  • 140.202 140.202_RunJetMET2022D_reMINI/step2_RunJetMET2022D_reMINI.log
  • 2500.0 2500.0_NANOmc106Xul16v2/step2_NANOmc106Xul16v2.log
Expand to see more relval errors ...

RelVals-NANO

  • 2500.3 2500.3_NANOmc130X/step2_NANOmc130X.log
  • 2500.301 2500.301_EGMNANOmc130X/step2_EGMNANOmc130X.log
  • 2500.1 2500.1_NANOmc122Xrun3/step2_NANOmc122Xrun3.log
Expand to see more relval errors ...

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c1ad35/40049/summary.html
COMMIT: ebf90c2
CMSSW: CMSSW_14_1_X_2024-06-24-1100/el8_amd64_gcc12
Additional Tests: NANO
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45207/40049/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c1ad35/40049/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c1ad35/40049/git-merge-result

Comparison Summary

Summary:

  • You potentially added 14 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 8 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3345088
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3345062
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
  • Checked 202 log files, 165 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

NANO Comparison Summary

Summary:

  • You potentially added 41 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 15
  • DQMHistoTests: Total histograms compared: 17023
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 17023
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 14 files compared)
  • Checked 57 log files, 34 edm output root files, 15 DQM output files

Nano size comparison Summary:

| Sample | kb/ev | ref kb/ev | diff kb/ev | ev/s/thd | ref ev/s/thd | diff rate | mem/thd | ref mem/thd |
|---|---|---|---|---|---|---|---|---|
| 2500.0 | 2.855 | 2.783 | 0.072 ( +2.6% ) | 0.93 | 3.31 | -72.0% | 2.211 | 2.266 |
| 2500.001 | 2.965 | 2.897 | 0.068 ( +2.4% ) | 0.91 | 2.95 | -69.2% | 2.231 | 2.644 |
| 2500.002 | 2.912 | 2.843 | 0.069 ( +2.4% ) | 0.93 | 3.07 | -69.9% | 2.262 | 2.679 |
| 2500.01 | 1.467 | 1.446 | 0.020 ( +1.4% ) | 3.33 | 5.67 | -41.2% | 2.392 | 2.278 |
| 2500.011 | 1.930 | 1.906 | 0.024 ( +1.2% ) | 2.04 | 3.12 | -34.7% | 2.626 | 2.401 |
| 2500.012 | 1.787 | 1.761 | 0.026 ( +1.5% ) | 2.50 | 4.49 | -44.3% | 2.408 | 2.462 |
| 2500.1 | 2.419 | 2.353 | 0.065 ( +2.8% ) | 1.05 | 4.22 | -75.2% | 2.139 | 2.080 |
| 2500.2 | 2.531 | 2.459 | 0.073 ( +3.0% ) | 0.97 | 4.80 | -79.8% | 2.026 | 1.996 |
| 2500.21 | 1.306 | 1.285 | 0.021 ( +1.6% ) | 2.09 | 3.08 | -32.3% | 2.325 | 2.279 |
| 2500.211 | 1.693 | 1.668 | 0.025 ( +1.5% ) | 1.96 | 2.92 | -32.9% | 2.300 | 2.359 |
| 2500.3 | 2.300 | 2.229 | 0.071 ( +3.2% ) | 1.10 | 8.85 | -87.6% | 2.039 | 1.975 |
| 2500.301 | 2.913 | 2.833 | 0.080 ( +2.8% ) | 0.98 | 8.00 | -87.7% | 2.047 | 1.961 |
| 2500.31 | 7.164 | 7.164 | 0.000 ( +0.0% ) | 1.51 | 1.45 | +4.0% | 1.707 | 1.707 |
| 2500.311 | 1.568 | 1.568 | 0.000 ( +0.0% ) | 8.56 | 7.82 | +9.4% | 1.058 | 1.061 |
| 2500.312 | 540.338 | 540.338 | 0.000 ( +0.0% ) | 0.51 | 0.54 | -4.8% | 1.661 | 1.657 |
| 2500.313 | 816.269 | 816.269 | 0.000 ( +0.0% ) | 0.73 | 0.76 | -4.1% | 1.651 | 1.649 |
| 2500.32 | 1.369 | 1.348 | 0.022 ( +1.6% ) | 4.07 | 10.43 | -61.0% | 2.241 | 2.254 |
| 2500.321 | 1.781 | 1.757 | 0.024 ( +1.4% ) | 3.89 | 9.04 | -57.0% | 2.435 | 2.421 |
| 2500.322 | 1.260 | 1.236 | 0.024 ( +2.0% ) | 3.82 | 8.90 | -57.1% | 2.148 | 2.248 |
| 2500.323 | 7.642 | 7.642 | 0.000 ( +0.0% ) | 3.64 | 3.83 | -5.0% | 1.812 | 1.951 |
| 2500.324 | 1.898 | 1.874 | 0.024 ( +1.3% ) | 3.81 | 8.87 | -57.1% | 2.167 | 2.181 |
| 2500.325 | 4.160 | 4.136 | 0.024 ( +0.6% ) | 2.43 | 4.08 | -40.4% | 1.921 | 1.752 |
| 2500.326 | 3.367 | 3.342 | 0.024 ( +0.7% ) | 1.15 | 1.48 | -22.0% | 2.053 | 1.773 |
| 2500.327 | 1.835 | 1.811 | 0.024 ( +1.3% ) | 3.76 | 9.01 | -58.2% | 1.772 | 2.135 |
| 2500.328 | 3.389 | 3.364 | 0.024 ( +0.7% ) | 1.04 | 1.36 | -23.0% | 1.838 | 1.835 |
| 2500.4 | 2.447 | 2.374 | 0.073 ( +3.1% ) | 1.10 | 8.25 | -86.7% | 1.876 | 1.748 |
| 2500.401 | 1.969 | 1.891 | 0.078 ( +4.1% ) | 0.96 | 7.88 | -87.9% | 1.894 | 1.775 |
| 2500.402 | 3.028 | 2.950 | 0.078 ( +2.6% ) | 0.93 | 7.64 | -87.8% | 1.778 | 1.292 |
| 2500.403 | 8.778 | 8.700 | 0.078 ( +0.9% ) | 1.18 | 2.83 | -58.2% | 1.984 | 1.792 |
| 2500.404 | 5.552 | 5.474 | 0.078 ( +1.4% ) | 0.68 | 1.09 | -38.2% | 1.804 | 1.830 |
| 2500.405 | 2.938 | 2.860 | 0.078 ( +2.7% ) | 0.96 | 7.64 | -87.4% | 1.841 | 1.906 |
| 2500.406 | 5.569 | 5.491 | 0.078 ( +1.4% ) | 0.67 | 1.01 | -34.3% | 1.872 | 1.971 |
| 2500.5 | 5.194 | 5.194 | 0.000 ( +0.0% ) | 14.14 | 16.22 | -12.8% | 1.531 | 1.469 |
| 2500.51 | 9.120 | 9.120 | 0.000 ( +0.0% ) | 8.88 | 9.75 | -9.0% | 1.478 | 1.444 |
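
To read the throughput columns, it helps to convert the "diff rate" percentages into slowdown factors. The sketch below assumes that "diff rate" is (ev/s/thd - ref ev/s/thd) / ref ev/s/thd, which reproduces the quoted percentages to within the rounding of the displayed rates. The three rows are copied from the comparison above; the function names are illustrative only.

```python
def relative_change_percent(new, ref):
    """Relative change of new with respect to ref, in percent."""
    return 100.0 * (new - ref) / ref


def slowdown_factor(new_rate, ref_rate):
    """How many times slower the new rate is compared with the reference rate."""
    return ref_rate / new_rate


# (ev/s/thd, ref ev/s/thd) for a few workflows from the comparison
rates = {
    "2500.0": (0.93, 3.31),
    "2500.3": (1.10, 8.85),
    "2500.31": (1.51, 1.45),
}

for sample, (new, ref) in rates.items():
    change = relative_change_percent(new, ref)
    factor = slowdown_factor(new, ref)
    print(f"{sample}: {change:+.1f}% in event rate, slowdown x{factor:.2f}")
```

For workflow 2500.3, for example, the event rate drops from 8.85 to 1.10 events per second per thread, roughly a factor of eight, while the event size grows by only about 3%.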


pairedJetTableTask = cms.Task(pairedJetTable)

pairedJetTableMC = cms.EDProducer("PAIReDONNXJetTagsProducer")
Contributor

if there is no specific parameter difference between data and MC, you should keep the same module pairedJetTable

@vlimant
Contributor

vlimant commented Jul 15, 2024

please hold
performance issue to figure out first

@vlimant
Contributor

vlimant commented Jul 22, 2024

-1

@cmsbuild
Contributor

Milestone for this pull request has been moved to CMSSW_14_2_X. Please open a backport if it should also go in to CMSSW_14_1_X.

@antoniovilela
Contributor

ping (to make bot change milestone)

@cmsbuild
Contributor

Milestone for this pull request has been moved to CMSSW_15_0_X. Please open a backport if it should also go in to CMSSW_14_2_X.
