Add function to refine FastSim DeepJet discriminators #40553

@wolfmor wolfmor commented Jan 18, 2023

PR description:

Requires: cms-data/PhysicsTools-NanoAOD#14

This PR adds a function that uses a regression neural network to refine the DeepJet discriminators of CHS jets in FastSim NanoAOD so that they better match FullSim. To call the function, include the option --customise PhysicsTools/NanoAOD/jetsAK4_CHS_cff.nanoAOD_refineFastSim_bTagDeepFlav in the cmsDriver command. It requires the ONNX model added in the above-mentioned PR to cms-data. The original values are copied to new variables named with the suffix "unrefined".
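The backup-then-overwrite behaviour described above can be illustrated with a minimal, framework-free sketch. It is not the code in this PR: the branch names, the exact placement of the "unrefined" suffix, and the `toy_refiner` stand-in for the regression network are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical discriminator names, chosen only for illustration.
DISCRIMINATORS = ["btagDeepFlavB", "btagDeepFlavCvL"]


def refine_with_backup(
    jet: Dict[str, float],
    refiner: Callable[[Sequence[float]], List[float]],
    names: Sequence[str] = DISCRIMINATORS,
    suffix: str = "unrefined",
) -> Dict[str, float]:
    """Return a copy of `jet` with refined values; originals are kept under name + suffix."""
    out = dict(jet)  # do not modify the input jet
    originals = [jet[name] for name in names]
    # Back up the original values first.
    for name, value in zip(names, originals):
        out[name + suffix] = value
    # Overwrite the nominal variables with the refined values.
    for name, value in zip(names, refiner(originals)):
        out[name] = value
    return out


def toy_refiner(values: Sequence[float]) -> List[float]:
    """Stand-in for the network: move each value halfway towards 0.5 (arbitrary toy rule)."""
    return [0.5 * (v + 0.5) for v in values]


jet = {"pt": 50.0, "btagDeepFlavB": 0.9, "btagDeepFlavCvL": 0.1}
refined = refine_with_backup(jet, toy_refiner)
print(refined)
```

Keeping the originals under a separate name means Fast/Full comparisons can be made both before and after refinement from the same file.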

Due to a bug in ONNX Runtime 1.10.0 (see here), graph optimization has to be disabled to evaluate the model. The corresponding option is implemented in BaseMVAValueMapProducer for the ONNX backend.
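For readers who want to reproduce the workaround outside CMSSW, the sketch below shows how graph optimization can be switched off with the onnxruntime Python API. This is not the BaseMVAValueMapProducer implementation. The helper names and `model_path` are hypothetical, and onnxruntime is imported lazily, so only the `make_session` call requires it to be installed.

```python
def optimization_level_name(disable_optimization: bool) -> str:
    """Name of the onnxruntime GraphOptimizationLevel member to use."""
    return "ORT_DISABLE_ALL" if disable_optimization else "ORT_ENABLE_ALL"


def make_session(model_path: str, disable_optimization: bool = True):
    """Create an InferenceSession, by default with all graph optimizations off.

    `model_path` is a placeholder for a local .onnx file.
    """
    import onnxruntime as ort  # lazy import: needed only when a session is built

    options = ort.SessionOptions()
    options.graph_optimization_level = getattr(
        ort.GraphOptimizationLevel, optimization_level_name(disable_optimization)
    )
    return ort.InferenceSession(
        model_path, sess_options=options, providers=["CPUExecutionProvider"]
    )


print(optimization_level_name(True))
```

Comparing the outputs of `make_session(path, True)` and `make_session(path, False)` on the same inputs is one way to check whether a given model is affected by an optimization bug.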

The technique was presented at the FastSim Days 2022 Workshop. There are plans to make this the default for FastSim in the future and possibly to extend it to further collections/variables.

A complete set of commands to produce NanoAOD files with refined DeepJet discriminators is:

cmsDriver.py TTbar_13TeV_TuneCUETP8M1_cfi --relval 100000,1000 -s GEN,SIM,RECOBEFMIX,DIGI:pdigi_valid,L1,DIGI2RAW,L1Reco,RECO,VALIDATION:@standardValidation,DQM:@standardDQMFS -n 10 --conditions auto:run2_mc --beamspot Realistic25ns13TeV2016Collision --datatier GEN-SIM-DIGI-RECO,DQMIO --eventcontent FEVTDEBUGHLT,DQM --fast --era Run2_2016

cmsDriver.py step3 -s PAT --era Run2_2016 -n -1 --conditions auto:run2_mc --mc --datatier MINIAODSIM --eventcontent MINIAODSIM --filein file:TTbar_13TeV_TuneCUETP8M1_cfi_GEN_SIM_RECOBEFMIX_DIGI_L1_DIGI2RAW_L1Reco_RECO_VALIDATION_DQM.root --fast

cmsDriver.py --python_filename NanoAODrefined_cfg.py --eventcontent NANOAODSIM --fast --customise Configuration/DataProcessing/Utils.addMonitoring,PhysicsTools/NanoAOD/jetsAK4_CHS_cff.nanoAOD_refineFastSim_bTagDeepFlav --datatier NANOAODSIM --fileout file:step3_NANO.root --conditions auto:run2_mc --step NANO --filein "file:step3_PAT.root" --era run2_nanoAOD_106Xv2 --mc -n -1
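The NANO step can also be assembled programmatically, which makes it easy to switch the refinement on or off. The sketch below only builds the argument list from the flags shown in the command above and does not run anything. The helper name `nano_command` is hypothetical, and actually running the result assumes a CMSSW environment in which cmsDriver.py is available.

```python
import shlex
from typing import List, Sequence

MONITORING = "Configuration/DataProcessing/Utils.addMonitoring"
REFINE = "PhysicsTools/NanoAOD/jetsAK4_CHS_cff.nanoAOD_refineFastSim_bTagDeepFlav"


def nano_command(filein: str, fileout: str, customise: Sequence[str]) -> List[str]:
    """Build the argv for the NANO step; customise functions are comma-joined."""
    return [
        "cmsDriver.py",
        "--python_filename", "NanoAODrefined_cfg.py",
        "--eventcontent", "NANOAODSIM",
        "--fast",
        "--customise", ",".join(customise),
        "--datatier", "NANOAODSIM",
        "--fileout", fileout,
        "--conditions", "auto:run2_mc",
        "--step", "NANO",
        "--filein", filein,
        "--era", "run2_nanoAOD_106Xv2",
        "--mc",
        "-n", "-1",
    ]


cmd = nano_command("file:step3_PAT.root", "file:step3_NANO.root", [MONITORING, REFINE])
print(shlex.join(cmd))  # pass `cmd` to subprocess.run(...) inside a CMSSW environment
```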

PR validation:

The neural network was trained on GEN-synchronized FastSim/FullSim jet pairs from SUSY simplified model T1tttt events and has also been validated in TTbar events. In both cases, agreement with the FullSim output improves considerably, as do the correlations among output observables and external parameters.

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

Needs to be backported to 12_6.

@sbein @kpedro88

@cmsbuild

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-40553/33785

  • This PR adds an extra 20KB to repository

  • There are other open Pull requests which might conflict with changes you have proposed:

@cmsbuild

A new Pull Request was created by @wolfmor for master.

It involves the following packages:

  • PhysicsTools/NanoAOD (xpog)
  • PhysicsTools/PatAlgos (xpog, reconstruction)

@cmsbuild, @mandrenguyen, @clacaputo, @swertz, @vlimant can you please review it and eventually sign? Thanks.
@AlexDeMoor, @rappoccio, @gouskos, @jdolen, @JyothsnaKomaragiri, @ahinzmann, @AnnikaStein, @schoef, @emilbols, @jdamgov, @mbluj, @nhanvtran, @gkasieczka, @hatakeyamak, @gpetruc, @azotz, @mariadalfonso, @demuller, @andrzejnovak, @seemasharmafnal, @mmarionncern this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@mandrenguyen

type btv

@cmsbuild cmsbuild added the btv label Jan 18, 2023
@kpedro88

please test

@cmsbuild

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-19f146/30061/summary.html
COMMIT: 4ca1787
CMSSW: CMSSW_13_0_X_2023-01-18-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/40553/30061/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 8 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3555479
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3555454
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 211 log files, 162 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@mandrenguyen

@wolfmor Would it be worth adding the driver commands you list in the PR description as a RelVal workflow? In particular, as you indicate that further developments are coming.

@kpedro88

@mandrenguyen we're working on exactly that

…rom-CMSSW_13_0_X_2023-01-17-1100

add test workflow
@kpedro88

@cms-sw/pdmv-l2 @cms-sw/upgrade-l2 please check and sign. The workflow changes are hopefully straightforward; let me know if you have any concerns.

@srimanob

+Upgrade

@sunilUIET

+pdmv

@cmsbuild

cmsbuild commented Feb 1, 2023

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@perrotta

perrotta commented Feb 1, 2023

+1

@srimanob

Hi @wolfmor @kpedro88 @sbein
SUS is planning a Run3-2022 FastSim production in 12_4. Should this PR and the ONNX model for refinement be backported to 12_4? Or would there be technical issues with the backport? Thanks.

@sbein

sbein commented Apr 13, 2023

Hi @srimanob, yes, ideally this should be backported so that the refinement can be used in Run 3. My plan is to do the backport to 12_4 once the backport to UL is merged (#40828 (comment)). This will probably take another week.

@srimanob

Thanks @sbein
So I will note this in the production plan to @cms-sw/pdmv-l2

@srimanob

Hi @sbein @kpedro88
Will we face technical issues if we backport to 12_6 first? This is not about Run 2, but about Run 3 2022 NanoV11. Thx.

@swertz

swertz commented Apr 14, 2023

Since the refinement runs at Nano level, for Run3 samples why can't you simply use the imminent NanoV12 campaigns in 13_0 that will take 12_4 MINI samples as input?

FYI @simonepigazzini

@srimanob

srimanob commented Apr 14, 2023

Hi @swertz
What I am asking is based on the production done for the Summer22 campaign. Currently, they run NanoV10 and V11. If SUS mixes Fast/Full samples, it is better to run the same version for both. So either run V12 in production, or backport to 12_6 for V11. I can also put this comment in the discussion. Thx.

https://cms-pdmv.cern.ch/mcm/campaigns?prepid=Run3*22*Nano*&page=-1&shown=16447

@swertz

swertz commented Apr 14, 2023

Hi @srimanob, so much is clear; I was also referring to Summer22 MC. Don't forget that the current Summer22 and Nano v10/11 are not "complete" in the sense that many jet-related ingredients (PUPPI tune, taggers) have not been updated yet. NanoV12 will be run on Summer22 MC and will contain all the recommended ingredients for analysis of Run3 data.

I can see why you'd want some FastSim MC in 12_6/NanoV11 to be able to quickly implement Fast/Full comparisons with the existing samples, but just keep in mind that for physics results, in most cases you'll need to use NanoV12 anyway.

Another point: this PR only implemented refinement for taggers in CHS jets (which makes sense for the Run2 UL backport), but Run3 samples contain PUPPI jets...
