Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DeepTau v2p5 in nanoAOD [12_4_X] #38751

Conversation

mbluj
Copy link
Contributor

@mbluj mbluj commented Jul 15, 2022

PR description:

This PR adds working point definitions for a newly integrated DeepTau v2p5.
A corresponding study with details about WP threshold derivation and tau efficiency/mis-ID rate plots for both Run2 UL and Run 3 samples can be found here.

This PR is a backport of #38726 to 12_4_X.

In particular, the changes in this PR are (as in #38726):

  • For nanoAOD: add a dedicated set of variables _deepTauVars2018v2p5 to taus_cff.py,
  • Unify WP masking interface to a single function _tauIdWPMask();
  • Option to compute WP flags from raw scores (from_raw argument in _tauIdWPMask()) given the threshold values, instead of reading them directly from MINIAOD;
  • Change the format of storing WPs from bitmask to integer values for user-friendliness [see note in DeepTau v2p5 in nanoAOD #38726].

Differences wrt #38726:

  • WP thresholds values for DeepTau v2p5 are not used within runTauIdMVA.py and thus not added to miniAOD preserving its content (to fulfill no-changing policy for production releases);
  • deepTau v2p5 WP flags stored in nanoAOD are computed from raw scores.

PR validation:

Original PR successfully tested with the "limited" set of matrix tests and a custom nanoAOD production. Matrix tests of this PR ongoing - we do not expect failures and will update this description when tests are finished.

If this PR is a backport please specify the original PR and why you need to backport that PR.

This PR is a backport of #38726 to 12_4_X, introduces deepTauID v2p5 to nanoAOD v10.

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @mbluj for CMSSW_12_4_X.

It involves the following packages:

  • PhysicsTools/NanoAOD (xpog)
  • RecoTauTag/RecoTau (reconstruction)

@gouskos, @clacaputo, @cmsbuild, @fgolf, @jpata, @mariadalfonso can you please review it and eventually sign? Thanks.
@mbluj, @gpetruc, @azotz, @swertz this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@clacaputo
Copy link
Contributor

@cmsbuild please test

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-220b83/26270/summary.html
COMMIT: 0c2184c
CMSSW: CMSSW_12_4_X_2022-07-15-1100/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/38751/26270/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 15 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3676111
  • DQMHistoTests: Total failures: 218
  • DQMHistoTests: Total nulls: 75
  • DQMHistoTests: Total successes: 3675796
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -30.846000000000004 KiB( 49 files compared)
  • DQMHistoSizes: changed ( 11634.0,... ): -3.398 KiB Physics/NanoAODDQM
  • DQMHistoSizes: changed ( 1325.81 ): -7.797 KiB Physics/NanoAODDQM
  • DQMHistoSizes: changed ( 136.8523 ): -2.661 KiB Physics/NanoAODDQM
  • Checked 208 log files, 45 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

@mariadalfonso
Copy link
Contributor

mariadalfonso commented Jul 18, 2022

RUN3:

V2.5 are correctly added
Screen Shot 2022-07-18 at 15 00 58

but the V2.1 get substantial changes
Screen Shot 2022-07-18 at 14 59 52

RUN2:

We should append the v2.5 also to the Run2 mini with the modifier run2_nanoAOD_106Xv2
Screen Shot 2022-07-18 at 15 04 00

see also here : https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_12_4_X_2022-07-15-1100+220b83/51643/validateJR/all_OldVSNew_TTbar13nanoEDM106Xv1in2017wf1325p81/

from_raw=True, wp_thrs=WORKING_POINTS_v2p5["mu"]),
idDeepTau2018v2p5VSjet = _tauIdWPMask("byDeepTau2018v2p5VSjetraw",
choices=("VVVLoose","VVLoose","VLoose","Loose","Medium","Tight","VTight","VVTight"),
doc="byDeepTau2018v2p5VSjet ID working points (deepTau2018v2p5)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"VVTight": 0.9733927,
},
}
workingPoints_ = WORKING_POINTS_v2p1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mariadalfonso
Copy link
Contributor

boosted taus:
current ID variables changes as follow
Screen Shot 2022-07-18 at 15 10 10
Is this expected ?

@kandrosov
Copy link
Contributor

@mariadalfonso Yes, it is expected because the way IDs are stored is changed from bitmask to WP numbering, as described in the PR description.
However, to avoid confusions, I propose that we move further discussion to the original PR thread #38726 and once it is merged, will address this backport. Would you agree?

@mariadalfonso
Copy link
Contributor

mariadalfonso commented Jul 18, 2022

@mariadalfonso Yes, it is expected because the way IDs are stored is changed from bitmask to WP numbering, as described in the PR description.
However, to avoid confusions, I propose that we move further discussion to the original PR thread #38726

You mean both the tau and boosted tau v2.1 quantities are expected to change ?
Then it's not anymore the same ID as in the past, so we cannot call anymore v2.1 (for tau) and 2017v2 (for boosted tau).

and once it is merged, will address this backport. Would you agree?

technically are doing different things, one re-run the ID and one read from MINI,
These comments apply to both (at the moment I do not have the Run3 plots with the master branch reading the mini from 12-4).

@kandrosov
Copy link
Contributor

kandrosov commented Jul 18, 2022

v2p1 (and other old ids) and their WPs are the same, but the way WPs are stored has changed.

Just for illustration, lets suppose we have 3 WPs for some tau ID discriminator: Loose, Medium and Tight. In the old code the value of id branch would be:
0: doesn't pass any WP, 1: pass Loose, 3: pass Medium, 7: pass Tight
While in the new code:
0: doesn't pass any WP, 1: pass Loose, 2: pass Medium, 3: pass Tight
We find the old notation (with bit masks) are not so user friendly and would like to change it. To be consistent, we would like to change it for all tau-related ids stored in nano.

@mariadalfonso
Copy link
Contributor

v2p1 (and other old ids) and their WPs are the same, but the way WPs are stored has changed.

yes I can see, but means that for run3: if someone do analysis from mini or nano we get two different thing
"3" can be medium or tight and this can only create confusion.

Can we push the change in bit-map for the Run3 mini as well in a separate PRs ?
or post-pone this change for the very futures ?

@kandrosov
Copy link
Contributor

@mariadalfonso sorry I don't understand your point about Run 3 mini vs. nano confusion... In miniAOD, WP results are accessible as tau.tauID("byMedium....") that returs 0. or 1., with a separate string for each WP, i.e. user doesn't work with "wp bit-mask" or "wp number".

@mariadalfonso
Copy link
Contributor

@mariadalfonso sorry I don't understand your point about Run 3 mini vs. nano confusion... In miniAOD, WP results are accessible as tau.tauID("byMedium....") that returs 0. or 1., with a separate string for each WP, i.e. user doesn't work with "wp bit-mask" or "wp number".

ok, good to know that there is a 3rd way to get the ID.
can we disantangle the two features in master: 1) one PR for bitmap of all the IDs and 2) add the new V2.5 ?
Would be much easier to review

@kandrosov
Copy link
Contributor

ok, good to know that there is a 3rd way to get the ID.

tau.tauID("ID_NAME"), e.g. tau.tauID("byMediumDeepTau2017v2p1VSjet"), is the only way to access tau IDs and their WPs at the MiniAOD level that is recommended and supported by the Tau POG. So, I don't expect any mini vs nano problems that could be triggered by the the proposed change.

can we disantangle the two features in master: 1) one PR for bitmap of all the IDs and 2) add the new V2.5 ?
Would be much easier to review

yes, it can be done. I'll prepare a PR with bitmap->numbering modification shortly.

@mbluj
Copy link
Contributor Author

mbluj commented Jul 19, 2022

@mariadalfonso, @kandrosov, my two cents on differences in tauIDs other than newly added deepTau v2p5: It is indeed true that the main change is caused by a modified way of storing of WPs (bitset->numbers), but one should also expect some change caused by the different selection of taus stored in nanoAOD. The selection bases on the "big OR" of loosest WPs of all discriminants against jet->tau fakes and now it contains also deepTau v2p5 VVVL: https://github.com/cms-sw/cmssw/pull/38751/files#diff-d54676262d2e5326ee3455e57747fd476b12be16993a8eb0f4794e8771f6526fR18

@cmsbuild
Copy link
Contributor

Pull request #38751 was updated. @gouskos, @clacaputo, @cmsbuild, @fgolf, @jpata, @mariadalfonso can you please check and sign again.

@mbluj
Copy link
Contributor Author

mbluj commented Jul 25, 2022

@mariadalfonso, ece4723 adds a customization function to add items missing in Run-2 UL samples needed by nano v10 (now only deepTau v2p5); it is a counterpart of dd40361 described in #38726 (comment).

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 1, 2022

-1

Failed Tests: RelVals-INPUT
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-220b83/26573/summary.html
COMMIT: 969573d
CMSSW: CMSSW_12_4_X_2022-08-01-1100/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/38751/26573/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals-INPUT

  • 136.72412136.72412_RunJetHT2016B_reminiaodUL+RunJetHT2016B_reminiaodUL+REMININANO_data2016UL_HIPM+HARVESTDR2_REMININANO_data2016UL_HIPM/step2_RunJetHT2016B_reminiaodUL+RunJetHT2016B_reminiaodUL+REMININANO_data2016UL_HIPM+HARVESTDR2_REMININANO_data2016UL_HIPM.log

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 52 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3676198
  • DQMHistoTests: Total failures: 230
  • DQMHistoTests: Total nulls: 72
  • DQMHistoTests: Total successes: 3675874
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -30.842000000000002 KiB( 49 files compared)
  • DQMHistoSizes: changed ( 11634.0,... ): -3.398 KiB Physics/NanoAODDQM
  • DQMHistoSizes: changed ( 1325.81 ): -7.797 KiB Physics/NanoAODDQM
  • DQMHistoSizes: changed ( 136.8523 ): -2.661 KiB Physics/NanoAODDQM
  • DQMHistoSizes: changed ( 312.0 ): 0.004 KiB MessageLogger/Warnings
  • Checked 208 log files, 45 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

@mariadalfonso
Copy link
Contributor

mariadalfonso commented Aug 2, 2022

Seems that the mini+nano in one step fails 136.72412

Our test with persistent MINI are ok
https://gitlab.cern.ch/cms-nanoAOD/nanoAOD-integration/-/issues/173

@mbluj can you have a look ?
Thanks

@mbluj
Copy link
Contributor Author

mbluj commented Aug 2, 2022

Seems that the mini+nano in one step fails 136.72412

Our test with persistent MINI are ok https://gitlab.cern.ch/cms-nanoAOD/nanoAOD-integration/-/issues/173

@mbluj can you have a look ? Thanks

Yes, I am on it. The problem is indeed due to the common (re)mini+nano workflow as at both steps deepTauID is added to taus. At mini it is default with 125X/124X, while at nano it is triggered by an era modifier. There are two problems with this setup: one is technical as modules with same names are crated, but different inputs are expected (modules are in different place in "production chain"), while other is "philosophical" as deepTauID is tried to be run twice. The first one can be solved with some effort by adding a suffix to module names, while the other is more difficult to avoid without an prior knowledge on combination of workflows, i.e. production levels (here mini & nano) and eras used.

P.S. The same issue affects PR to master.

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 2, 2022

Pull request #38751 was updated. @gouskos, @swertz, @vlimant, @clacaputo, @cmsbuild, @jpata, @mariadalfonso can you please check and sign again.

@mariadalfonso
Copy link
Contributor

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 2, 2022

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-220b83/26600/summary.html
COMMIT: 1885608
CMSSW: CMSSW_12_4_X_2022-08-02-1100/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/38751/26600/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 44 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3676198
  • DQMHistoTests: Total failures: 213
  • DQMHistoTests: Total nulls: 71
  • DQMHistoTests: Total successes: 3675892
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -30.846000000000004 KiB( 49 files compared)
  • DQMHistoSizes: changed ( 11634.0,... ): -3.398 KiB Physics/NanoAODDQM
  • DQMHistoSizes: changed ( 1325.81 ): -7.797 KiB Physics/NanoAODDQM
  • DQMHistoSizes: changed ( 136.8523 ): -2.661 KiB Physics/NanoAODDQM
  • Checked 208 log files, 45 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

@mariadalfonso
Copy link
Contributor

mariadalfonso commented Aug 3, 2022

+xpog

@clacaputo
Copy link
Contributor

+reconstruction

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 4, 2022

This pull request is fully signed and it will be integrated in one of the next CMSSW_12_4_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_12_5_X is complete. This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@qliphy
Copy link
Contributor

qliphy commented Aug 6, 2022

+1

@cmsbuild cmsbuild merged commit f0d04e7 into cms-sw:CMSSW_12_4_X Aug 6, 2022
@mbluj mbluj deleted the CMSSW_12_4_X_tau-pog_deepTau-v2p5-WPs-nano branch October 10, 2023 10:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants