DeepTau v2p5 in nanoAOD [12_4_X] #38751

mbluj · 2022-07-15T13:10:22Z

PR description:

This PR adds working point definitions for a newly integrated DeepTau v2p5.
A corresponding study with details about WP threshold derivation and tau efficiency/mis-ID rate plots for both Run2 UL and Run 3 samples can be found here.

This PR is a backport of #38726 to 12_4_X.

In particular, the changes in this PR are (as in #38726):

For nanoAOD: add a dedicated set of variables _deepTauVars2018v2p5 to taus_cff.py;
Unify the WP masking interface into a single function _tauIdWPMask();
Add an option to compute WP flags from raw scores (from_raw argument in _tauIdWPMask()) given the threshold values, instead of reading them directly from MINIAOD;
Change the format of storing WPs from bitmask to integer values for user-friendliness [see note in DeepTau v2p5 in nanoAOD #38726].
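
As a rough illustration of the last two items, the sketch below derives an integer WP value from a raw score and a set of thresholds. It is not the code of _tauIdWPMask(): the helper name wp_index_from_raw, the three-WP list and the threshold values are all made up for this example, and it assumes that the thresholds increase from the loosest to the tightest WP.

```python
def wp_index_from_raw(raw_score, thresholds, choices):
    """Count how many WPs (ordered loosest to tightest) the raw score passes.

    Returns 0 if no WP is passed, 1 for the loosest WP only,
    and len(choices) if the tightest WP is passed.
    """
    n_passed = 0
    for name in choices:
        if raw_score > thresholds[name]:
            n_passed += 1
        else:
            # Thresholds are assumed to be ordered, so no tighter WP can pass.
            break
    return n_passed


# Hypothetical thresholds, NOT the real DeepTau v2p5 values.
EXAMPLE_CHOICES = ("Loose", "Medium", "Tight")
EXAMPLE_THRESHOLDS = {"Loose": 0.50, "Medium": 0.80, "Tight": 0.95}

for score in (0.30, 0.60, 0.90, 0.99):
    print(score, wp_index_from_raw(score, EXAMPLE_THRESHOLDS, EXAMPLE_CHOICES))
```

With this convention a stored value of 2 reads directly as "passes the second-loosest WP", and a selection like "at least Medium" becomes a simple comparison against 2.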

Differences wrt #38726:

WP threshold values for DeepTau v2p5 are not used within runTauIdMVA.py and thus are not added to miniAOD, preserving its content (to comply with the no-change policy for production releases);
DeepTau v2p5 WP flags stored in nanoAOD are computed from raw scores.

PR validation:

The original PR was successfully tested with the "limited" set of matrix tests and a custom nanoAOD production. Matrix tests of this PR are ongoing; we do not expect failures and will update this description when the tests are finished.

If this PR is a backport please specify the original PR and why you need to backport that PR.

This PR is a backport of #38726 to 12_4_X; it introduces deepTauID v2p5 to nanoAOD v10.

cmsbuild · 2022-07-15T13:10:47Z

A new Pull Request was created by @mbluj for CMSSW_12_4_X.

It involves the following packages:

PhysicsTools/NanoAOD (xpog)
RecoTauTag/RecoTau (reconstruction)

@gouskos, @clacaputo, @cmsbuild, @fgolf, @jpata, @mariadalfonso can you please review it and eventually sign? Thanks.
@mbluj, @gpetruc, @azotz, @swertz this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy, @rappoccio you are the release manager for this.

cms-bot commands are listed here

clacaputo · 2022-07-15T16:07:21Z

@cmsbuild please test

cmsbuild · 2022-07-15T20:14:24Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-220b83/26270/summary.html
COMMIT: 0c2184c
CMSSW: CMSSW_12_4_X_2022-07-15-1100/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/38751/26270/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 15 differences found in the comparisons
DQMHistoTests: Total files compared: 50
DQMHistoTests: Total histograms compared: 3676111
DQMHistoTests: Total failures: 218
DQMHistoTests: Total nulls: 75
DQMHistoTests: Total successes: 3675796
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: -30.846000000000004 KiB( 49 files compared)
DQMHistoSizes: changed ( 11634.0,... ): -3.398 KiB Physics/NanoAODDQM
DQMHistoSizes: changed ( 1325.81 ): -7.797 KiB Physics/NanoAODDQM
DQMHistoSizes: changed ( 136.8523 ): -2.661 KiB Physics/NanoAODDQM
Checked 208 log files, 45 edm output root files, 50 DQM output files
TriggerResults: no differences found

mariadalfonso · 2022-07-18T13:04:41Z

RUN3:

The V2.5 variables are correctly added

but the V2.1 variables change substantially

RUN2:

We should append the v2.5 also to the Run2 mini with the modifier run2_nanoAOD_106Xv2

see also here : https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_12_4_X_2022-07-15-1100+220b83/51643/validateJR/all_OldVSNew_TTbar13nanoEDM106Xv1in2017wf1325p81/

mariadalfonso · 2022-07-17T20:55:12Z

PhysicsTools/NanoAOD/python/taus_cff.py

+                                            from_raw=True, wp_thrs=WORKING_POINTS_v2p5["mu"]),
+    idDeepTau2018v2p5VSjet = _tauIdWPMask("byDeepTau2018v2p5VSjetraw", 
+                                            choices=("VVVLoose","VVLoose","VLoose","Loose","Medium","Tight","VTight","VVTight"), 
+                                            doc="byDeepTau2018v2p5VSjet ID working points (deepTau2018v2p5)", 


this is not 1:1 with 12-5
https://github.com/cms-sw/cmssw/pull/38726/files#diff-d54676262d2e5326ee3455e57747fd476b12be16993a8eb0f4794e8771f6526fR148-R156

mariadalfonso · 2022-07-17T21:04:16Z

RecoTauTag/RecoTau/python/tools/runTauIdMVA.py

-                    "VVTight": 0.9733927,
-                },
-            }
+            workingPoints_ = WORKING_POINTS_v2p1


similarly in master this is different

https://github.com/cms-sw/cmssw/pull/38751/files#diff-3c94bf99fbc35756eae7c627998054f6752b0dc17386bb8ebf5caf00c2c05218L673-L677

mariadalfonso · 2022-07-18T13:11:15Z

boosted taus:
the current ID variables change as follows

Is this expected ?

kandrosov · 2022-07-18T13:22:15Z

@mariadalfonso Yes, it is expected because the way IDs are stored is changed from bitmask to WP numbering, as described in the PR description.
However, to avoid confusion, I propose that we move further discussion to the original PR thread #38726 and, once it is merged, we will address this backport. Would you agree?

mariadalfonso · 2022-07-18T13:27:43Z

> @mariadalfonso Yes, it is expected because the way IDs are stored is changed from bitmask to WP numbering, as described in the PR description.
> However, to avoid confusion, I propose that we move further discussion to the original PR thread #38726

You mean that both the tau and boosted tau v2.1 quantities are expected to change?
Then it is no longer the same ID as in the past, so we can no longer call it v2.1 (for tau) and 2017v2 (for boosted tau).

> and once it is merged, will address this backport. Would you agree?

Technically the two PRs do different things: one re-runs the ID and the other reads it from MINI.
These comments apply to both (at the moment I do not have the Run3 plots with the master branch reading the mini from 12-4).

kandrosov · 2022-07-18T13:38:30Z

v2p1 (and other old ids) and their WPs are the same, but the way WPs are stored has changed.

Just for illustration, let's suppose we have 3 WPs for some tau ID discriminator: Loose, Medium and Tight. In the old code the value of the id branch would be:
0: doesn't pass any WP, 1: pass Loose, 3: pass Medium, 7: pass Tight
While in the new code:
0: doesn't pass any WP, 1: pass Loose, 2: pass Medium, 3: pass Tight
We find the old notation (with bit masks) is not so user-friendly and would like to change it. To be consistent, we would like to change it for all tau-related ids stored in nano.
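
The mapping between the two notations in this illustration can be sketched in a few lines of Python. This is only an illustration and not code from the PR: the function names are hypothetical, and the conversion assumes nested WPs, i.e. a tau passing a tighter WP also passes all looser ones, so the old value is a cumulative bitmask.

```python
def bitmask_to_number(mask):
    """Old cumulative bitmask (0, 1, 3, 7, ...) -> number of WPs passed."""
    # For nested WPs the set bits are contiguous, so counting them is enough.
    return bin(mask).count("1")


def number_to_bitmask(n_passed):
    """Number of WPs passed (0, 1, 2, 3, ...) -> old cumulative bitmask."""
    return (1 << n_passed) - 1


# Old "at least Medium" check with WPs ordered Loose, Medium, Tight:
# test bit 1 of the mask.
def passes_medium_old(mask):
    return bool(mask & (1 << 1))


# New "at least Medium" check: compare against the WP number.
def passes_medium_new(number):
    return number >= 2


for old in (0, 1, 3, 7):
    new = bitmask_to_number(old)
    print(old, "->", new, passes_medium_old(old), passes_medium_new(new))
```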

mariadalfonso · 2022-07-18T13:52:28Z

> v2p1 (and other old ids) and their WPs are the same, but the way WPs are stored has changed.

Yes, I can see that, but it means that for Run 3, someone doing an analysis from mini and someone doing it from nano get two different things:
"3" can be Medium or Tight, and this can only create confusion.

Can we push the bitmap change for the Run 3 mini as well, in a separate PR?
Or postpone this change to the far future?

kandrosov · 2022-07-18T13:59:45Z

@mariadalfonso sorry, I don't understand your point about Run 3 mini vs. nano confusion... In miniAOD, WP results are accessible as tau.tauID("byMedium....") that returns 0. or 1., with a separate string for each WP, i.e. the user doesn't work with a "wp bit-mask" or "wp number".

mariadalfonso · 2022-07-18T14:32:55Z

> @mariadalfonso sorry, I don't understand your point about Run 3 mini vs. nano confusion... In miniAOD, WP results are accessible as tau.tauID("byMedium....") that returns 0. or 1., with a separate string for each WP, i.e. the user doesn't work with a "wp bit-mask" or "wp number".

OK, good to know that there is a 3rd way to get the ID.
Can we disentangle the two features in master: 1) one PR for the bitmap of all the IDs and 2) one adding the new V2.5?
It would be much easier to review.

kandrosov · 2022-07-18T14:49:24Z

> OK, good to know that there is a 3rd way to get the ID.

tau.tauID("ID_NAME"), e.g. tau.tauID("byMediumDeepTau2017v2p1VSjet"), is the only way to access tau IDs and their WPs at the MiniAOD level that is recommended and supported by the Tau POG. So, I don't expect any mini vs nano problems that could be triggered by the proposed change.

> Can we disentangle the two features in master: 1) one PR for the bitmap of all the IDs and 2) one adding the new V2.5?
> It would be much easier to review.

Yes, it can be done. I'll prepare a PR with the bitmap->numbering modification shortly.

mbluj · 2022-07-19T09:28:59Z

@mariadalfonso, @kandrosov, my two cents on differences in tauIDs other than the newly added deepTau v2p5: it is indeed true that the main change is caused by the modified way of storing WPs (bitset->numbers), but one should also expect some change caused by the different selection of taus stored in nanoAOD. The selection is based on the "big OR" of the loosest WPs of all discriminants against jet->tau fakes, and now it also contains deepTau v2p5 VVVL: https://github.com/cms-sw/cmssw/pull/38751/files#diff-d54676262d2e5326ee3455e57747fd476b12be16993a8eb0f4794e8771f6526fR18
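
A toy sketch of why the "big OR" changes the stored tau collection. The dictionary keys and the function keep_tau are illustrative only, not the actual cut string in taus_cff.py: adding one more discriminant to the OR can only keep more taus, so the distributions of the older IDs shift even though the IDs themselves are unchanged.

```python
def keep_tau(loosest_wp_flags):
    """Keep a tau if it passes the loosest WP of at least one discriminant."""
    return any(loosest_wp_flags.values())


# Illustrative tau: fails the loosest v2p1 WP but passes the v2p5 VVVLoose one.
flags_before = {"deepTau2017v2p1_VVVLoose": False}
flags_after = {
    "deepTau2017v2p1_VVVLoose": False,
    "deepTau2018v2p5_VVVLoose": True,
}

print(keep_tau(flags_before))  # dropped when v2p5 is not part of the OR
print(keep_tau(flags_after))   # kept once v2p5 VVVLoose enters the OR
```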

cmsbuild · 2022-07-25T12:01:37Z

Pull request #38751 was updated. @gouskos, @clacaputo, @cmsbuild, @fgolf, @jpata, @mariadalfonso can you please check and sign again.

mbluj · 2022-07-25T12:05:32Z

@mariadalfonso, ece4723 adds a customization function to add items missing in Run-2 UL samples needed by nano v10 (now only deepTau v2p5); it is a counterpart of dd40361 described in #38726 (comment).

cmsbuild · 2022-08-01T19:54:34Z

-1

Failed Tests: RelVals-INPUT
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-220b83/26573/summary.html
COMMIT: 969573d
CMSSW: CMSSW_12_4_X_2022-08-01-1100/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/38751/26573/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals-INPUT

136.72412: 136.72412_RunJetHT2016B_reminiaodUL+RunJetHT2016B_reminiaodUL+REMININANO_data2016UL_HIPM+HARVESTDR2_REMININANO_data2016UL_HIPM/step2_RunJetHT2016B_reminiaodUL+RunJetHT2016B_reminiaodUL+REMININANO_data2016UL_HIPM+HARVESTDR2_REMININANO_data2016UL_HIPM.log

Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 52 differences found in the comparisons
DQMHistoTests: Total files compared: 50
DQMHistoTests: Total histograms compared: 3676198
DQMHistoTests: Total failures: 230
DQMHistoTests: Total nulls: 72
DQMHistoTests: Total successes: 3675874
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: -30.842000000000002 KiB( 49 files compared)
DQMHistoSizes: changed ( 11634.0,... ): -3.398 KiB Physics/NanoAODDQM
DQMHistoSizes: changed ( 1325.81 ): -7.797 KiB Physics/NanoAODDQM
DQMHistoSizes: changed ( 136.8523 ): -2.661 KiB Physics/NanoAODDQM
DQMHistoSizes: changed ( 312.0 ): 0.004 KiB MessageLogger/Warnings
Checked 208 log files, 45 edm output root files, 50 DQM output files
TriggerResults: no differences found

mariadalfonso · 2022-08-02T08:51:23Z

It seems that workflow 136.72412, which runs mini+nano in one step, fails.

Our tests with persistent MINI are OK:
https://gitlab.cern.ch/cms-nanoAOD/nanoAOD-integration/-/issues/173

@mbluj can you have a look ?
Thanks

mbluj · 2022-08-02T09:06:27Z

> It seems that workflow 136.72412, which runs mini+nano in one step, fails.
>
> Our tests with persistent MINI are OK: https://gitlab.cern.ch/cms-nanoAOD/nanoAOD-integration/-/issues/173
>
> @mbluj can you have a look ? Thanks

Yes, I am on it. The problem is indeed due to the common (re)mini+nano workflow, as deepTauID is added to taus at both steps. At mini it is the default in 125X/124X, while at nano it is triggered by an era modifier. There are two problems with this setup. The first is technical: modules with the same names are created, but different inputs are expected (the modules sit at different places in the "production chain"). The second is "philosophical": deepTauID is run twice. The first can be solved with some effort by adding a suffix to the module names, while the second is more difficult to avoid without prior knowledge of the combination of workflows, i.e. the production levels (here mini & nano) and the eras used.

P.S. The same issue affects PR to master.
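
To make the technical part concrete, here is a toy model of the name clash and of the suffix workaround. It does not use the CMSSW configuration classes: ToyProcess, the module name, the input labels and the suffix "ForNano" are all hypothetical and only mimic the situation where two steps register a module under the same name with different inputs.

```python
class ToyProcess:
    """Minimal stand-in for a configuration that owns named modules."""

    def __init__(self):
        self.modules = {}

    def add_module(self, name, input_tag):
        existing = self.modules.get(name)
        if existing is not None and existing != input_tag:
            # Same name, different input: the two steps would conflict.
            raise ValueError(
                f"module '{name}' already exists with input '{existing}'"
            )
        self.modules[name] = input_tag


def add_tau_id(process, input_tag, suffix=""):
    """Register a (toy) tau ID module, optionally with a step-specific suffix."""
    process.add_module("deepTauId" + suffix, input_tag)


process = ToyProcess()
add_tau_id(process, "tausAtMiniStep")

try:
    add_tau_id(process, "tausAtNanoStep")  # clashes with the mini-step module
except ValueError as err:
    print("clash:", err)

add_tau_id(process, "tausAtNanoStep", suffix="ForNano")  # coexists
print(sorted(process.modules))
```

The suffix only removes the naming clash; the ID is still evaluated at both steps, which is the second problem described above.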


cmsbuild · 2022-08-02T10:17:49Z

Pull request #38751 was updated. @gouskos, @swertz, @vlimant, @clacaputo, @cmsbuild, @jpata, @mariadalfonso can you please check and sign again.

mariadalfonso · 2022-08-02T13:07:58Z

please test

cmsbuild · 2022-08-02T17:19:56Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-220b83/26600/summary.html
COMMIT: 1885608
CMSSW: CMSSW_12_4_X_2022-08-02-1100/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/38751/26600/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 44 differences found in the comparisons
DQMHistoTests: Total files compared: 50
DQMHistoTests: Total histograms compared: 3676198
DQMHistoTests: Total failures: 213
DQMHistoTests: Total nulls: 71
DQMHistoTests: Total successes: 3675892
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: -30.846000000000004 KiB( 49 files compared)
DQMHistoSizes: changed ( 11634.0,... ): -3.398 KiB Physics/NanoAODDQM
DQMHistoSizes: changed ( 1325.81 ): -7.797 KiB Physics/NanoAODDQM
DQMHistoSizes: changed ( 136.8523 ): -2.661 KiB Physics/NanoAODDQM
Checked 208 log files, 45 edm output root files, 50 DQM output files
TriggerResults: no differences found

mariadalfonso · 2022-08-03T14:46:02Z

+xpog

The only difference with respect to master is that, also for Run 3, we always re-run the new ID, because of the no-change policy for the mini already in production.
More plots here: https://gitlab.cern.ch/cms-nanoAOD/nanoAOD-integration/-/issues/173

clacaputo · 2022-08-04T14:11:33Z

+reconstruction

cmsbuild · 2022-08-04T14:11:58Z

This pull request is fully signed and it will be integrated in one of the next CMSSW_12_4_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_12_5_X is complete. This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

qliphy · 2022-08-06T04:46:48Z

+1

cmsbuild added this to the CMSSW_12_4_X milestone Jul 15, 2022

cmsbuild added orp-pending pending-signatures reconstruction-pending tests-pending xpog-pending labels Jul 15, 2022

cmsbuild added tests-started and removed tests-pending labels Jul 15, 2022

cmsbuild added tests-approved and removed tests-started labels Jul 15, 2022

mariadalfonso reviewed Jul 18, 2022


kandrosov mentioned this pull request Jul 18, 2022

Updated way to store DeepTau WPs in nanoAOD #38776

Closed

mariadalfonso mentioned this pull request Jul 25, 2022

new features for V10 nanoAOD cms-nanoAOD/cmssw#580

Closed

39 tasks

cmsbuild removed the tests-approved label Jul 25, 2022

cmsbuild added the tests-pending label Jul 25, 2022

cmsbuild added the tests-started label Aug 1, 2022

cmsbuild added tests-rejected and removed tests-started labels Aug 1, 2022

Add suffixes to protect against recreation of tauId modules with same names and different inputs

1885608

cmsbuild added tests-pending and removed tests-rejected labels Aug 2, 2022

cmsbuild added tests-started and removed tests-pending labels Aug 2, 2022

cmsbuild added tests-approved and removed tests-started labels Aug 2, 2022

cmsbuild added xpog-approved and removed xpog-pending labels Aug 3, 2022

cmsbuild added fully-signed reconstruction-approved and removed reconstruction-pending pending-signatures labels Aug 4, 2022

cmsbuild added orp-approved and removed orp-pending labels Aug 6, 2022

cmsbuild merged commit f0d04e7 into cms-sw:CMSSW_12_4_X Aug 6, 2022

mbluj deleted the CMSSW_12_4_X_tau-pog_deepTau-v2p5-WPs-nano branch October 10, 2023 10:02
