Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New MVA-based Tau-Ids (update) #146

Merged
merged 14 commits into from
Apr 4, 2018

Conversation

mbluj
Copy link

@mbluj mbluj commented Mar 27, 2018

The MVA-based Tau-IDs are retrained with 94X Fall17 samples and should be evaluated using MiniAOD Taus (skimmedTaus) and added to NanoAOD. This PR is meant to supersede its initial version (#108) based on MVA training with 92X Summer17 samples.

The commits change three files:

  • RecoTauTag/Configuration/python/loadRecoTauTagMVAsFromPrepDB_cfi.py with definition of MVA payloads (new ones are added),
  • PhysicsTools/NanoAOD/python/taus_cff.py where new Tau-Ids are evaluated and added to the Tau table
  • PhysicsTools/NanoAOD/python/nanoDQM_cfi.py

Notes:

  • This PR bases on CMSSW_9_4_5_cand1 and master of NanoAOD repository.
  • Implementation in the PR is expected to give same results on MiniAOD v1 and v2 thanks to employing run2_miniAOD_94XFall17 era for v2. However, it crashes with v1 inputs due to some electron dependences
  • It is considered to add two int numbers to code WPs of two cut-based tau-Ids as requested by end users.

FYI, @veelken @steggema @ohlushch

@veelken
Copy link

veelken commented Mar 27, 2018

Dear all,

I see that the two int numbers for the cut-based tau ID have not been added to PhysicsTools/NanoAOD/python/taus_cff.py . Can these please still be added ? I think the option to switch to the cut-based tau ID discriminators may be a useful option in case of problems with data/MC differences in the MVA tau ID input variables (the Tau POG is seeing some evidence for such data/MC differences right now and is investigating).

@steggema
Copy link

Hi Michal,
Thanks for the fast turnaround! We agreed yesterday to support both the 2017 MC v1-based and 2017 MC v2-based trainings. Do I read correctly that this PR adds the v2-based training, plus reverts to the 2015-based training for NanoAOD production based on MiniAOD v2? I think it would be better to offer the 2017 MC v1-based training, either in addition or replacing the 2015 training.

Copy link

@gpetruc gpetruc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • use run2_nanoAOD_94XMiniAODv1/v2 eras for 94X
  • it's not clear what's happening on 80X

cut = cms.string("pt > 18 && tauID('decayModeFindingNewDMs') && (tauID('byLooseCombinedIsolationDeltaBetaCorr3Hits') || tauID('byVLooseIsolationMVArun2v1DBoldDMwLT') || tauID('byVLooseIsolationMVArun2v1DBnewDMwLT') || tauID('byVLooseIsolationMVArun2v1DBdR03oldDMwLT') || tauID('byVVLooseIsolationMVArun2v1DBoldDMwLT2017v2') || tauID('byVVLooseIsolationMVArun2v1DBnewDMwLT2017v2') || tauID('byVVLooseIsolationMVArun2v1DBdR03oldDMwLT2017v2'))")
)
run2_miniAOD_94XFall17.toModify(finalTaus,
cut = cms.string("pt > 18 && tauID('decayModeFindingNewDMs') && (tauID('byLooseCombinedIsolationDeltaBetaCorr3Hits') || tauID('byVLooseIsolationMVArun2v1DBoldDMwLT2015') || tauID('byVLooseIsolationMVArun2v1DBnewDMwLT') || tauID('byVLooseIsolationMVArun2v1DBdR03oldDMwLT') || tauID('byVVLooseIsolationMVArun2v1DBoldDMwLT2017v2') || tauID('byVVLooseIsolationMVArun2v1DBnewDMwLT2017v2') || tauID('byVVLooseIsolationMVArun2v1DBdR03oldDMwLT2017v2'))")
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use run2_miniAOD_94XFall17 era
Have the default config to be that for 94XMiniAODv2, and use run2_nanoAOD_94XMiniAODv1 to adapt to MiniAOD v1
(I'm putting just one comment but it is true for all uses of this era in the file)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks for letting me know, I will correct it.

@@ -41,6 +41,8 @@
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diff too is not being too smart here.
I would assume this only loads new items compared to what's currently in the release, can you confirm?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I confirm that it only adds new 2017v2 payloads. In particular, it means that standard RECO and MiniAOD v1 and v2 workflows are not broken.

tauIDSources = _tauIDSourcesExt
)

patTauMVAIDsSeq += slimmedTausUpdated
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put all this in a separate cff file (e.g. taus_updatedMVAIds_cff) ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

+ patTauDiscriminationByTightIsolationMVArun2v1DBoldDMwLT2015
+ patTauDiscriminationByVTightIsolationMVArun2v1DBoldDMwLT2015
+ patTauDiscriminationByVVTightIsolationMVArun2v1DBoldDMwLT2015
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is 2015 what is in current miniAOD v1?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly

@mbluj
Copy link
Author

mbluj commented Mar 27, 2018 via email

@mbluj
Copy link
Author

mbluj commented Mar 27, 2018 via email

@mbluj
Copy link
Author

mbluj commented Mar 28, 2018

Dear All,
the two last commits address comments on:

  • moving definitions of MVA tau-Ids to a separate cff file
  • usage of correct eras (hopefully it is fine now)
  • adding MVAIso 2017v1 which is supported by Tau POG.

It does not address, however, the request to add cut-based tau-Ids as due to some reasons there is a problem with parsing of mathematical expressions like: "tauID('byCombinedIsolationDeltaBetaCorrRaw3Hits')<1.5" ("<" sign looks be a problem).
Advises how to cope with this problem are welcome.

@gpetruc
Copy link

gpetruc commented Mar 28, 2018

Hi @mbluj,
Where do you need to parse the x < 1.5 ?
If it's in the table producer, then comparisons should be supported for variables of type bool, as the physicstools parser grammar allows them in cuts but not in expressions.
If you need it in an expression, e.g. to say 1*(x<1.5) + 2*(y>4.7) then you need to use the ternary operator but with one extra question mark at the beginning (because of limitations in the parser of the grammar), so the above would become (? x<1.5 ? 1 : 0) + (? y>4.7 ? 2 : 0)

@mbluj
Copy link
Author

mbluj commented Mar 28, 2018 via email

@gpetruc-bot
Copy link

Copy link

@gpetruc-bot gpetruc-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automatic test report for 343947

Code integration

Code checks not run for this PR (no source files modified)

Tests

  • Test mc_94Xv2: passed
  • Test mc_94X: passed

@arizzi
Copy link

arizzi commented Mar 28, 2018

please use the proper Era Switches to support MC campaigns
94X MiniAOD v1
94X MiniAOD v2
80X MiniAOD
and 92X , 80X for data

@mbluj
Copy link
Author

mbluj commented Mar 28, 2018

It looks that 80X jobs fail. How they should be setup? Same as MiniAODv1 ones? Should I use a specific era?
OK, it looks that it was answered in comment by @arizzi
It looks I need a clarification:

  • for Nano at MiniAODv1/v2 with 94X I am using run2_nanoAOD_94XMiniAODv1/v1
  • for Nano at MiniAOD with 92X I am going to use run2_nanoAOD_92X, correct?
  • while for 80X I do not see anything which looks sensible, should I use run2_miniAOD_80XLegacy? Or run2_nanoAOD_92X?

@arizzi
Copy link

arizzi commented Mar 28, 2018

@mbluj
Copy link
Author

mbluj commented Mar 28, 2018 via email

@mbluj
Copy link
Author

mbluj commented Mar 28, 2018

Now eras should correctly handle 94X MiniAOD v1 and v1, 92X MiniAOD, and 80X MiniAOD and LegacyMiniAOD.

Concerning cut-based isolation tau-Ids:

  • WPs for dR=0.5 were trivially added as they are present in MiniAOD,
  • while for dR=0.3 one has to use a set ternary operators to implement logic of cuts. I have tried several possible implementations (one of which is commented out in git repo), but although the implementations technically work they do not return sensible results. It could be that the expressions are too complicated for the string parser.

P.S. I'm away since ~5pm today and will be able provide only reduced fixes to the PR until end of Easter.

@arizzi
Copy link

arizzi commented Apr 4, 2018

are the missing variables in DQM produced in some of the other eras? @gpetruc which era is the dqm tested on ?

@gpetruc-bot
Copy link

gpetruc-bot commented Apr 4, 2018 via email

@mbluj
Copy link
Author

mbluj commented Apr 4, 2018 via email

@arizzi
Copy link

arizzi commented Apr 4, 2018

@mbluj giovanni suggested that you store in the cfi only the 94X latest (i.e. default) config and use the DQM_cff.py to apply era switches that could add back the legacy variables.
Can you try implementing it? (let me know if you can do it today, otherwise I may try to have a look at it at some point)

@mbluj
Copy link
Author

mbluj commented Apr 4, 2018 via email

@mbluj
Copy link
Author

mbluj commented Apr 4, 2018

Era dependence added to NanoDQM_cff for tau plots. However, I have not idea how to test it properly...

@gpetruc-bot
Copy link

@arizzi arizzi merged commit 0733e5e into cms-nanoAOD:master Apr 4, 2018
arizzi pushed a commit that referenced this pull request Apr 4, 2018
* Add payloads for tau-Id MVAIso 2017v2

* Add MVAIso tau-Id 2017v2

* Move definitions of new MVA tau-Ids to a specific cff file

* Use correct eras, add MVAIso2017v1

* Add cut-based isolation WPs

* correct eras, comment out idCombIsodR03 due to incorrect behavior

* further era corrections

* fix bug in eras handling

* add rawIso dR=0.3, remove cut-base WPs and footprintCorr

* remove MVAIso oldDM dR=0.3 and newDM with 2015 training

* remove instead commenting out

* keep all MVAIso 2015 Tau-Ids and remove 2017v1/2 ones for 80X

* Add era modifiers to NanoDQM
arizzi pushed a commit that referenced this pull request Apr 4, 2018
* Add payloads for tau-Id MVAIso 2017v2

* Add MVAIso tau-Id 2017v2

* Move definitions of new MVA tau-Ids to a specific cff file

* Use correct eras, add MVAIso2017v1

* Add cut-based isolation WPs

* correct eras, comment out idCombIsodR03 due to incorrect behavior

* further era corrections

* fix bug in eras handling

* add rawIso dR=0.3, remove cut-base WPs and footprintCorr

* remove MVAIso oldDM dR=0.3 and newDM with 2015 training

* remove instead commenting out

* keep all MVAIso 2015 Tau-Ids and remove 2017v1/2 ones for 80X

* Add era modifiers to NanoDQM
Copy link

@gpetruc-bot gpetruc-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automatic test report for 348530

Code integration

Code checks not run for this PR (no source files modified)

Tests

  • Long test data80X (10000 events): passed, with differences; dqm plots: all, diff
  • Long test data80Xhip (3000 events): passed, with differences; dqm plots: all, diff
  • Long test data94X (10000 events): passed, no significant changes; dqm plots: all, diff
  • Long test data94Xv2 (10000 events): passed, no significant changes; dqm plots: all, diff
  • Long test mc80X (10000 events): passed, with differences; dqm plots: all, diff
  • Long test mc94X (10000 events): passed, no significant changes; dqm plots: all, diff
  • Long test mc94Xv2 (9000 events): passed, with differences; dqm plots: all, diff
  • Test mc_94Xv2: passed
  • Test mc_94X: passed
  • Test mc_80X: passed
  • Test data_94Xv2: passed
  • Test data_94X: passed
  • Test data_80X: passed

Disk size report

Sample kb/event ref kb/event diff
TTbar MC 94Xv1 1.544 1.532 0.012 ( +0.8% )
TTbar MC 94Xv2 1.563 1.549 0.014 ( +0.9% )
TTbar MC 80X 1.569 1.569 -0.000 ( -0.0% )
Data 94Xv1 0.620 0.613 0.008 ( +1.3% )
Data 80X 0.558 0.557 0.001 ( +0.1% )
Data 80X, Mu Run2016E 0.558 0.557 0.001 ( +0.1% )

@arizzi
Copy link

arizzi commented Apr 4, 2018

cherry picked on master-cmsswmaster and masterMergedAndNoData-integration

@arizzi
Copy link

arizzi commented Apr 4, 2018

cms-sw#22840

@mbluj mbluj deleted the CMSSW_9_4_X_MVAIso2017v2ForNano branch September 7, 2018 14:40
@vlimant
Copy link

vlimant commented Jan 26, 2024

@ALL what was the intention is pulling taggers from outside of the GT here ?

@mbluj
Copy link
Author

mbluj commented Jan 26, 2024

Hello,
This was "temporary" solution to enable access to tauID payloads before they were available via GT (used around beginning of Run-2). Then it was used for testing purposes, etc. Anyway, up to my knowledge this was removed from CMSSW already some time ago (about ~2017?), but let me check. Moreover, tauIDs using these payloads are generally not supported for some time already.

@mbluj
Copy link
Author

mbluj commented Jan 26, 2024

After further checks: there is still possibility to run old tauIDs (early Run-2) accessing their payloads w/o GT for certain Run-2 nanoAOD eras. This is steered by the following cff: PhysicsTools/NanoAOD/python/taus_updatedMVAIds_cff.py. The idea was to enable access to consistent set of versions of those IDs independently on version of miniAOD which was used to produce given nanoAOD. It is however not used for long time at least since Run2 legacy production (10_2?). I do not remember why it was kept for newer releases, probably was overlooked. It is not needed to read Run-2 UL samples (10_6) or even to reproduce most recent versions of the mentioned tauID on top of the UL samples as their payloads are accessible via GT.
I think I can prepare quickly a PR removing this stuff from NanoAOD sequences, i.e. remove PhysicsTools/NanoAOD/python/taus_updatedMVAIds_cff.py.

@vlimant
Copy link

vlimant commented Jan 26, 2024

thanks for checking, please prepare a PR indeed.
N.B. we have workflows running from 10.6 MINI, modified with the era run2_nanoAOD_106Xv2 that seem to still pull in the ID from outside the GT ; for these workflows, which we still want to maintain, the corresponding GT will have to be updated accordingly

@mbluj
Copy link
Author

mbluj commented Jan 26, 2024

thanks for checking, please prepare a PR indeed. N.B. we have workflows running from 10.6 MINI, modified with the era run2_nanoAOD_106Xv2 that seem to still pull in the ID from outside the GT ; for these workflows, which we still want to maintain, the corresponding GT will have to be updated accordingly

I agree, it makes sense to maintain run2_nanoAOD_106Xv2 era as it is the only way to produce Nano with UL samples with new releases. But, it is strange to me that old tauID is used by an official workflow with this era. I have to check this in details. Is thre another wf. (in matrix tests) affected by this issue than one reported by Kati?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

7 participants