Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NanoAOD prototype [RFC] #20563

Closed
wants to merge 15 commits into from
Closed

NanoAOD prototype [RFC] #20563

wants to merge 15 commits into from

Conversation

gpetruc
Copy link
Contributor

@gpetruc gpetruc commented Sep 18, 2017

First PR of NanoAOD prototype, mostly to collect comments on the technical implementation of it.

Based on CMSSW_9_4_0_pre1 + #20491, importing all code from https://github.com/gpetruc/NanoAOD except the python post-processing, which we want to put in a standalone package

Data files (e.g. xml trainings of MVAs) are all in one commit, and will be rebased out of the history once we get a new cms-data package for PhysicsTools/NanoAOD

This package define three persistent dataformats (FlatTable, MergeableCounterTable, UniqueString) that are however only needed when NanoAOD is written in full EDM Format.
They could be relocated to a DataFormats/NanoAOD or similar if you think it would be useful. The dataformats are content-agnostic, so we don't expect them to evolve much in the future.

Some other generic classes may be relocated to PhysicsTools/PatAlgos.

Integration with cmsDriver & runTheMatrix is not yet included in this PR, but we probably want to define:

  • one new step for NanoAOD (NANO ?)
  • one new data tier and event content for NanoAOD (NANOAOD, NANOAODSIM?); we may have to define it for both the full EDM and the nano format, which have the same event content but differ in the output module used (PoolOuputModule vs NanoAODOutputModule), advice on how to do this is welcome.
  • one set of matrix workflows testing NanoAOD production out of a bunch of different data & mc files to check the event size, and do simple validation, and possibly two extra workflows for joint RECO+MINIAOD+NANOAOD (T0) & MINIAOD+NANOAOD (MC).

@arizzi @peruzzim @emanueledimarco

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/PR-20563/773

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @gpetruc (Giovanni Petrucciani) for master.

It involves the following packages:

DataFormats/PatCandidates
PhysicsTools/NanoAOD
RecoEgamma/PhotonIdentification

The following packages do not have a category, yet:

PhysicsTools/NanoAOD
Please create a PR for https://github.com/cms-sw/cms-bot/blob/master/categories.py to assign category

@perrotta, @cmsbuild, @monttj, @slava77 can you please review it and eventually sign? Thanks.
@Sam-Harper, @jainshilpi, @rovere, @lgray, @gouskos, @cbernet this is something you requested to watch as well.
@davidlange6, @slava77 you are the release manager for this.

cms-bot commands are listed here

@slava77
Copy link
Contributor

slava77 commented Sep 18, 2017

Data files (e.g. xml trainings of MVAs) are all in one commit, and will be rebased out of the history once we get a new cms-data package for PhysicsTools/NanoAOD

why are they even needed?

  • if it's for reading Nano, then it seems to go against the idea that this is a simple data format without needs for calibration tools
  • if it's for making Nano: shouldn't all the calibrations and training be a part of the miniAOD production (regular EDM framework production).

@gpetruc
Copy link
Contributor Author

gpetruc commented Sep 18, 2017

they're needed for nano-AOD production (in full CMSSW) since some high-level ids or calibrations are derived after miniAOD production (or, they can be reapplied on miniAOD and we don't always re-miniAOD when they change).

@arizzi
Copy link
Contributor

arizzi commented Sep 18, 2017 via email

@davidlange6
Copy link
Contributor

hi all - i started the process of getting the data files into a separate repo - but one comment - perhaps its a good moment to unify the xml dat txt (maybe yaml?) variants of text file endings/formats to store what must be the same sort of information ?

@arizzi
Copy link
Contributor

arizzi commented Sep 19, 2017

@davidlange6 the xmls are TMVA stuff, so not all the same.
for the rest I think different POGs => different format ...

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/PR-20563/843

@cmsbuild
Copy link
Contributor

Pull request #20563 was updated. @perrotta, @cmsbuild, @monttj, @slava77 can you please check and sign again.

//

template <typename T>
class BaseMVAValueMapProducer : public edm::stream::EDProducer<> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful to use a GlobalCache here, no? (to reduce total number of weight-file reads in multithreaded mode)
Similar to: https://github.com/cms-sw/cmssw/blob/master/RecoEgamma/EgammaTools/interface/MVAValueMapProducer.h

@slava77
Copy link
Contributor

slava77 commented Sep 21, 2017

Data files (e.g. xml trainings of MVAs) are all in one commit, and will be rebased out of the history once we get a new cms-data package for PhysicsTools/NanoAOD

cms-sw/cmsdist#3431 is merged already (23 hrs ago)
time to rebase?

@arizzi
Copy link
Contributor

arizzi commented Sep 21, 2017 via email

@slava77
Copy link
Contributor

slava77 commented Sep 21, 2017 via email

@arizzi
Copy link
Contributor

arizzi commented Sep 21, 2017 via email

@slava77
Copy link
Contributor

slava77 commented Sep 21, 2017 via email

@gpetruc
Copy link
Contributor Author

gpetruc commented Sep 22, 2017

closing while I do the rebase and cleanup to avoid spamming everyone with each change to the branch. will re-open when done

@gpetruc gpetruc closed this Sep 22, 2017
@gpetruc
Copy link
Contributor Author

gpetruc commented Sep 22, 2017

Relocated dataformats and rebased removing the data files from the history. To run locally in 9_4_0_pre1 now one needs
git clone https://github.com/cms-data/PhysicsTools-NanoAOD.git PhysicsTools/NanoAOD/data
I merged also a few other pending developments in the content, and the integration with cmsDriver (d7dc162)

@slava77
Copy link
Contributor

slava77 commented Sep 22, 2017 via email

@arizzi
Copy link
Contributor

arizzi commented Sep 27, 2017

cloning in the external area doesn't work very nicely with crab...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants