Thomas Peiffer edited this page May 4, 2016 · 14 revisions

Ntuple Format

The current ROOT ntuple format in the master branch is the same as for UHHAnalysis. However, the transition to 13 TeV data analysis offers the opportunity to clean up the format while adding new variables. The new data format is developed in a separate branch, next-ntuple-format, so that it can be tested while ongoing analyses continue to use the current format in the master branch.

NOTE: The next-ntuple-format branch should be used together with CMSSW74X which includes improvements in the candidate-based b-tagging, which is used by the ntuplewriter CMSSW python configuration.

Currently included changes are:

  • Introduce a class Tags which stores arbitrary (int-indexed) float data. This makes future evolution of the data format easier, since new variables can be added as new integer keys (new enum values).
  • Add the Tags mechanism to the classes Jet, TopJet, Electron, Muon, Tau to add more b-tagging / top-tagging / lepton variables.
  • For electrons, use the Tags mechanism to store the currently recommended PHYS14 electron ids as computed by the CMSSW module.
  • For Muon and Tau, use a uint64_t to store individual boolean flags as bits. For Taus, use these bits to store the newly recommended id variables. For Muons, remove some data only needed to compute the ids (which are very stable now); instead, store the id bits.
  • For Jets, remove the GenJet Particle pointer, which was also stored in the file, although it shouldn't be.
  • For TopJet, change subjets to have type vector<Jet> instead of vector<Particle>, so that subjet information (flavor, JEC, area, etc.) is saved consistently; this also reduces code duplication in SubjetCorrector.
  • For TopJet, remove detailed b-tagging variables.
  • Remove the PFParticle class, as these are never stored in the UHH2 framework.
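
The Tags mechanism and the bit-packed ids described in the list above can be sketched as follows. This is a minimal illustration only, not the code in the next-ntuple-format branch: the method names (set, has, get, test, raw), the helper class IdBits, and the enum values JetTag and MuonId are all hypothetical and chosen for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>

// Hypothetical tag keys; the real enum values in the branch may differ.
enum class JetTag : int {
    tau1 = 0,
    tau2 = 1,
    qjets_volatility = 2,
};

// Minimal int-indexed float store, in the spirit of the Tags class.
// New variables only need a new enum value, not a new data member.
class Tags {
public:
    void set(int key, float value) { data_[key] = value; }
    bool has(int key) const { return data_.count(key) != 0; }
    // Throws if the key was never set, so missing data is not silently read as 0.
    float get(int key) const {
        auto it = data_.find(key);
        if (it == data_.end()) throw std::runtime_error("Tags: key not set");
        return it->second;
    }
private:
    std::map<int, float> data_;
};

// Hypothetical id bits for a muon; the bit positions are illustrative only.
enum class MuonId : unsigned {
    loose = 0,
    medium = 1,
    tight = 2,
};

// Boolean ids packed into a single 64-bit word.
class IdBits {
public:
    void set(unsigned bit, bool value) {
        assert(bit < 64);  // shifting by 64 or more is undefined behaviour
        const std::uint64_t mask = std::uint64_t{1} << bit;
        if (value) bits_ |= mask; else bits_ &= ~mask;
    }
    bool test(unsigned bit) const {
        assert(bit < 64);
        return ((bits_ >> bit) & std::uint64_t{1}) != 0;
    }
    std::uint64_t raw() const { return bits_; }
private:
    std::uint64_t bits_ = 0;
};
```

With this sketch, a jet variable would be written with tags.set(static_cast<int>(JetTag::tau1), value) and a muon id with ids.set(static_cast<unsigned>(MuonId::tight), true). The design trade-off is that stored objects keep a fixed layout while the set of variables can grow, at the cost of a map lookup per access.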

While changing the data format, changes to NtupleWriter have also been made:

  • Calculate substructure information (nsubjettiness, qjets mass volatility) for TopJets in CMSSW instead of using a custom interface to fastjet.

Further proposals

Currently not included are these changes:

  • Consider saving isRealData as metadata, as it rarely changes between events.

Caveats and limitations

Not everything is available in the next-ntuple-format; in particular:

  • Subjet b-tagging does not work for puppi jets (it does work for chs subjets and for puppi fat-jets). The configuration is present in the ntuplewriter, but the explicit jet-track association in CMSSW does not work (all puppi particles have bestTrack()==0, which breaks the CandIPProducer). To be fixed in CMSSW-80X.