Import NanoAOD-tools postoprocessor to CMSSW [13.0.X] #43394

Merged
merged 1 commit into from
Dec 4, 2023
2 changes: 2 additions & 0 deletions DataFormats/Math/src/classes.h
@@ -15,6 +15,7 @@
#include "DataFormats/Math/interface/Vector.h"
#include "DataFormats/Math/interface/Error.h"
#include "DataFormats/Math/interface/Matrix.h"
#include "DataFormats/Math/interface/libminifloat.h"
#include "DataFormats/Common/interface/Wrapper.h"
#include "DataFormats/Common/interface/RefVector.h"
#include "DataFormats/Common/interface/ValueMap.h"
@@ -227,4 +228,5 @@ namespace DataFormats_Math {
edm::ValueMap<math::XYZTLorentzVector> vmp4;
edm::Wrapper<edm::ValueMap<math::XYZTLorentzVector> > wvmp4;
};
MiniFloatConverter::ReduceMantissaToNbitsRounding red(12);
} // namespace DataFormats_Math
1 change: 1 addition & 0 deletions DataFormats/Math/src/classes_def.xml
@@ -12,6 +12,7 @@
<class pattern="ROOT::Math::SVector<*>" />
<class pattern="std::vector<ROOT::Math::SMatrix<*>>" />
<class pattern="std::vector<std::pair<ROOT::Math::PositionVector3D<*>,float> >"/>
<class name="MiniFloatConverter::ReduceMantissaToNbitsRounding"/>
</selection>
<exclusion>
<!-- Excluded to avoid duplicate warnings because these dictionaries are defined in ROOT -->
5 changes: 4 additions & 1 deletion PhysicsTools/NanoAOD/scripts/haddnano.py
100644 → 100755
@@ -1,4 +1,4 @@
#!/bin/env python
#!/bin/env python3
import ROOT
import numpy
import sys
@@ -101,5 +101,8 @@ def zeroFill(tree, brName, brObj, allowNonBool=False):
if st.GetString() != obj.GetString():
print("Strings are not matching")
obj.Write()
elif obj.IsA().InheritsFrom(ROOT.THnSparse.Class()):
obj.Merge(inputs)
obj.Write()
else:
print("Cannot handle " + str(obj.IsA().GetName()))
10 changes: 10 additions & 0 deletions PhysicsTools/NanoAODTools/.gitignore
@@ -0,0 +1,10 @@
__init__.py
*.pyc
.*.swp
.#*
#*#
*~
build
*.d
*.so
*.pcm
123 changes: 123 additions & 0 deletions PhysicsTools/NanoAODTools/README.md
@@ -0,0 +1,123 @@
# NanoAODTools
A simple set of python tools to post-process NanoAODs to:
* skim events
* add variables
* produce plots
* perform simple analyses (but beware that performance may be unsatisfactory because of the inherently sequential design model).

It can be used directly from a CMSSW environment, or checked out as a standalone package.

Originally imported to CMSSW from [cms-nanoAOD/nanoAOD-tools](https://github.com/cms-nanoAOD/nanoAOD-tools) (post-processor functionality only).

## Usage in CMSSW
No specific setup is needed.

It is recommended to add external modules (e.g. correction modules) in separate packages.

## Standalone usage (without CMSSW): checkout instructions

You need to set up Python 3 and a recent ROOT version first.

    wget https://raw.githubusercontent.com/cms-sw/cmssw/master/PhysicsTools/NanoAODTools/standalone/checkoutStandalone.sh
    bash checkoutStandalone.sh -d MyProject
    cd MyProject
    source PhysicsTools/NanoAODTools/standalone/env_standalone.sh

Repeat only the last command at the beginning of every session.

It is recommended to add analysis code, correction modules etc. in a separate package and repository rather than in a CMSSW fork.

Please note that some limitations apply:
* Bindings to C++ code are up to the user.
* Adding packed variables is not supported, as this requires binding to the corresponding code.
Using the tools from a CMSSW environment is therefore recommended.

## General instructions to run the post-processing step
The post-processor can be run in two different ways:
* In a normal python script.
* From the command line, using the script under `scripts/nano_postproc.py` (more details [below](#command-line-invocation)).

## How to write and run modules

It is possible to define modules that will be run on each entry passing the event selection, and can be used to calculate new variables that will be included in the output tree (both in friend and full mode) or to apply event filter decisions.

A first, very simple example is available in `test/exampleAnalysis.py`. It can be executed directly, and implements a module to fill a plot.

An example of a module coded to be imported in scripts or called with the command-line interface is available in `python/postprocessing/examples/exampleModule.py`. This module adds one new variable, which can be stored in the skimmed NanoAOD and also used by subsequent modules in the same job. The example `test/example_postproc.py` shows how to import and use it in a script while skimming events.

Let us now examine the structure of a module class.
* All modules must inherit from `PhysicsTools.NanoAODTools.postprocessing.framework.eventloop.Module`.
* The `__init__` constructor should be used to set the module options.
* The `beginFile` function should create the branches that you want to add to the output file by calling the `branch(branchname, typecode, lenVar)` method of `wrappedOutputTree`. `typecode` should be the ROOT TBranch type code ("F" for float, "I" for int, etc.). `lenVar` should be the name of the variable holding the length of array branches (for instance, `branch("Electron_myNewVar","F","nElectron")`). If the `lenVar` branch does not exist yet, which can happen if you create a new collection, it is created automatically.
* The `analyze` function is called on each event. It should return `True` if the event is to be retained, `False` if it should be dropped.
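
The structure above can be sketched as a small filter module. This is a minimal sketch rather than code from the package: `RhoCut`, its `maxRho` option and the `rhoOverMax` branch are hypothetical names chosen for illustration, and the fallback `Module` stub is only there so the snippet can be run without the framework installed.

```python
try:
    # Real base class when the framework is available.
    from PhysicsTools.NanoAODTools.postprocessing.framework.eventloop import Module
except ImportError:
    # Stand-in so the sketch runs outside CMSSW (assumption, not the real class).
    class Module(object):
        pass


class RhoCut(Module):
    """Hypothetical module: stores rho/maxRho and drops events above the cut."""

    def __init__(self, maxRho):
        # Module options are set in the constructor.
        self.maxRho = maxRho

    def beginFile(self, inputFile, outputFile, inputTree, wrappedOutputTree):
        # Declare the new branch once per file; "F" is a float branch.
        self.out = wrappedOutputTree
        self.out.branch("rhoOverMax", "F")

    def analyze(self, event):
        # Fill the new branch, then return True to keep the event
        # or False to drop it.
        self.out.fillBranch("rhoOverMax", event.rho / self.maxRho)
        return event.rho < self.maxRho


# Constructor wrapped in a lambda, following the package examples.
rhoCutConstr = lambda: RhoCut(maxRho=40.0)
```

Because `analyze` returns `False` for events above the threshold, such a module only makes sense in full mode, where event selection is allowed.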

The event interface, defined in `PhysicsTools.NanoAODTools.postprocessing.framework.datamodel`, makes it possible to dynamically construct views of objects organized in collections, based on the branch names. For instance:

    electrons = Collection(event, "Electron")
    if len(electrons) > 1:
        print(electrons[0].someVar + electrons[1].someVar)
    electrons_highpt = [x for x in electrons if x.pt > 50]

This accesses the elements of the `Electron_someVar` and `Electron_pt` branch arrays. Event variables can be accessed simply as `event.someVar`, for instance `event.rho`.
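
To make the branch-name convention concrete, here is a toy re-implementation of the idea in plain Python. It is only an illustration under stated assumptions: `ToyCollection` and `ToyObject` are hypothetical stand-ins, not the classes defined in `datamodel`, and the event is faked with a `SimpleNamespace` holding made-up values.

```python
from types import SimpleNamespace


class ToyObject(object):
    """View of element `index` of the `<prefix>_*` branch arrays."""

    def __init__(self, event, prefix, index):
        self._event, self._prefix, self._index = event, prefix, index

    def __getattr__(self, name):
        # obj.pt is looked up as event.<prefix>_pt[index]
        return getattr(self._event, "%s_%s" % (self._prefix, name))[self._index]


class ToyCollection(object):
    """Sequence of ToyObject views, sized by the `n<prefix>` branch."""

    def __init__(self, event, prefix):
        self._event, self._prefix = event, prefix
        self._len = getattr(event, "n" + prefix)

    def __len__(self):
        return self._len

    def __getitem__(self, index):
        if not 0 <= index < self._len:
            raise IndexError(index)
        return ToyObject(self._event, self._prefix, index)


# Fake event with two electrons (values are made up for the example).
event = SimpleNamespace(nElectron=2,
                        Electron_pt=[62.0, 35.5],
                        Electron_someVar=[1.0, 2.0])
electrons = ToyCollection(event, "Electron")
electrons_highpt = [e for e in electrons if e.pt > 50]
print(len(electrons), [e.pt for e in electrons_highpt])
```

The point of the sketch is that no per-collection class is needed: attribute access is translated into a branch name at run time, so any `Prefix_variable` branch becomes available as `obj.variable`.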

The output branches should be filled by calling the `fillBranch(branchname, value)` method of `wrappedOutputTree`. `value` should be the desired value for single-value branches, or an iterable with the correct length for array branches. It is not necessary to fill the `lenVar` branch explicitly, as this is done automatically using the length of the passed iterable.



### Command-line invocation
The basic syntax of the command line invocation is the following:

    nano_postproc.py /path/to/output_directory /path/to/input_tree.root

(In standalone mode, it should be invoked as `./scripts/nano_postproc.py`.)

Here is a summary of its features:
* the `-s`,`--postfix` option is used to specify the suffix that will be appended to the input file name to obtain the output file name. It defaults to *_Friend* in friend mode, *_Skim* in full mode.
* the `-c`,`--cut` option is used to pass a string expression (using the same syntax as in TTree::Draw) that will be used to select events. It cannot be used in friend mode.
* the `-J`,`--json` option is used to pass the name of a JSON file that will be used to select events. It cannot be used in friend mode.
* if run with the `--full` option (default), the output will be a full nanoAOD file. If run with the `--friend` option, instead, the output will be a friend tree that can be attached to the input tree. In the latter case, it is not possible to apply any kind of event selection, as the number of entries in the parent and friend tree must be the same.
* the `-b`,`--branch-selection` option is used to pass the name of a file containing directives to keep or drop branches from the output tree. The file should contain one directive among `keep`/`drop` (wildcards allowed as in TTree::SetBranchStatus) or `keepmatch`/`dropmatch` (python regexp matching the branch name) per line. More details are provided in the section [Keep/drop branches](#keepdrop-branches) below.
* the `--bi` and `--bo` options allow specifying the keep/drop file separately for input and output trees.
* the `--justcount` option will cause the script to print the number of selected events, without actually writing the output file.

Please run with `--help` for a complete list of options.
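
As a rough illustration of the `--postfix` behaviour described in the list above, the output name can be thought of as the input file name with the suffix inserted before the extension. The helper below is a hypothetical sketch based only on that description, not the script's actual code; `output_name` and its arguments are illustrative names.

```python
import os


def output_name(output_dir, input_path, postfix=None, friend=False):
    """Guess the output path: <output_dir>/<input stem><postfix><extension>."""
    if postfix is None:
        # Defaults quoted in the option list: _Friend or _Skim.
        postfix = "_Friend" if friend else "_Skim"
    stem, ext = os.path.splitext(os.path.basename(input_path))
    return os.path.join(output_dir, stem + postfix + ext)


print(output_name("outDir", "/path/to/input_tree.root"))
print(output_name("outDir", "/path/to/input_tree.root", friend=True))
print(output_name("outDir", "/path/to/input_tree.root", postfix="_exaModu_keepdrop"))
```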

Let's take the already mentioned [exampleModule.py](python/postprocessing/examples/exampleModule.py). It contains a simple constructor:
```
exampleModuleConstr = lambda : exampleProducer(jetSelection= lambda j : j.pt > 30)
```
which can be imported using the following syntax:

```
nano_postproc.py outDir /eos/cms/store/user/andrey/f.root -I PhysicsTools.NanoAODTools.postprocessing.examples.exampleModule exampleModuleConstr
```

### Keep/drop branches
See the effect of keep/drop instructions by creating a `keep_and_drop_input.txt` file:

```
drop *
keep Muon*
keep Electron*
keep Jet*
```
and specify it with the `--bi` option:
```
python scripts/nano_postproc.py outDir /eos/cms/store/user/andrey/f.root -I PhysicsTools.NanoAODTools.postprocessing.examples.exampleModule exampleModuleConstr -s _exaModu_keepdrop --bi keep_and_drop_input.txt
```
Compared to the previous command (without `--bi`), the output branch created by _exampleModuleConstr_ is the same, but with `--bi` all other branches are dropped when creating the output tree. It also runs faster.
The `--bo` option can be used to further filter output branches.

The keep and drop directives also accept Python lists __when called from a Python script__, e.g.:
```
outputbranchsel=["drop *", "keep EventMass"]
```
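
To show how an ordered list of directives resolves to a final branch set, here is a small stand-alone sketch in plain Python. It is not the framework's implementation, and it rests on several assumptions: all branches start enabled, directives are applied in order so later lines override earlier ones, `keep`/`drop` wildcards are approximated with `fnmatch`, and `keepmatch`/`dropmatch` patterns must match the whole branch name. `select_branches` and the branch names are hypothetical.

```python
import fnmatch
import re


def select_branches(branch_names, directives):
    """Return the branches left enabled after applying directives in order."""
    status = {name: True for name in branch_names}  # assumption: all kept at start
    for line in directives:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments (assumed convention)
        op, pattern = line.split(None, 1)
        keep = op in ("keep", "keepmatch")
        if op in ("keep", "drop"):
            # Shell-style wildcards, approximating TTree::SetBranchStatus.
            matched = [n for n in branch_names if fnmatch.fnmatchcase(n, pattern)]
        elif op in ("keepmatch", "dropmatch"):
            # Python regular expression applied to the full branch name.
            matched = [n for n in branch_names if re.fullmatch(pattern, n)]
        else:
            raise ValueError("unknown directive: %r" % line)
        for name in matched:
            status[name] = keep
    return [n for n in branch_names if status[n]]


branches = ["Muon_pt", "Electron_pt", "Jet_pt", "Tau_pt", "EventMass"]
print(select_branches(branches, ["drop *", "keep Muon*", "keep Electron*", "keep Jet*"]))
print(select_branches(branches, ["drop *", "keep EventMass"]))
print(select_branches(branches, ["dropmatch (Tau|Jet)_.*"]))
```

Starting with `drop *` and then re-enabling selected patterns, as in the examples above, is what makes the order of the lines matter.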

### Calling C++ helpers
Now, let's have a look at another example, `python/postprocessing/examples/mhtjuProducerCpp.py` ([link](python/postprocessing/examples/mhtjuProducerCpp.py)). Similarly, it should be imported using the following syntax:

```
nano_postproc.py outDir /eos/cms/store/user/andrey/f.root -I PhysicsTools.NanoAODTools.postprocessing.examples.mhtjuProducerCpp mhtju
```
The producer in this module has the same structure as `exampleProducer`, but in addition it uses C++ code, `test/examples/mhtjuProducerCppWorker.cc`, to calculate the mht variable. This code is loaded in the `__init__` method of the producer.


17 changes: 17 additions & 0 deletions PhysicsTools/NanoAODTools/crab/PSet.py
@@ -0,0 +1,17 @@
# This fake PSet is needed for local tests and for CRAB to determine the output
# file name. You do not need to edit it unless you want to run a local test
# using a different input file than the one marked below.
import FWCore.ParameterSet.Config as cms
process = cms.Process('NANO')
process.source = cms.Source(
"PoolSource",
fileNames=cms.untracked.vstring(),
# lumisToProcess=cms.untracked.VLuminosityBlockRange("254231:1-254231:24")
)
process.source.fileNames = [
'../../NanoAOD/test/lzma.root' # you can change only this line
]
process.maxEvents = cms.untracked.PSet(input=cms.untracked.int32(10))
process.output = cms.OutputModule("PoolOutputModule",
fileName=cms.untracked.string('tree.root'))
process.out = cms.EndPath(process.output)
33 changes: 33 additions & 0 deletions PhysicsTools/NanoAODTools/crab/crab_cfg.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
from WMCore.Configuration import Configuration
from CRABClient.UserUtilities import config, getUsernameFromSiteDB

config = Configuration()

config.section_("General")
config.General.requestName = 'NanoPost1'
config.General.transferLogs = True
config.section_("JobType")
config.JobType.pluginName = 'Analysis'
config.JobType.psetName = 'PSet.py'
config.JobType.scriptExe = 'crab_script.sh'
config.JobType.inputFiles = ['crab_script.py']
config.JobType.sendPythonFolder = True
config.section_("Data")
config.Data.inputDataset = '/DYJetsToLL_1J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIIFall17NanoAOD-PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/NANOAODSIM'
#config.Data.inputDBS = 'phys03'
config.Data.inputDBS = 'global'
config.Data.splitting = 'FileBased'
#config.Data.splitting = 'EventAwareLumiBased'
config.Data.unitsPerJob = 2
config.Data.totalUnits = 10

config.Data.outLFNDirBase = '/store/user/%s/NanoPost' % (
getUsernameFromSiteDB())
config.Data.publication = False
config.Data.outputDatasetTag = 'NanoTestPost'
config.section_("Site")
config.Site.storageSite = "T2_DE_DESY"

#config.Site.storageSite = "T2_CH_CERN"
# config.section_("User")
#config.User.voGroup = 'dcms'
18 changes: 18 additions & 0 deletions PhysicsTools/NanoAODTools/crab/crab_script.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
#!/usr/bin/env python3
import os
from PhysicsTools.NanoAODTools.postprocessing.framework.postprocessor import *

# this takes care of converting the input files from CRAB
from PhysicsTools.NanoAODTools.postprocessing.utils.crabhelper import inputFiles, runsAndLumis

from PhysicsTools.NanoAODTools.postprocessing.examples.exampleModule import *
p = PostProcessor(".",
inputFiles(),
"Jet_pt>200",
modules=[exampleModuleConstr()],
provenance=True,
fwkJobReport=True,
jsonInput=runsAndLumis())
p.run()

print("DONE")
27 changes: 27 additions & 0 deletions PhysicsTools/NanoAODTools/crab/crab_script.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
#this is not meant to be run locally
#
echo Check if TTY
if [ "`tty`" != "not a tty" ]; then
echo "YOU SHOULD NOT RUN THIS IN INTERACTIVE, IT DELETES YOUR LOCAL FILES"
else

echo "ENV..................................."
env
echo "VOMS"
voms-proxy-info -all
echo "CMSSW BASE, python path, pwd"
echo $CMSSW_BASE
echo $PYTHONPATH
echo $PWD
rm -rf $CMSSW_BASE/lib/
rm -rf $CMSSW_BASE/src/
rm -rf $CMSSW_BASE/module/
rm -rf $CMSSW_BASE/python/
mv lib $CMSSW_BASE/lib
mv src $CMSSW_BASE/src
mv module $CMSSW_BASE/module
mv python $CMSSW_BASE/python

echo Found Proxy in: $X509_USER_PROXY
python crab_script.py $1
fi
@@ -0,0 +1,54 @@
# This is an example of a NanoAODTools Module to add one variable to nanoAODs.
# Note that:
# -the new variable will be available for use in the subsequent modules
# -it is possible to update the value for existing variables
#
# Example of using from command line:
# nano_postproc.py outDir /eos/cms/store/user/andrey/f.root -I PhysicsTools.NanoAODTools.postprocessing.examples.exampleModule exampleModuleConstr
#
# Example of running in a python script: see test/example_postproc.py
#

from PhysicsTools.NanoAODTools.postprocessing.framework.datamodel import Collection
from PhysicsTools.NanoAODTools.postprocessing.framework.eventloop import Module
import ROOT
ROOT.PyConfig.IgnoreCommandLineOptions = True


class exampleProducer(Module):
    def __init__(self, jetSelection):
        self.jetSel = jetSelection
        pass

    def beginJob(self):
        pass

    def endJob(self):
        pass

    def beginFile(self, inputFile, outputFile, inputTree, wrappedOutputTree):
        self.out = wrappedOutputTree
        self.out.branch("EventMass", "F")

    def endFile(self, inputFile, outputFile, inputTree, wrappedOutputTree):
        pass

    def analyze(self, event):
        """process event, return True (go to next module) or False (fail, go to next event)"""
        electrons = Collection(event, "Electron")
        muons = Collection(event, "Muon")
        jets = Collection(event, "Jet")
        eventSum = ROOT.TLorentzVector()
        for lep in muons:
            eventSum += lep.p4()
        for lep in electrons:
            eventSum += lep.p4()
        for j in filter(self.jetSel, jets):
            eventSum += j.p4()
        self.out.fillBranch("EventMass", eventSum.M())
        return True


# define modules using the syntax 'name = lambda : constructor' to avoid having them loaded when not needed

exampleModuleConstr = lambda: exampleProducer(jetSelection=lambda j: j.pt > 30)