First examples and instructions #1

Merged
merged 9 commits into from
Jan 11, 2024
118 changes: 117 additions & 1 deletion README.md
@@ -1 +1,117 @@
# nanoAOD-tools-modules
# Modules for `nanoAOD-tools`

This repo provides some modules that can be used with the
[`nanoAOD-tools`](https://github.com/cms-nanoAOD/nanoAOD-tools)
framework for post-processing CMS's nanoAOD files.

The modules add new branches like lepton scale factors, using the
[`correctionlib`](https://github.com/cms-nanoAOD/correctionlib)
tool, and official
[JSON correction files](https://gitlab.cern.ch/cms-nanoAOD/jsonpog-integration)
provided by the CMS POGs.


## Installation

### Install CMSSW
First, install your favorite CMSSW release.
If you rely on `correctionlib` and `python3`,
it is strongly recommended to use CMSSW 11.3 or newer.
For example,
<table>
<tr>
<td>
CMSSW 11.3.4 (for <a href="http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/">combine v9</a>)
</td>
<td>
CMSSW 12.4.8
</td>
</tr>
<tr>
<td>

```bash
export SCRAM_ARCH=slc7_amd64_gcc900
cmsrel CMSSW_11_3_4
cd CMSSW_11_3_4/src
cmsenv
```
</td>
<td>

```bash
export SCRAM_ARCH=slc7_amd64_gcc10
cmsrel CMSSW_12_4_8
cd CMSSW_12_4_8/src/
cmsenv
```
</td>
</tr>
</table>


### Install `nanoAOD-tools` (for CMSSW 13.2 or older)
Install [`nanoAOD-tools`](https://github.com/cms-nanoAOD/nanoAOD-tools) to process nanoAOD files.
Note that starting from CMSSW 13.3, a basic version of `nanoAOD-tools` is included.
To install the standalone version, please do
```bash
cd $CMSSW_BASE/src/
git clone https://github.com/cms-nanoAOD/nanoAOD-tools.git PhysicsTools/NanoAODTools
```
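
Whether the standalone install is needed can also be checked from the release name. The following is a minimal sketch, assuming that `cmsenv` has set the `CMSSW_VERSION` environment variable; `parse_cmssw_version` and `needs_standalone_nanoaod_tools` are hypothetical helper names and not part of any package.

```python
import os
import re

def parse_cmssw_version(version):
    """Turn a release name such as 'CMSSW_12_4_8' into the tuple (12, 4, 8)."""
    match = re.match(r"CMSSW_(\d+)_(\d+)_(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized CMSSW version string: {version!r}")
    return tuple(int(part) for part in match.groups())

def needs_standalone_nanoaod_tools(version):
    """True for releases older than CMSSW 13.3, which do not include nanoAOD-tools."""
    return parse_cmssw_version(version)[:2] < (13, 3)

# Assumption: CMSSW_VERSION is set by cmsenv; otherwise fall back to an example release.
version = os.environ.get("CMSSW_VERSION", "CMSSW_12_4_8")
print(version, "needs standalone nanoAOD-tools:", needs_standalone_nanoaod_tools(version))
```

Only the major and minor numbers are compared, so pre-release or patch suffixes in the release name do not affect the result.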

### Install `NATModules`
Now install this repository as `PhysicsTools/NATModules` and compile (build) everything:
```bash
cd $CMSSW_BASE/src/
cmsenv
git clone git@github.com:cms-cat/nanoAOD-tools-modules.git PhysicsTools/NATModules
scram b
```

### Install `correctionlib` (for CMSSW 11.2 or older)
Starting from CMSSW 11.3,
[`correctionlib`](https://github.com/cms-nanoAOD/correctionlib)
should come pre-installed.
To install it yourself for older CMSSW versions,
please have a look at the [documentation](https://cms-nanoaod.github.io/correctionlib/).
Note that `correctionlib` works best for `python3`.
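
To see whether `correctionlib` is visible to the `python3` of your environment, a quick check along the following lines may help. This is a minimal sketch using only the standard library; `is_available` is a hypothetical helper name.

```python
import importlib.util

def is_available(module_name):
    """Return True if the top-level module `module_name` can be found by Python."""
    return importlib.util.find_spec(module_name) is not None

# Report whether correctionlib can be imported in the current environment.
status = "found" if is_available("correctionlib") else "NOT found"
print(f"correctionlib: {status}")
```

The check only locates the module without importing it, so it is safe to run before anything has been compiled.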

### Install `correctionlib` data
Finally, install the `correctionlib` JSON files provided by CMS
into `PhysicsTools/NATModules/data` from the
[`cms-nanoAOD/jsonpog-integration` repository](https://gitlab.cern.ch/cms-nanoAOD/jsonpog-integration)
on GitLab:
```bash
cd $CMSSW_BASE/src/PhysicsTools/NATModules
git clone ssh://git@gitlab.cern.ch:7999/cms-nanoAOD/jsonpog-integration.git data
```
or clone via Kerberos, where `$USER` is your CERN lxplus name:
```bash
cd $CMSSW_BASE/src/PhysicsTools/NATModules
kinit $USER@CERN.CH
git clone https://$USER:@gitlab.cern.ch:8443/cms-nanoAOD/jsonpog-integration.git data
```
Alternatively, this repository is regularly synchronized to `/cvmfs/`,
so if your system has access, you can copy the latest version:
```bash
cd $CMSSW_BASE/src/PhysicsTools/NATModules
cp -r /cvmfs/cms.cern.ch/rsync/cms-nanoAOD/jsonpog-integration data
```
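
Whichever route you take, the JSON files should end up below `PhysicsTools/NATModules/data`. The following is a minimal sketch for listing what was installed, assuming the directory layout used in the commands above and that `CMSSW_BASE` is set by `cmsenv`; `find_correction_files` is a hypothetical helper name.

```python
import os
from pathlib import Path

def find_correction_files(data_dir, pattern="*.json.gz"):
    """Return all files matching `pattern` below `data_dir`, sorted by path."""
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        return []  # nothing installed (yet)
    return sorted(data_dir.rglob(pattern))

# Assumed layout: $CMSSW_BASE/src/PhysicsTools/NATModules/data
base = os.environ.get("CMSSW_BASE", ".")
data_dir = Path(base) / "src" / "PhysicsTools" / "NATModules" / "data"
files = find_correction_files(data_dir)
print(f"Found {len(files)} correction file(s) under {data_dir}")
```

An empty result usually means the clone or copy went to a different directory than `data`.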


### Test the installation
Run a module, for example
```bash
cd $CMSSW_BASE/src/PhysicsTools/NATModules
python3 ./test/example_muonSF.py -i root://cms-xrd-global.cern.ch//store/mc/RunIISummer20UL16NanoAODv9/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/NANOAODSIM/20UL16JMENano_106X_mcRun2_asymptotic_v17-v1/2820000/11061525-9BB6-F441-9C12-4489135219B7.root
```
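
To check that the test produced an output file, one can look for it in the current directory. The following is a minimal sketch, assuming the post-processor names its output by inserting the `postfix` (`_MuonSFs` in `test/example_muonSF.py`) before `.root` and writes it to the output directory `.`; this naming rule is an assumption about `nanoAOD-tools`, and `expected_output_name` is a hypothetical helper name.

```python
import os

def expected_output_name(input_file, postfix, output_dir="."):
    """Guess the output path: the input base name with `postfix` inserted before the extension."""
    stem, ext = os.path.splitext(os.path.basename(input_file))
    return os.path.join(output_dir, stem + postfix + ext)

# Base name of the example input file used in the command above
infile = "11061525-9BB6-F441-9C12-4489135219B7.root"
outfile = expected_output_name(infile, "_MuonSFs")
print(outfile, "exists" if os.path.exists(outfile) else "does not exist")
```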


## Usage

To use the modules in your own analysis, take the standalone scripts in [`test`](test/) as examples.
If you compiled this package correctly, you can import the modules in
[`PhysicsTools/NATModules/python/modules`](PhysicsTools/NATModules/python/modules)
as
```python
from PhysicsTools.NATModules.modules.muonSF import *
```
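
The names of the branches the modules write follow from the arguments passed to `addCorrection`. The following sketch mirrors the default-suffix logic of `muonSF.py` and `electronSF.py` in this pull request, so you can predict the branch names before running; `branch_name` is a hypothetical helper for illustration and not part of the package.

```python
def branch_name(collection, valtype, varname=None, wp=None):
    """Return the output branch name for one correction.

    Without an explicit `varname`, the suffix defaults to `valtype` (muons)
    or to `{wp}_{valtype}` when a working point string is given (electrons).
    """
    if varname is None:
        varname = valtype if wp is None else f"{wp}_{valtype}"
    return f"{collection}_{varname}"

print(branch_name("Muon", "sf"))                   # muon default suffix
print(branch_name("Muon", "systup", "sfsysup"))    # explicit suffix
print(branch_name("Electron", "sf", wp="Medium"))  # electron default suffix
```

Two corrections that resolve to the same name write to the same branch, so pass an explicit `varname` whenever several corrections share a `valtype`.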
80 changes: 80 additions & 0 deletions python/modules/electronSF.py
@@ -0,0 +1,80 @@
###
# Compute electron SFs using correctionlib, and store in new branch.
# Load as
#   eleSF = ElectronSF("POG/EGM/2016postVFP_UL/electron.json.gz")
#   eleSF.addCorrection("UL-Electron-ID-SF", "2016postVFP", "Medium", "sf")
#   eleSF.addCorrection("UL-Electron-ID-SF", "2016postVFP", "Medium", "sfdown", "sfdn")
#   eleSF.addCorrection("UL-Electron-ID-SF", "2016postVFP", "Medium", "sfup", "sfup")
###
from __future__ import print_function
from PhysicsTools.NanoAODTools.postprocessing.framework.eventloop import Module
from correctionlib import CorrectionSet

class ElectronSF(Module):
    def __init__(self, json, collection="Electron"):
        """Electron SF correction module.
        Parameters:
            json: the correction file
            collection: name of the collection to be corrected
        Use addCorrection() to set which factors should be added.
        """
        self.collection = collection
        self.names = [ ]
        self.scenarios = [ ]
        self.wps = [ ]
        self.valtypes = [ ]
        self.varnames = [ ]
        self.evaluators = [ ]
        self.evaluator = CorrectionSet.from_file(json)

    def addCorrection(self, name, scenario, wp, valtype, varname=None):
        """
        Call this method to add a correction factor.
        Parameters:
            name: name of the corrections, e.g. 'UL-Electron-ID-SF'
            scenario: year/scenario, e.g. '2016postVFP'
            wp: working point
                - string, e.g. 'Loose', 'Medium', 'wp80iso', ...
                - function of pT, e.g. lambda pt: 'RecoAbove20' if pt>=20 else 'RecoBelow20'
            valtype: type of factor, e.g. 'sf', 'sfup', 'sfdown', ...
            varname: branch name suffix (defaults to {wp}_{valtype})
        """
        if varname is None: # default suffix
            assert isinstance(wp,str), "Please use the varname option to name the branch"
            varname = wp+'_'+valtype
        self.names.append(name)
        self.scenarios.append(scenario)
        self.valtypes.append(valtype)
        self.wps.append(wp)
        self.varnames.append(f"{self.collection}_{varname}") # branch name
        self.evaluators.append(self.evaluator[name])

    def beginFile(self, inputFile, outputFile, inputTree, wrappedOutputTree):
        """Add branch for every correction to output file."""
        self.out = wrappedOutputTree
        for varname in set(self.varnames): # avoid duplicates
            self.out.branch(varname, 'F', lenVar='nElectron')

    def analyze(self, event):
        pts = [max(10.001,event.Electron_pt[i]) for i in range(event.nElectron)]
        etas = [event.Electron_eta[i] for i in range(event.nElectron)]
        for ic in range(len(self.evaluators)):
            # We cannot make a single call to evaluate passing eta, pt as arrays
            # since POG JSONS are currently provided with flow="error", so we
            # have to loop to protect for values out of binning range.
            sfs = [1.]*event.nElectron
            for iEle in range(event.nElectron):
                try:
                    if isinstance(self.wps[ic],str): # WP is a simple string
                        wp = self.wps[ic]
                    else: # assume WP is a function of pT
                        wp = self.wps[ic](pts[iEle]) # evaluate WP in pT
                        if wp is None: # evaluate correction only if WP is defined
                            continue
                    sfs[iEle] = self.evaluators[ic].evaluate(
                        self.scenarios[ic], self.valtypes[ic], wp, etas[iEle], pts[iEle])
                except Exception: # keep default sf = 1, but report the failure
                    print(f"ElectronSF.analyze: Exception for {self.scenarios[ic]}, {self.valtypes[ic]}, wp={self.wps[ic]}, eta={etas[iEle]:6.4f}, pt={pts[iEle]:6.4f}")
            self.out.fillBranch(self.varnames[ic], sfs)
        return True
68 changes: 68 additions & 0 deletions python/modules/muonSF.py
@@ -0,0 +1,68 @@
###
# Compute muon SFs using correctionlib, and store in new branch.
# Load as
# muSF = MuonSF("POG/MUO/2016postVFP_UL/muon_Z.json.gz")
# muSF.addCorrection("NUM_TrackerMuons_DEN_genTracks", "2016postVFP_UL", "sf")
# muSF.addCorrection("NUM_MediumID_DEN_TrackerMuons", "2016postVFP_UL", "systdown", "sfsysdn")
# muSF.addCorrection("NUM_MediumID_DEN_TrackerMuons", "2016postVFP_UL", "systup", "sfsysup")
###
from __future__ import print_function
from PhysicsTools.NanoAODTools.postprocessing.framework.eventloop import Module
from correctionlib import CorrectionSet

class MuonSF(Module):
    def __init__(self, json, collection="Muon"):
        """Muon SF correction module.
        Parameters:
            json: the correction file
            collection: name of the collection to be corrected
        Use addCorrection() to set which factors should be added.
        """
        self.collection = collection
        self.names = [ ]
        self.scenarios = [ ]
        self.valtypes = [ ]
        self.varnames = [ ]
        self.evaluators = [ ]
        self.evaluator = CorrectionSet.from_file(json)

    def addCorrection(self, name, scenario, valtype, varname=None):
        """
        Call this method to add a correction factor.
        Parameters:
            name: name of the corrections, e.g. 'NUM_TrackerMuons_DEN_genTracks'
            scenario: year/scenario, e.g. '2016postVFP_UL'
            valtype: type of factor, e.g. 'sf', 'systup', ...
            varname: branch name suffix (defaults to valtype)
        """
        if varname is None: # default suffix
            varname = valtype
        self.names.append(name)
        self.scenarios.append(scenario)
        self.valtypes.append(valtype)
        self.varnames.append(f"{self.collection}_{varname}") # branch name
        self.evaluators.append(self.evaluator[name])

    def beginFile(self, inputFile, outputFile, inputTree, wrappedOutputTree):
        """Add branch for every correction to output file."""
        self.out = wrappedOutputTree
        for varname in self.varnames:
            self.out.branch(varname, 'F', lenVar='nMuon')

    def analyze(self, event):
        pts = [max(15.001,event.Muon_pt[i]) for i in range(event.nMuon)]
        etas = [min(2.39999,abs(event.Muon_eta[i])) for i in range(event.nMuon)]
        for ic in range(len(self.evaluators)):
            # We cannot make a single call to evaluate passing eta, pt as arrays
            # since POG JSONS are currently provided with flow="error", so we
            # have to loop to protect for values out of binning range.
            sfs = [1.]*event.nMuon
            for iMu in range(event.nMuon):
                try:
                    sfs[iMu] = self.evaluators[ic].evaluate(self.scenarios[ic], etas[iMu], pts[iMu], self.valtypes[ic])
                except Exception: # keep default sf = 1, but report the failure
                    print(f"MuonSF.analyze: Exception for {self.scenarios[ic]}, eta={etas[iMu]:6.4f}, pt={pts[iMu]:6.4f}, {self.valtypes[ic]}")
            self.out.fillBranch(self.varnames[ic], sfs)
        return True

50 changes: 50 additions & 0 deletions test/example_electronSF.py
@@ -0,0 +1,50 @@
#!/usr/bin/env python3
###
# This is an example of configuring and calling a generic correctionlib module.
# Notes:
# - For the sake of speed, it is important to apply a preselection whenever possible.
# - Also, it is advisable to insert correction modules like this after modules
# that apply selections, so that corrections are not computed for events that
# are then discarded.
# - Find all available corrections and parameters by running on the command line
#     correction summary POG/EGM/2016postVFP_UL/electron.json.gz
###
from PhysicsTools.NanoAODTools.postprocessing.framework.postprocessor import PostProcessor
from PhysicsTools.NATModules.modules.electronSF import *

# Set up the electron correction module
eleSF = ElectronSF("data/POG/EGM/2016postVFP_UL/electron.json.gz")
name = 'UL-Electron-ID-SF' # name of the correction (avoid shadowing the built-in `set`)
era = '2016postVFP'

# Add electron reconstruction scale factor
reco = lambda pt: 'RecoAbove20' if pt>=20 else 'RecoBelow20'
eleSF.addCorrection(name, era, reco, 'sf', 'Reco_sf')
eleSF.addCorrection(name, era, reco, 'sfdown', 'Reco_sfdn')
eleSF.addCorrection(name, era, reco, 'sfup', 'Reco_sfup')

# Add electron isolation scale factor
# NOTE: this uses the same 'Medium' WP and branch names as the identification
# block below, so both blocks fill the same branches.
eleSF.addCorrection(name, era, 'Medium', 'sf')
eleSF.addCorrection(name, era, 'Medium', 'sfdown', 'sfdn')
eleSF.addCorrection(name, era, 'Medium', 'sfup', 'sfup')

# Add electron identification scale factor
eleSF.addCorrection(name, era, 'Medium', 'sf')
eleSF.addCorrection(name, era, 'Medium', 'sfdown', 'sfdn')
eleSF.addCorrection(name, era, 'Medium', 'sfup', 'sfup')

# Settings for post-processor
from argparse import ArgumentParser
parser = ArgumentParser(description="Process nanoAOD files and add branches",epilog="Good luck!")
parser.add_argument('-i', '--infiles', nargs='+')
parser.add_argument('-m', '--maxevts', type=int, default=10000) # limit number of events (per file) for testing
args = parser.parse_args()
branchsel = None #"keepElectron.txt" # keep only Electron branches for speed
fnames = args.infiles or [
    "root://cms-xrd-global.cern.ch//store/mc/RunIISummer20UL16NanoAODv9/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/NANOAODSIM/20UL16JMENano_106X_mcRun2_asymptotic_v17-v1/2820000/11061525-9BB6-F441-9C12-4489135219B7.root"
]

# Process nanoAOD file
p = PostProcessor(".", fnames, "nElectron>=2 && Electron_pt>18", branchsel, [eleSF], maxEntries=args.maxevts, postfix="_ElectronSFs",
                  outputbranchsel=branchsel, provenance=True, prefetch=True, longTermCache=True)
p.run()
42 changes: 42 additions & 0 deletions test/example_muonSF.py
@@ -0,0 +1,42 @@
#!/usr/bin/env python3
###
# This is an example of configuring and calling a generic correctionlib module.
# Notes:
# - For the sake of speed, it is important to apply a preselection whenever possible.
# - Also, it is advisable to insert correction modules like this after modules
# that apply selections, so that corrections are not computed for events that
# are then discarded.
# - Find all available corrections and parameters by running on the command line
#     correction summary POG/MUO/2016postVFP_UL/muon_Z.json.gz
###
from PhysicsTools.NanoAODTools.postprocessing.framework.postprocessor import PostProcessor
from PhysicsTools.NATModules.modules.muonSF import *

# Set up the muon correction module
muSF = MuonSF("data/POG/MUO/2016postVFP_UL/muon_Z.json.gz")
era = '2016postVFP_UL'

# Add TrackerMuon reconstruction scale factor (default suffix: branch Muon_sf)
muSF.addCorrection('NUM_TrackerMuons_DEN_genTracks', era, 'sf')
# Add Medium ID scale factor
# NOTE: with the default suffix this also writes to Muon_sf, the same branch as
# above; pass an explicit varname (fourth argument) to keep the two factors apart.
muSF.addCorrection('NUM_MediumID_DEN_TrackerMuons', era, 'sf')
# Add Medium ID scale factor, down variation
muSF.addCorrection('NUM_MediumID_DEN_TrackerMuons', era, 'systdown', 'sfsysdn')
# Add Medium ID scale factor, up variation
muSF.addCorrection('NUM_MediumID_DEN_TrackerMuons', era, 'systup', 'sfsysup')

# Settings for post-processor
from argparse import ArgumentParser
parser = ArgumentParser(description="Process nanoAOD files and add branches",epilog="Good luck!")
parser.add_argument('-i', '--infiles', nargs='+')
parser.add_argument('-m', '--maxevts', type=int, default=10000) # limit number of events (per file) for testing
args = parser.parse_args()
branchsel = None # optionally a keep/drop file, e.g. to keep only Muon branches for speed
fnames = args.infiles or [
    "root://cms-xrd-global.cern.ch//store/mc/RunIISummer20UL16NanoAODv9/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/NANOAODSIM/20UL16JMENano_106X_mcRun2_asymptotic_v17-v1/2820000/11061525-9BB6-F441-9C12-4489135219B7.root"
]

# Process nanoAOD file
p = PostProcessor(".", fnames, "nMuon>=2 && Muon_pt>18", branchsel, [muSF], maxEntries=args.maxevts, postfix="_MuonSFs",
                  outputbranchsel=branchsel, provenance=True, prefetch=True, longTermCache=True)
p.run()