
Installing, Compiling, Ntuples (Run II 2016, 2017, 2018 datasets in CMSSW_10_2_X v1)

Robin edited this page May 25, 2020 · 23 revisions

Table of contents

  1. Installation
  2. Compiling
  3. Handling different datasets
  4. Jet Collections
  5. GenJet Collections

Installation

Installation requires FastJet, SFrame, and UHH2; all three are now handled by a single script. It is assumed that a CMS environment is available on the machine.

❗ This branch requires an SL6 machine, not an EL7 machine, e.g. naf-cms11.desy.de or lxplus6.cern.ch.

Before you begin: it is a good idea to create an empty directory and run the following commands there. This ensures the installation will not clash with any existing installations of FastJet, SFrame, & UHH2.

Also, please don't skip installing FastJet & SFrame: fresh copies are needed, otherwise you will run into linking & compilation issues.

Download the installation script from GitHub and execute it:

wget https://raw.githubusercontent.com/UHH2/UHH2/RunII_102X_v1/scripts/install.sh
source install.sh

For csh users: use the install.csh script instead.

For zsh users: this should work as long as you source install.sh.

If this exceeds your quota, do

export CMSSW_GIT_REFERENCE=<DIRECTORY_WITH_ENOUGH_SPACE>/cmssw.git

and try again.

Alternatively, execute all the steps given in the install.sh script one after the other. Sometimes, the compilation with cmsRun fails. In this case, start a new installation in a clean shell.

Compiling

Immediately after running the install script, and assuming you are now in CMSSW_*/src/UHH2/, run:

cmsenv
cd ../../../SFrame
source setup.sh
make -j4
cd ../CMSSW_*/src/UHH2
make -j9

NB: each time you log back in, you need to run both cmsenv (in the CMSSW area) and source setup.sh (in the SFrame directory).
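For a fresh shell, that per-login setup can be sketched as follows. This is an environment-dependent fragment, not a standalone script: the directory names are illustrative and depend on where you installed.

```shell
# Run once per fresh login shell before working with UHH2/SFrame.
# <YOUR_INSTALL_DIR> and the release directory name are placeholders --
# use your actual installation path and CMSSW release area.
cd <YOUR_INSTALL_DIR>/CMSSW_*/src
cmsenv                 # activates the CMSSW environment (sets $CMSSW_BASE etc.)
cd ../../SFrame
source setup.sh        # sets the SFrame environment (e.g. $SFRAME_DIR, $SFRAME_LIB_DIR)
```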

Compilation info for SFrame & UHH2

Before going on, it is important to realize that the UHH2 code generally has to be compiled twice: once for CMSSW and once for SFrame execution. This is because the two packages have different dependencies and naming conventions; for example, the SFrame binaries are placed in $SFRAME_LIB_DIR, while the CMSSW binaries go in $CMSSW_BASE/lib/$SCRAM_ARCH.

Note that SFrame can only be compiled after you have activated your CMSSW release with cmsenv; this ensures the correct ROOT version etc. are picked up.

Usually, all you need to do is go to the UHH2 directory and type make. This will build both the code for SFrame and for CMSSW. To only build the code for SFrame use, execute

make sframe

and to only build code for CMSSW, execute

make scram

This actually runs scram b over the whole CMSSW installation and thus also compiles other CMSSW packages. (If that is not what you want, run only make sframe and invoke scram b manually.)

2.a. Notes on SFrame compilation

For the SFrame compilation, by default only the directories from the UHH2/UHH2 repository are compiled (see UHH2/Makefile for details). To enable compilation of additional analysis directories, create UHH2/Makefile.local with contents such as

dirs += MyAnalysis1 MyAnalysis2

This will trigger the build also in the directories named MyAnalysis1 and MyAnalysis2. (The reason for using Makefile.local is to avoid getting in each other's way: everyone can have their own Makefile.local, which is ignored by git).

For cleaning up, use make clean. Cleaning up manually can be done by removing all files in the obj subdirectory (which is the location used by all auto-generated files) and the SFrame libraries, i.e. $SFRAME_DIR/lib/libSUHH2*.
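For reference, the manual cleanup described above amounts to something like the following. This fragment is environment-dependent: it assumes you are in the UHH2 directory and that $SFRAME_DIR was set by `source setup.sh`.

```shell
# Hand-rolled equivalent of `make clean` for the SFrame build.
rm -rf obj/                          # auto-generated files live in the obj subdirectory
rm -f "$SFRAME_DIR"/lib/libSUHH2*    # SFrame libraries built from UHH2 code
```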

2.b. Notes on CMSSW compilation

You can also compile the CMSSW part by executing

scram b

in the CMSSW directory yourself instead of using make scram (the main purpose of the latter command is to prevent accidentally forgetting to build the CMSSW part, by making scram the default target).

For cleaning up the CMSSW build, run

scram b clean

(as is usual for CMSSW); note that make clean only cleans up the SFrame compilation, not the CMSSW one.

Handling different datasets

This release is unique in that it can handle datasets from multiple years. There are now dedicated ntuplewriter_<type>_<year>.py config files for each year, with <type> = data or mc, which can be used directly. This way you can point e.g. a CRAB script at the correct file without needing to switch flags.

To facilitate this, we now use a generic function, generate_process(...), in core/python/ntuple_generator.py. The only mandatory argument is the year.
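As a sketch, each per-year config file essentially reduces to a call like the one below. The import path is an assumption based on the file layout described above, and only `year` is documented as mandatory; check core/python/ntuple_generator.py for the actual signature. This is a CMSSW config fragment, so it only runs under cmsRun.

```python
# Hypothetical sketch of what an ntuplewriter_mc_2018.py config boils down to.
# The import path is assumed from core/python/ntuple_generator.py; only the
# `year` argument is documented as mandatory.
from UHH2.core.ntuple_generator import generate_process

process = generate_process(year="2018")
```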

The possible years are:

  • 2016v2: for 2016 MiniAODv2 / 03Feb2017 data
  • 2016v3: for 2016 MiniAODv3 / rereco data
  • 2017v1: for 2017 Prompt data & RunIIFall17MiniAOD MC
  • 2017v2: for 2017 ReReco Data "31Mar18" & RunIIFall17MiniAODv2 MC
  • 2018: for 2018 data & Autumn18 MC

You can then simply run each script: cmsRun ntuplewriter_<type>_<year>.py
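Illustratively (this is not the actual UHH2 code), the mapping between the year argument and the supported datasets can be thought of as a small lookup with validation:

```python
# Illustration only: the supported year keys listed above, with a validation
# helper similar in spirit to what generate_process() must do internally.
SUPPORTED_YEARS = {
    "2016v2": "2016 MiniAODv2 / 03Feb2017 data",
    "2016v3": "2016 MiniAODv3 / rereco data",
    "2017v1": "2017 Prompt data & RunIIFall17MiniAOD MC",
    "2017v2": "2017 ReReco '31Mar18' data & RunIIFall17MiniAODv2 MC",
    "2018": "2018 data & Autumn18 MC",
}


def check_year(year):
    """Return the dataset description for a supported year, else raise."""
    if year not in SUPPORTED_YEARS:
        raise ValueError(
            "Unsupported year %r; choose one of %s"
            % (year, ", ".join(sorted(SUPPORTED_YEARS)))
        )
    return SUPPORTED_YEARS[year]
```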

Bonus

There are now some command-line arguments! See cmsRun ntuplewriter_<type>_<year>.py help for all of them.

e.g.:

cmsRun ntuplewriter_xxx_yyy.py maxEvents=100 outputFile=testNtuple.root wantSummary=1

Jet Collections

The jet collections are designed to be consistent across all years/datasets. A brief explainer of what is in each jet collection:

| Name | jetsAk4CHS | jetsAk4Puppi | jetsAk8CHS | jetsAk8Puppi | jetsAk8CHSSubstructure_SoftDropCHS | jetsAk8PuppiSubstructure_SoftDropPuppi |
| --- | --- | --- | --- | --- | --- | --- |
| Class type | Jet | Jet | Jet | Jet | TopJet | TopJet |
| Clustering algorithm | anti-kT | anti-kT | anti-kT | anti-kT | anti-kT | anti-kT |
| Cone size | 0.4 | 0.4 | 0.8 | 0.8 | 0.8 | 0.8 |
| Pileup subtraction | CHS | PUPPI | CHS | PUPPI | CHS | PUPPI |
| Has groomed subjets? | No | No | No | No | Yes (SoftDrop) | Yes (SoftDrop) |
| Substructure/interesting variables | None | PUPPI multiplicities, DeepFlavour | DeepFlavour | PUPPI multiplicities, DeepFlavour | DeepFlavour, DeepBoostedJetTags (i.e. DeepJet), Nsubjettiness (tau_1,2,3,4, groomed & ungroomed), energy correlation functions (N=2,3 * beta=1,2, groomed only) | PUPPI multiplicities, DeepFlavour, DeepBoostedJetTags (i.e. DeepJet), Nsubjettiness (tau_1,2,3,4, groomed & ungroomed), energy correlation functions (N=2,3 * beta=1,2, groomed only) |
| Other notes | Just slimmedJets from MiniAOD | slimmedJetsPuppi from MiniAOD + extras | Reclustered with low pT threshold (10 GeV), especially for JERC studies | Reclustered with low pT threshold (10 GeV), especially for JERC studies | Reclustered AK8 CHS jets with groomed subjets, higher pT threshold (150 GeV). Main jet kinematics are ungroomed. Designed for boosted/high-pT jet studies. | Reclustered AK8 PUPPI jets with groomed subjets, higher pT threshold (150 GeV). Main jet kinematics are ungroomed. Designed for boosted/high-pT jet studies. |

NB: DeepCSV, combinedSecondaryVertices, and combinedSecondaryVerticesMVA are only valid for CHS jets; the BTV POG does not support PUPPI jets (as of the last edit). For all jets we take these values from MiniAOD, so they may or may not be sensible or valid. For 2016v2 datasets we recalculate these values ourselves, since they were not originally in MiniAOD.

GenJet Collections

The following genjet collections are available across all years. Note that all collections are composed of final-state genparticles excluding neutrinos.

| Name | slimmedGenJets | slimmedGenJetsAK8 | genjetsAk8Substructure | genjetsAk8SubstructureSoftDrop |
| --- | --- | --- | --- | --- |
| Class type | GenJet | GenJet | GenTopJet | GenTopJet |
| Clustering algorithm | anti-kT | anti-kT | anti-kT | anti-kT |
| Cone size | 0.4 | 0.8 | 0.8 | 0.8 |
| pT cut | pT > 8 GeV | pT > 150 GeV. For lower-pT studies, 3 jets are kept (with minimal information) from 30-100 GeV. | pT > 150 GeV | pT > 150 GeV |
| Has groomed subjets? | No | No | No | Yes |
| Substructure/interesting variables | N/A | N/A | Ungroomed Njettiness (tau_1,2,3,4) | Groomed Njettiness (tau_1,2,3,4), ECFs (N=2,3 * beta=1,2) |
| Other notes | | | Kinematics are ungroomed | Kinematics are groomed. genparticles_indices are the constituents of the groomed fatjet (= sum over subjet constituents) |