cmsrel CMSSW_10_2_5
cd CMSSW_10_2_5/src
git cms-addpkg PhysicsTools/KinFitter CommonTools/Utils CondFormats/JetMETObjects CondFormats/Serialization FWCore/MessageLogger FWCore/Utilities JetMETCorrections/Modules
scram b -j 8
git clone
# from bbbbAnalysis/
source scripts/ # only needed once for every new shell
make exe -j # compiles and makes everything under test/ executable
fill_histograms.exe config/<config_name>.cfg
Use the plotter/ script. Styles (line colors, etc.) for the processes are defined in plotter/styles/
Inside the script, the subset of processes to run on is defined through bkgToPlot and sigToPlot.
Several command-line options are available to configure the plot; it is practical to write a wrapper script that produces all the plots.
After producing the bbbb_ntuple skims of data and simulation, the next step is to use machine learning techniques to train a discriminator (e.g. a BDT or DNN) or to build a data-driven estimate of the HH backgrounds. These techniques are developed and executed inside the mlskim directory, which contains two main python scripts.
We have three groups of data and MC samples for the Run 2 data analysis (2016, 2017 and 2018). Therefore, after producing the ntuples for each year, it is convenient to place them under the same directory on EOS. For instance, we put our 2016 bbbb_ntuples as "New2016" in the /eos/uscms/store/user/guerrero/bbbb_ntuples/FullNtuples/ directory, and do the same for the other years. Moreover, for convenience, we merge (with hadd) the bbbb_ntuple files associated with each MC process or data stream into a single file. This is done by running the script in the inputskims folder (note that the script must be adapted to your EOS location, username, etc.):
cd mlskim/inputskims
source {eosname} ### In our case eosname is FullNtuples
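The merging step described above can be sketched as follows. This is only an illustration, not the repository's script: the function name `build_hadd_commands`, the one-folder-per-process layout, and the example paths are all assumptions. It groups ntuple files by their parent folder and builds one `hadd -f <target> <sources...>` command per process, without executing anything.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def build_hadd_commands(files, out_dir):
    """Group ntuple files by parent directory (taken as the process name)
    and return one `hadd` command (as an argument list) per process."""
    by_process = defaultdict(list)
    for path in files:
        # Assumption: each process has its own folder holding its ntuples.
        process = PurePosixPath(path).parent.name
        by_process[process].append(path)

    commands = []
    for process, paths in sorted(by_process.items()):
        target = str(PurePosixPath(out_dir) / f"{process}.root")
        # hadd syntax: hadd -f <target> <source1> <source2> ...
        commands.append(["hadd", "-f", target, *sorted(paths)])
    return commands


if __name__ == "__main__":
    # Hypothetical file list; real paths depend on your EOS area.
    example = [
        "New2016/SampleA/bbbbNtuple_0.root",
        "New2016/SampleA/bbbbNtuple_1.root",
        "New2016/SampleB/bbbbNtuple_0.root",
    ]
    for cmd in build_hadd_commands(example, "inputskims/2016"):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to check the grouping before any merging is run; each list could then be passed to `subprocess.run` in an environment where ROOT is set up.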
The code processes the inputskims files of data and MC samples using pandas DataFrames. It can read only the branches that are of interest for the developments. These branches are specified as 'variables' in the config files outputskim_201*.cfg in the config folder, and the inputskims of interest are specified as 'samples'. New branches can also be created, if needed, by editing the script. To process the inputskims into outputskims, simply run the script:
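As an illustration of the branch selection described above (not the repository's script), here is a minimal pandas sketch. The function name `make_outputskim` and the branch names are hypothetical, and the events are a toy in-memory DataFrame rather than a ROOT file; it keeps only the requested 'variables' and optionally adds derived branches.

```python
import pandas as pd


def make_outputskim(events, variables, new_branches=None):
    """Keep only the requested branches and optionally add derived ones.

    events       : DataFrame with one row per event (an "inputskim")
    variables    : branch names to keep
    new_branches : dict mapping a new branch name to a function of the DataFrame
    """
    missing = [v for v in variables if v not in events.columns]
    if missing:
        raise KeyError(f"branches not found in input: {missing}")
    skim = events[variables].copy()
    for name, func in (new_branches or {}).items():
        # Derived branches are computed from the full input, not the reduced skim.
        skim[name] = func(events)
    return skim


if __name__ == "__main__":
    # Toy events with hypothetical branch names.
    events = pd.DataFrame({
        "H1_m": [120.0, 131.5, 110.2],
        "H2_m": [118.0, 125.0, 140.7],
        "nJet": [4, 5, 6],
    })
    skim = make_outputskim(
        events,
        variables=["H1_m", "H2_m"],
        new_branches={"H1_H2_dm": lambda df: df["H1_m"] - df["H2_m"]},
    )
    print(skim)
```

Failing early on a missing branch name is a deliberate choice in this sketch: a typo in the list of variables is reported immediately instead of producing an incomplete outputskim.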
The code creates a data-driven background model, taking as input the control region information in 3-btag and 4-btag data. This method is based on the BDT-reweighter method and uses hep_ml, the python package provided by its developers.
The regions used to train the model are defined under modules/. The parameters of the BDT-reweighter are included in the config files. The script creates four weights (where Ana=AnalysisRegion and Val=ValidationRegion): Weight_AnaGGF, Weight_AnaVBF, Weight_ValGGF and Weight_ValVBF. These weights are stored as branches in the output file (SKIM_MODEL_BKG.root). To run the background modeling:
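To make the idea of reweighting 3-btag data to look like 4-btag data concrete, the sketch below uses a deliberately simpler stand-in: a one-dimensional histogram-ratio reweighting in pure Python. It is not the BDT reweighter and not the code in this repository; the function name `binned_weights` and the toy values are assumptions. The BDT-based method plays the same role, but learns the weights in many variables at once instead of in fixed bins of a single variable.

```python
def binned_weights(source, target, edges):
    """Return one weight per source event so that the weighted source
    histogram matches the target histogram in the given bins.

    source, target : lists of values of one variable (e.g. 3-btag and 4-btag data)
    edges          : increasing bin edges; values outside the range get weight 0
    """
    def bin_index(x):
        for i in range(len(edges) - 1):
            if edges[i] <= x < edges[i + 1]:
                return i
        return None

    n_bins = len(edges) - 1
    n_source = [0] * n_bins
    n_target = [0] * n_bins
    for x in source:
        i = bin_index(x)
        if i is not None:
            n_source[i] += 1
    for x in target:
        i = bin_index(x)
        if i is not None:
            n_target[i] += 1

    # Per-bin ratio target/source; empty source bins cannot be reweighted.
    ratio = [t / s if s > 0 else 0.0 for s, t in zip(n_source, n_target)]
    weights = []
    for x in source:
        i = bin_index(x)
        weights.append(ratio[i] if i is not None else 0.0)
    return weights


if __name__ == "__main__":
    # Toy values of a single variable in the two samples.
    three_btag = [0.1, 0.2, 0.6, 0.7, 0.8, 0.9]
    four_btag = [0.3, 0.6, 0.65, 0.7, 0.75, 0.8, 0.9, 0.95]
    print(binned_weights(three_btag, four_btag, [0.0, 0.5, 1.0]))
```

In this sketch the weights also carry the normalisation: the sum of the weights over the source events equals the number of target events that fall in bins populated by the source sample.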
Log on to a CentOS 7 machine (lxplus, cmslpc-sl7) and install combine following its installation instructions.
After compiling CMSSW, do git clone
(NB: no need to compile the code with scram and make exe since it will only run combine).
Scripts for running combine are under limits