pequegnot edited this page Feb 13, 2014 · 18 revisions

# Presentation

These tools correspond to the Mtt analysis itself. Here, we select the events of interest, search for the best fit for the signal PDF, and fit the background distribution. This is done for the nominal datasets and for the datasets used for the systematics.

# Step 1: PreSkim

Presentation

MttTools> cd PreSkim

At this step, we do a preselection on the events, that is:

  • we only keep events that passed our selections in Extractor (isSel == 1);
  • we impose a cut on the lepton pt: pt(lepton) > ptLeptonCut (ptLeptonCut = 26 GeV if the lepton is a muon and ptLeptonCut = 30 GeV if the lepton is an electron);
  • the TOP reference selection excludes electrons with a SuperCluster eta between 1.4442 and 1.5660. The TopTrigger efficiency does the same thing, but using the electron eta instead of the SuperCluster eta. That is why, in this step, we redo the cut on the electron eta (we require fabs(etaLepton) < 1.4442 || fabs(etaLepton) >= 1.5660);
  • we require at least one solution from the MVA or Chi2 algorithm (used to find the best combination of jets to reconstruct mtt).
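
Put together, the preselection above amounts to a single predicate per event. Below is a minimal Python sketch of that logic; the function and argument names are hypothetical (the real implementation is the C++ in preSkim.cpp):

```python
# Hypothetical sketch of the PreSkim preselection; names are illustrative.

PT_LEPTON_CUT = {"muon": 26.0, "electron": 30.0}  # GeV, from the cuts above

def passes_preskim(is_sel, channel, pt_lepton, eta_lepton, n_solutions):
    """Return True if the event survives the preselection."""
    if is_sel != 1:                          # must pass the Extractor selection
        return False
    if pt_lepton <= PT_LEPTON_CUT[channel]:  # pt(lepton) > ptLeptonCut
        return False
    # exclude electrons in the ECAL barrel/endcap transition region
    if channel == "electron" and 1.4442 <= abs(eta_lepton) < 1.5660:
        return False
    return n_solutions >= 1                  # at least one MVA/Chi2 solution
```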

The script containing this preselection and all the histograms at generator level is preSkim.cpp. It is the script you have to edit if you want to change the preselection or add a generator-level histogram. The Python files are the configuration files, which have to be tuned to run on your own datasets.

How to use it

The script preSkim.cpp takes as input a list of Extractuples (a .list file) and returns a .root output file. You have to put the lists of the Extractuples you want to run on in the configuration files: skimMC.py for MC and skimData.py for data.

You can make symbolic links to the Extractuple lists you generated after the Extractor step (see the instructions here). If the folder containing the lists of your Extractuples, as given in the example, is /<path>/MyListsFolder1, you can create the symbolic links by typing these few commands:

PreSkim> mkdir lists
PreSkim> cd lists
lists> ln -s /<path>/MyListsFolder1/*.list .
lists> cd ../

Now, edit the skimMC.py and skimData.py scripts and complete them with the correct information as explained below.

For data

The script is built like this:

files = [
    ["mySkimOutputFile1.root", "lists/myDataset1.list", "type"],
    ["mySkimOutputFile2.root", "lists/myDataset2.list", "type"]
]

where

  • mySkimOutputFile1.root is the name of your output rootfile, called a skim. It will be stored in the skims/<aDate>/<aCategory>/ directory (the script automatically creates the <aDate> and <aCategory> directories, where <aDate> is the date when you create your directory and <aCategory> can be data, semimu or semie);
  • lists/myDataset1.list is the list of the Extractuples you have just linked;
  • "type" is an option to specify whether your dataset is in the semi-muonic or semi-electronic channel.

Just adapt the script to run on your own datasets.

For MC

The script is built like this:

files = [
    ["mySkimOutputFile1.root", "lists/myDataset1_%s.list"],
    ["mySkimOutputFile2.root", "lists/myDataset2_%s.list"]
]

where

  • mySkimOutputFile1.root is the name of your output rootfile, called a skim. It will be stored in the skims/<aDate>/<aCategory>/ directory (the script automatically creates the <aDate> and <aCategory> directories, where <aDate> is the date when you create your directory and <aCategory> can be data, semimu or semie);
  • lists/myDataset1_%s.list is the list of the Extractuples you have just linked. Note that here, you don't have to specify the type (semimu or semie): the %s placeholder stands for it (don't forget to replace "semie" or "semimu" in your list names with "%s"), and the script substitutes the corresponding types automatically.

Just adapt the script to run on your own datasets.
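
The %s mechanism described above can be sketched in a few lines of Python. The file names are examples, and the expansion below is only an illustration of what the configuration scripts presumably do:

```python
# Illustrative sketch: expand each templated MC entry into one concrete
# (channel, output, list) job per lepton channel.

CHANNELS = ["semimu", "semie"]

files = [
    ["mySkimOutputFile1.root", "lists/myDataset1_%s.list"],  # example entry
]

def expand_jobs(files, channels=CHANNELS):
    """Turn each "%s" templated entry into one job per channel."""
    jobs = []
    for output, list_template in files:
        for channel in channels:
            jobs.append((channel, output, list_template % channel))
    return jobs

for channel, output, input_list in expand_jobs(files):
    print(channel, output, input_list)
```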

Run the script

Now you are ready to run the script. Type the following commands:

PreSkim> source ../setup_lyoserv_env.sh
PreSkim> grid-proxy-init
PreSkim> make
PreSkim> ./skimMC.py
PreSkim> ./skimData.py

# Step 2: Extractor2Histo

Presentation

PreSkim> cd ../Extractor2Histo

The aim of this step is to convert the Extractuples into histograms. At this point, you also apply the full selection (implementation of the different cut variables, choice of the MVA or Chi2 method, etc.).

The script that fills this role is Extractor2Histo.cpp. You can feed the script directly with the Extractuple lists or with the skims you created in step 1. The Python files are the configuration files, which have to be tuned to run on your own datasets.

How to use it

Usually, you feed the script with the skims created in the previous step. So, the first thing you have to do is to build symbolic links pointing to your step 1 skims in the corresponding folders, as you have already done in step 1.

Extractor2Histo> mkdir skims
Extractor2Histo> cd skims
skims> mkdir data
skims> mkdir semie
skims> mkdir semimu
skims> cd data
data> ln -s ../../../PreSkim/skims/<aDate>/data/* .
data> cd ../semie
semie> ln -s ../../../PreSkim/skims/<aDate>/semie/* .
semie> cd ../semimu
semimu> ln -s ../../../PreSkim/skims/<aDate>/semimu/* .
semimu> cd ../../

Now edit the python configuration files to adapt the script to your own datasets.

For data

With extractDataFromSkim.py:

The script extractDataFromSkim.py is built like this:

files = [
    ["myHistoOutputFile1.root", "skims/data/mySkimOutputFile1.root", "type"],
    ["myHistoOutputFile2.root", "skims/data/mySkimOutputFile2.root", "type"]
]

where

  • myHistoOutputFile1.root is the name of your output rootfile. It will be stored in the plots/<aDate>/1-btag/data/ and plots/<aDate>/2-btag/data/ directories, where 1-btag and 2-btag correspond to the "requiring exactly one b-tagged jet" and "requiring at least 2 b-tagged jets" selections (the script creates the directories automatically);
  • skims/data/mySkimOutputFile1.root is the skim created in step 1 that you have just linked;
  • "type" is an option to specify whether your dataset is in the semi-muonic or semi-electronic channel.

For MC

With extractMCFromSkim.py:

The script extractMCFromSkim.py is built like this:

files = [
    ["myHistoOutputFile1.root", "skims/%s/mySkimOutputFile1.root"],
    ["myHistoOutputFile2.root", "skims/%s/mySkimOutputFile2.root"]
]

where

  • myHistoOutputFile1.root is the name of your output rootfile. It will be stored in the plots/<aDate>/1-btag/<aCategory>/ and plots/<aDate>/2-btag/<aCategory>/ directories, where 1-btag and 2-btag correspond to the "requiring exactly one b-tagged jet" and "requiring at least 2 b-tagged jets" selections, and <aCategory> stands for semimu or semie (the script creates the directories automatically);
  • skims/%s/mySkimOutputFile1.root is the skim created in step 1 that you have just linked. Note that here, you don't have to specify the type (semimu or semie): %s stands for it and the script substitutes the corresponding types automatically.

Run the script

Now you are ready to run the script. Type the following commands:

Extractor2Histo> source ../setup_lyoserv_env.sh
Extractor2Histo> make
Extractor2Histo> ./extractDataFromSkim.py
Extractor2Histo> ./extractMCFromSkim.py

# Step 3: PlotIt

Presentation

Extractor2Histo> cd ../

At this step, you make plots of the histograms created in step 2. Note that plotIt is a standalone sub-repository that can work independently of the Mtt tools; plotIt_mtt is a repository that uses plotIt, adapted to the Mtt tools.

How to use it

To be able to use plotIt_mtt, you first have to compile the plotIt repository.

MttTools> cd plotIt
plotIt> source setup_lyoserv_env.sh
plotIt> cd external
external> ./build-external.sh
external> cd ../
plotIt> make
plotIt> cd ../

The script used in plotIt_mtt runs on the root files generated in step 2 by Extractor2Histo, so you need to create symbolic links to these files. To do so, follow the instructions below:

MttTools> cd Extractor2Histo/plots
plots> ln -s <aDate> Latest
plots> cd ../../plotIt_mtt
plotIt_mtt> mkdir inputs
plotIt_mtt> cd inputs
inputs> ln -s ../../Extractor2Histo/plots/Latest .
inputs> cd ../

The .yml files are configuration files for plotIt. They are all built from 3 blocks:

  1. configuration: Specify here the size of your plots, the title, the total luminosity (if it is the same across all your datasets) and its error, etc.;

  2. files: Specify here the names of the root files you want to run on. Each name is given relative to the path specified by "root" in the configuration block. The files can be of 3 types, which you have to specify: mc, signal or data. If the file is of type mc, you have to specify the cross-section and the number of generated events in order to normalize the MC plots to the data;

  3. plots: Specify here the histograms you want to draw.
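
As an illustration only, a skeleton configuration could look like the sketch below. The file names, numbers and histogram name are placeholders, and the exact key names may differ between plotIt versions, so check the .yml files already present in plotIt_mtt for the authoritative syntax.

```yaml
# Hypothetical skeleton -- adapt key names and values to your plotIt version.
configuration:
  width: 800
  height: 800
  luminosity: 19667                 # total luminosity used to scale MC
  root: 'inputs/Latest/1-btag/semimu'

files:
  'MTT_Data_merged.root':
    type: data
  'MTT_TT_merged.root':
    type: mc
    cross-section: 245.8            # pb, needed to normalize MC to data
    generated-events: 21675970

plots:
  'mtt_AfterChi2':
    x-axis: "m_{t#bar{t}} [GeV]"
```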

The command to run the script is:

plotIt_mtt> ../plotIt/plotIt -o myOutputPlotsFolder myConfig.yml

where:

  • myOutputPlotsFolder is the name of the folder where you want to store your plots;
  • myConfig.yml is your configuration file.

You have to run plotIt for the "requiring exactly 1 b-tagged jet" and the "requiring at least 2 b-tagged jets" selections and, for each of those, for both the semi-electronic and semi-muonic channels. Actually, the plotAll.sh script does it all automatically for you. It saves the plots in plots/<aDate>, classified into 1-btag or 2-btag, and then semie or semimu (the directories are created automatically by the script).

Finally, type the following command to run the plotAll.sh script:

plotIt_mtt> ./plotAll.sh

# Step 4: Extractor2Dataset

Presentation

plotIt_mtt> cd ../Extractor2Dataset

The aim of this step is to convert the Extractuples into datasets. At the end, you generate different datasets for data, MC (background) and signal.

The script that fills this role is Extractor2Dataset.cpp. You can feed the script directly with the Extractuple lists or with the skims you created in step 1. The Python files are the configuration files, which have to be tuned to run on your own datasets.

How to use it

Follow the instructions given in step 2 for Extractor2Histo.

# Step 5: FritSignal

Presentation

Extractor2Dataset> cd ../Fit

The aim of this step is to fit the signal and extract the corresponding Probability Density Function (PDF) from the datasets generated in the previous step.

The script that fills this role is fritSignal.cc. Your PDFs are defined as objects of the SignalFunctions and BackgroundFunctions classes (see SignalFunctions.h and BackgroundFunctions.h), which inherit from the abstract base class BaseFunction (BaseFunction.h). These daughter classes construct PDFs from the functions defined in RooFit, such as RooKeysPdf.

In the main script, fritSignal.cc, you create a signal PDF for each category of your analysis, i.e. for the muon and the electron channels. This is done by the getCategoriesPdf function implemented in Functions.h.

Note: In the code, we usually use these definitions:

  • category refers to the lepton channel (i.e. electron or muon);
  • type refers to "signal" or "background".

Finally, once your PDFs are chosen for each category, the signal is fitted simultaneously in all categories (using a RooSimultaneous PDF).

Your work and results are saved in different ways and will be used in the next steps:

  • The number of signal events found, its error and the chi2 of the fit are stored in frit_efficiencies.json for each mass point, category and b-tag selection;
  • The PDF and fit results are saved in a RooWorkspace (a _workspace.root file) in the frit directory;
  • Plots of the fit and the data are drawn and saved in the frit directory.

How to use it

Start a new analysis

You will run the script on the datasets you produced in the previous step. The first thing you have to do is to create a symbolic link to these datasets (this removes the date from the directory name, so that you will be able to use the fritSignalForAllMasses*.py scripts directly later):

Fit> cd ../Extractor2Dataset/datasets
datasets> ln -s <aDate> Latest
datasets> cd ../../Fit

You can decide to work on several analyses corresponding to different configurations. To start a new one, use the startAnalysis.py script:

Fit> ./startAnalysis.py

Then, follow the instructions and fill in the required information (for example, a name and a description for your analysis). Each of your analyses is identified by a unique ID, which the fritSignal.cc script then needs in order to know which analysis to run on.

This automatically generates an analysis directory (if it doesn't already exist), in which a directory named after the ID number is created. In particular, in this last directory you can find the configuration files in the fit_configuration directory (copied from the current analysis). The workspace and plots you generate for your analysis will then be saved in the frit directory.

The information concerning your analysis is stored in analysis.json, which is automatically updated by the startAnalysis.py script.

To switch from one analysis to another, use setCurrentAnalysis.py and follow the instructions:

Fit> ./setCurrentAnalysis.py

Use the script

The configuration is read from fit_configuration/frit_pdf.json by default; this is how the file is constructed:

{
  "muon": {
    "signal": {
      "name": "keyspdf",
      "parameters": "none"
    },
    "background": {
      "name": "exp",
      "parameters": "exp_mu"
    }
  },
  "electron": {
    "signal": {
      "name": "keyspdf",
      "parameters": "none"
    },
    "background": {
      "name": "exp",
      "parameters": "exp_e"
    }
  }
}

As shown above, for each category, electron or muon, you have to specify:

  • name: the PDF name, for both signal and background;
  • parameters: the parameters associated to these PDFs, used for their construction in RooFit.
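
For illustration, this configuration can be read back with a few lines of Python; the JSON literal below just mirrors the excerpt above:

```python
import json

# The JSON string mirrors the frit_pdf.json excerpt shown earlier.
FRIT_PDF_JSON = """
{
  "muon":     {"signal": {"name": "keyspdf", "parameters": "none"},
               "background": {"name": "exp", "parameters": "exp_mu"}},
  "electron": {"signal": {"name": "keyspdf", "parameters": "none"},
               "background": {"name": "exp", "parameters": "exp_e"}}
}
"""

def pdf_choice(config, category, pdf_type):
    """Return the (PDF name, parameter-set name) pair for one category/type."""
    entry = config[category][pdf_type]
    return entry["name"], entry["parameters"]

config = json.loads(FRIT_PDF_JSON)
print(pdf_choice(config, "muon", "background"))   # → ('exp', 'exp_mu')
```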

The parameters are defined in fit_configuration/pdf_parameters.json, where you have to specify a starting value, a lower bound and an upper bound for each parameter. For example, for the crystal ball parameters in both the electron and muon channels, you can have:

{
  "crystalball_mu": {
    "mean": ["%mass%", "%low_bound%", "%high_bound%"],
    "sigma": [55., 50., 200.],
    "alpha": [1.13, 0.1, 10.],
    "n": [1., 0.5, 300.]
  },
  "crystalball_e": {
    "mean": ["%mass%", "%low_bound%", "%high_bound%"],
    "sigma": [55., 50., 200.],
    "alpha": [2.8, 0.1, 10.],
    "n": [1., 0.5, 300.]
  }
}
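
The "%mass%", "%low_bound%" and "%high_bound%" tokens suggest that mass-dependent values are substituted at run time. Here is a hypothetical sketch of such a substitution; the ±20% window is an illustrative assumption, not necessarily what fritSignal.cc actually uses:

```python
# Illustrative sketch: resolve the placeholder tokens for one mass point.
# The 20% window for the bounds is an assumption made for this example.

def resolve_placeholders(values, mass, window=0.2):
    """Replace %mass%/%low_bound%/%high_bound% tokens; pass others through."""
    substitutions = {
        "%mass%": float(mass),
        "%low_bound%": mass * (1.0 - window),
        "%high_bound%": mass * (1.0 + window),
    }
    return [substitutions.get(v, v) for v in values]

mean = resolve_placeholders(["%mass%", "%low_bound%", "%high_bound%"], 750)
print(mean)
```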

Note: Currently, the signal is fitted with a non-analytic PDF (RooKeysPdf), according to a "signal only" hypothesis. Previously, it was fitted according to a "signal + background" hypothesis, combining a crystal ball PDF for the signal and a decreasing exponential to model the reconstruction errors. So, in the current case, a background PDF is not needed, but the script will crash if none is given in the configuration file; the code simply removes this background PDF from the global one afterwards.

This is the usage of the fritSignal script:

./fritSignal  {--input-list <string>|-i <string>} [--save-workspace]
                 [--config-file <string>] --b-tag <int> -m <integer> [--pdf
                 <string>] [--pileup <string>] [--jer <string>] [--jec
                 <string>] [--] [--version] [-h]


Where: 

   --input-list <string>
     (OR required)  A text file containing a list of input files
         -- OR --
   -i <string>,  --input-file <string>
     (OR required)  The input file


   --save-workspace
     Save the workspace for redoing plot after

   --config-file <string>
     Fit configuration file

   --b-tag <int>
     (required)  Number of b-tagged jets

   -m <integer>,  --mass <integer>
     (required)  Zprime mass

   --pdf <string>
     Run the frit for this specific pdf syst.

   --pileup <string>
     Run the frit for this specific pileup syst.

   --jer <string>
     Run the frit for this specific jer.

   --jec <string>
     Run the frit for this specific jec.

   --,  --ignore_rest
     Ignores the rest of the labeled arguments following this flag.

   --version
     Displays version information and exits.

   -h,  --help
     Displays usage information and exits.

For example, to fit a Z' with a mass of 750 GeV, with one b-tagged jet and the default configuration, the command is:

./fritSignal -m 750 --b-tag 1 -i ../Extractor2Dataset/MTT_Zprime_750_Narrow_2012_08Nov_merged.root

You have to redo this operation for each mass point and for each number of b-tagged jets (1 and 2). Fortunately, a script does it automatically for you:

To run it on narrow signal datasets:

Fit> ./fritSignalForAllMassesNarrow.py

To run it on large signal datasets:

Fit> ./fritSignalForAllMassesLarge.py
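
These helper scripts presumably just loop over the mass points and b-tag selections. A minimal Python sketch of such a loop, with an illustrative mass list and dataset naming pattern (only the 750 GeV file name appears in the example above; the others are guesses), could be:

```python
# Illustrative sketch of fritSignalForAllMasses*.py: build one fritSignal
# command per (mass, b-tag) combination. Mass list and path pattern are
# assumptions for this example.

DATASET = "../Extractor2Dataset/MTT_Zprime_%d_Narrow_2012_08Nov_merged.root"

def build_commands(masses, btags=(1, 2)):
    """Return the fritSignal command line for every (mass, b-tag) pair."""
    commands = []
    for mass in masses:
        for btag in btags:
            commands.append("./fritSignal -m %d --b-tag %d -i %s"
                            % (mass, btag, DATASET % mass))
    return commands

for command in build_commands([500, 750, 1000, 1250, 1500]):
    print(command)   # replace print with subprocess.call to actually run
```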