feat: prompt dataMC checks (#95)
* feat: prompt dataMC checks

- wf: adding QCD workflow & modify triggered changes
- scripts: improve fetch script with filenames only
- plot: add validation plotter
- doc/util/runner: add prompt_dataMC checks with dummy campaigns

* fix: dtype

* added scripts to run workflows

* change ttsemilep to all plots

* linting with black

* fix: dump processed

* feat: clean up MET

* Add submit script instructions in README.md

---------

Co-authored-by: uttiyasarkar <[email protected]>
Ming-Yan and uttiyasarkar authored Apr 12, 2024
1 parent 09fbbc5 commit 14e654f
Showing 37 changed files with 734 additions and 28,117 deletions.
8 changes: 6 additions & 2 deletions .github/workflows/BTA_workflow.yml
Original file line number Diff line number Diff line change
Expand Up @@ -5,14 +5,18 @@ on:
branches: [ master ]
paths:
- 'src/BTVNanoCommissioning/workflows/*BTA*'
- 'src/BTVNanoCommissioning/helpers/*'
- 'src/BTVNanoCommissioning/helpers/update_branch.py'
- 'src/BTVNanoCommissioning/helpers/func.py'
- 'src/BTVNanoCommissioning/helpers/definitions.py'
- 'src/BTVNanoCommissioning/utils/*'
- '.github/workflows/BTA_workflow.yml'
pull_request_target:
branches: [ master ]
paths:
- 'src/BTVNanoCommissioning/workflows/*BTA*'
- 'src/BTVNanoCommissioning/helpers/*'
- 'src/BTVNanoCommissioning/helpers/update_branch.py'
- 'src/BTVNanoCommissioning/helpers/func.py'
- 'src/BTVNanoCommissioning/helpers/definitions.py'
- 'src/BTVNanoCommissioning/utils/*'
- '.github/workflows/BTA_workflow.yml'
workflow_dispatch:
Expand Down
135 changes: 135 additions & 0 deletions .github/workflows/QCD_workflow.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,135 @@
name: QCD Workflow

on:
push:
branches: [ master ]
paths:
- 'src/BTVNanoCommissioning/workflows/*QCD*'
- 'src/BTVNanoCommissioning/helpers/update_branch.py'
- 'src/BTVNanoCommissioning/helpers/func.py'
- 'src/BTVNanoCommissioning/helpers/definitions.py'
- 'src/BTVNanoCommissioning/utils/*'
- '.github/workflows/QCD_workflow.yml'

pull_request_target:
branches: [ master ]
paths:
- 'src/BTVNanoCommissioning/workflows/*QCD*'
- 'src/BTVNanoCommissioning/helpers/update_branch.py'
- 'src/BTVNanoCommissioning/helpers/func.py'
- 'src/BTVNanoCommissioning/helpers/definitions.py'
- 'src/BTVNanoCommissioning/utils/*'
- '.github/workflows/QCD_workflow.yml'

workflow_dispatch:

jobs:
build:
runs-on: ubuntu-latest
if: ${{ !contains(github.event.head_commit.message, '[skip ci]') }}
strategy:
max-parallel: 4
matrix:
python-version: ["3.10"]

defaults:
run:
shell: "bash -l {0}"

steps:
- uses: actions/checkout@v2
- name: update submodules
env:
SSHKEY: ${{ secrets.GIT_CERN_SSH_PRIVATE }}
run: |
mkdir $HOME/.ssh
echo "$SSHKEY" > $HOME/.ssh/id_rsa
ls -lrt $HOME/.ssh
chmod 600 $HOME/.ssh/id_rsa
echo "HOST *" > ~/.ssh/config
echo "StrictHostKeyChecking no" >> ~/.ssh/config
git submodule update --init --recursive
- uses: cvmfs-contrib/github-action-cvmfs@v2
with:
cvmfs_repositories: 'grid.cern.ch'

- name: Set conda environment
uses: conda-incubator/setup-miniconda@v2
with:
python-version: ${{ matrix.python-version }}
miniforge-variant: Mambaforge
channels: conda-forge,defaults
channel-priority: true
activate-environment: btv_coffea
environment-file: test_env.yml
auto-activate-base: false

- name: Verify environment
run: |
conda info
conda env list
conda list
- name: Set up proxy
# https://awesome-workshop.github.io/gitlab-cms/03-vomsproxy/index.html
# continue-on-error: true
env:
# To generate the secrets use (strip all \n):
# base64 -i ~/.globus/usercert.pem | awk NF=NF RS= OFS=
# base64 -i ~/.globus/userkey.pem | awk NF=NF RS= OFS=
# Cross-check the round trip by appending ``| base64 -d`` and comparing with the input
GRID_USERKEY: ${{ secrets.GRID_USERKEY }}
GRID_USERCERT: ${{ secrets.GRID_USERCERT }}
# Read automatically by voms-proxy-init
X509_VOMS_DIR: /cvmfs/grid.cern.ch/etc/grid-security/vomsdir/
X509_VOMSES: /cvmfs/grid.cern.ch/etc/grid-security/vomses/
X509_DEFAULT_USER_CERT: $HOME/.globus/usercert.pem
X509_DEFAULT_USER_KEY: $HOME/.globus/userkey.pem
run: |
mkdir $HOME/.globus
printf $GRID_USERKEY | base64 -d > $HOME/.globus/userkey.pem
printf $GRID_USERCERT | base64 -d > $HOME/.globus/usercert.pem
# DEBUG: dump decoded cert, cert is public, but don't dump key!
# base64 -i $HOME/.globus/usercert.pem
chmod 400 $HOME/.globus/userkey.pem
openssl rand -out $HOME/.rnd -hex 256
printf "${{secrets.GRID_PASSWORD}}" | voms-proxy-init --voms cms --vomses ${X509_VOMSES} --debug --pwstdin
chmod 755 /usr/share/miniconda3/envs/btv_coffea/etc/grid-security/certificates
- name: Test xrootd
run: |
xrdcp root://eoscms.cern.ch//eos/cms/store/group/phys_btag/nano-commissioning/test_w_dj.root .
- name: Install Repo
run: |
pip install -e .

- name: QCD workflows with correctionlib
run: |
string=$(git log -1 --pretty=format:'%s')
if [[ $string == *"ci:skip array"* ]]; then
opts=$(echo "$opts" | sed 's/--isArray //g')
fi
if [[ $string == *"ci:skip syst"* ]]; then
opts=$(echo "$opts" | sed 's/--isSyst all//g')
elif [[ $string == *"ci:JERC_split"* ]]; then
opts=$(echo "$opts" | sed 's/--isSyst all/--isSyst JERC_split/g')
elif [[ $string == *"ci:weight_only"* ]]; then
opts=$(echo "$opts" | sed 's/--isSyst all/--isSyst weight_only/g')
fi
python runner.py --workflow QCD_sf --json metadata/test_bta_run3.json --executor iterative $opts
# - name: QCD mu workflows with correctionlib
# run: |
# string=$(git log -1 --pretty=format:'%s')
# if [[ $string == *"ci:skip array"* ]]; then
# opts=$(echo "$opts" | sed 's/--isArray //g')
# fi
# if [[ $string == *"ci:skip syst"* ]]; then
# opts=$(echo "$opts" | sed 's/--isSyst all//g')
# elif [[ $string == *"ci:JERC_split"* ]]; then
# opts=$(echo "$opts" | sed 's/--isSyst all/--isSyst JERC_split/g')
# elif [[ $string == *"ci:weight_only"* ]]; then
# opts=$(echo "$opts" | sed 's/--isSyst all/--isSyst weight_only/g')
# fi
# python runner.py --workflow QCD_mu --json metadata/test_bta_run3.json --executor iterative --overwrite $opts
8 changes: 6 additions & 2 deletions .github/workflows/ctag_DY_workflow.yml
Original file line number Diff line number Diff line change
Expand Up @@ -5,14 +5,18 @@ on:
branches: [ master ]
paths:
- 'src/BTVNanoCommissioning/workflows/ctag_*DY*'
- 'src/BTVNanoCommissioning/helpers/*'
- 'src/BTVNanoCommissioning/helpers/update_branch.py'
- 'src/BTVNanoCommissioning/helpers/func.py'
- 'src/BTVNanoCommissioning/helpers/definitions.py'
- 'src/BTVNanoCommissioning/utils/*'
- '.github/workflows/ctag_DY_workflow.yml'
pull_request_target:
branches: [ master ]
paths:
- 'src/BTVNanoCommissioning/workflows/ctag_*DY*'
- 'src/BTVNanoCommissioning/helpers/*'
- 'src/BTVNanoCommissioning/helpers/update_branch.py'
- 'src/BTVNanoCommissioning/helpers/func.py'
- 'src/BTVNanoCommissioning/helpers/definitions.py'
- 'src/BTVNanoCommissioning/utils/*'
- '.github/workflows/ctag_DY_workflow.yml'
workflow_dispatch:
Expand Down
8 changes: 6 additions & 2 deletions .github/workflows/ctag_Wc_workflow.yml
Original file line number Diff line number Diff line change
Expand Up @@ -5,15 +5,19 @@ on:
branches: [ master ]
paths:
- 'src/BTVNanoCommissioning/workflows/ctag_*Wc*'
- 'src/BTVNanoCommissioning/helpers/*'
- 'src/BTVNanoCommissioning/helpers/update_branch.py'
- 'src/BTVNanoCommissioning/helpers/func.py'
- 'src/BTVNanoCommissioning/helpers/definitions.py'
- 'src/BTVNanoCommissioning/utils/*'
- '.github/workflows/ctag_Wc_workflow.yml'

pull_request_target:
branches: [ master ]
paths:
- 'src/BTVNanoCommissioning/workflows/ctag_*Wc*'
- 'src/BTVNanoCommissioning/helpers/*'
- 'src/BTVNanoCommissioning/helpers/update_branch.py'
- 'src/BTVNanoCommissioning/helpers/func.py'
- 'src/BTVNanoCommissioning/helpers/definitions.py'
- 'src/BTVNanoCommissioning/utils/*'
- '.github/workflows/ctag_Wc_workflow.yml'

Expand Down
8 changes: 6 additions & 2 deletions .github/workflows/ctag_ttbar_workflow.yml
Original file line number Diff line number Diff line change
Expand Up @@ -5,14 +5,18 @@ on:
branches: [ master ]
paths:
- 'src/BTVNanoCommissioning/workflows/ctag*tt*'
- 'src/BTVNanoCommissioning/helpers/*'
- 'src/BTVNanoCommissioning/helpers/update_branch.py'
- 'src/BTVNanoCommissioning/helpers/func.py'
- 'src/BTVNanoCommissioning/helpers/definitions.py'
- 'src/BTVNanoCommissioning/utils/*'
- '.github/workflows/ctag_ttbar_workflow.yml'
pull_request_target:
branches: [ master ]
paths:
- 'src/BTVNanoCommissioning/workflows/ctag*tt*'
- 'src/BTVNanoCommissioning/helpers/*'
- 'src/BTVNanoCommissioning/helpers/update_branch.py'
- 'src/BTVNanoCommissioning/helpers/func.py'
- 'src/BTVNanoCommissioning/helpers/definitions.py'
- 'src/BTVNanoCommissioning/utils/*'
- '.github/workflows/ctag_ttbar_workflow.yml'
workflow_dispatch:
Expand Down
8 changes: 6 additions & 2 deletions .github/workflows/ttbar_workflow.yml
Original file line number Diff line number Diff line change
Expand Up @@ -5,14 +5,18 @@ on:
branches: [ master ]
paths:
- 'src/BTVNanoCommissioning/workflows/tt*'
- 'src/BTVNanoCommissioning/helpers/*'
- 'src/BTVNanoCommissioning/helpers/update_branch.py'
- 'src/BTVNanoCommissioning/helpers/func.py'
- 'src/BTVNanoCommissioning/helpers/definitions.py'
- 'src/BTVNanoCommissioning/utils/*'
- '.github/workflows/ttbar_workflow.yml'
pull_request_target:
branches: [ master ]
paths:
- 'src/BTVNanoCommissioning/workflows/tt*'
- 'src/BTVNanoCommissioning/helpers/*'
- 'src/BTVNanoCommissioning/helpers/update_branch.py'
- 'src/BTVNanoCommissioning/helpers/func.py'
- 'src/BTVNanoCommissioning/helpers/definitions.py'
- 'src/BTVNanoCommissioning/utils/*'
- '.github/workflows/ttbar_workflow.yml'
workflow_dispatch:
Expand Down
47 changes: 41 additions & 6 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
[![ctag DY+jets Workflow](https://github.com/cms-btv-pog/BTVNanoCommissioning/actions/workflows/ctag_DY_workflow.yml/badge.svg)](https://github.com/cms-btv-pog/BTVNanoCommissioning/actions/workflows/ctag_DY_workflow.yml)
[![ctag W+c Workflow](https://github.com/cms-btv-pog/BTVNanoCommissioning/actions/workflows/ctag_Wc_workflow.yml/badge.svg)](https://github.com/cms-btv-pog/BTVNanoCommissioning/actions/workflows/ctag_Wc_workflow.yml)
[![BTA Workflow](https://github.com/cms-btv-pog/BTVNanoCommissioning/actions/workflows/BTA_workflow.yml/badge.svg)](https://github.com/cms-btv-pog/BTVNanoCommissioning/actions/workflows/BTA_workflow.yml)
[![QCD Workflow](https://github.com/cms-btv-pog/BTVNanoCommissioning/actions/workflows/QCD_workflow.yml/badge.svg)](https://github.com/cms-btv-pog/BTVNanoCommissioning/actions/workflows/QCD_workflow.yml)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

Repository for Commissioning studies in the BTV POG based on (custom) nanoAOD samples
Expand Down Expand Up @@ -47,6 +48,15 @@ pip install -e .[dev] # for developer
### Other installation options for coffea
See https://coffeateam.github.io/coffea/installation.html

## Quick launch of all tasks

Shell scripts are now provided to launch the runner scripts directly with predefined scaleouts. You can modify and customize the scripts inside the ```scripts/submit``` directory according to your needs. Each script takes its arguments from an ```arguments.txt``` file with four inputs: ```campaign name```, ```year```, ```executor``` and ```luminosity```. To launch a workflow, for example W+c:
```
./ctag_wc.sh arguments.txt
```
Additional scripts are provided to build the directory structure locally and copy it to the remote BTV EOS area [https://btvweb.web.cern.ch/Commissioning/dataMC/](https://btvweb.web.cern.ch/Commissioning/dataMC/).
The plots can then be monitored directly on that webpage.
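A minimal sketch of the ```arguments.txt``` idea — the one-value-per-line layout and the four values below (campaign, year, executor, luminosity) are illustrative assumptions, not taken from this repository; check ```scripts/submit``` for the exact format:

```shell
# Write a sample arguments.txt (hypothetical values, one per line:
# campaign, year, executor, luminosity).
cat > arguments.txt <<'EOF'
Summer23
2023
iterative
27208
EOF
# A submit script such as ctag_wc.sh would then pick them up, e.g.:
set -- $(cat arguments.txt)
echo "campaign=$1 year=$2 executor=$3 lumi=$4"
```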

## Structure

Each workflow can be a separate "processor" file, creating the mapping from NanoAOD to
Expand All @@ -73,7 +83,7 @@ More options for `runner.py`
(default: dummy_samples.json)
--year YEAR Year
--campaign CAMPAIGN Dataset campaign, change the corresponding correction
files{ "Rereco17_94X","Winter22Run3","Summer23","Summer23BPix","Summer22","Summer22EE","2018_UL","2017_UL","2016preVFP_UL","2016postVFP_UL"}
files{ "Rereco17_94X","Winter22Run3","Summer23","Summer23BPix","Summer22","Summer22EE","2018_UL","2017_UL","2016preVFP_UL","2016postVFP_UL","prompt_dataMC"}
--isSyst Run with systematics, all, weight_only(no JERC uncertainties included),JERC_split, None(not extract)
--isArray Output root files
--noHist Not save histogram coffea files
Expand Down Expand Up @@ -203,7 +213,6 @@ python runner.py --workflow valid --json metadata/$json file
</p>
</details>


#### BTA - BTagAnalyzer Ntuple producer

Based on Congqiao's [development](notebooks/BTA_array_producer.ipynb) to produce BTA ntuples based on PFNano.
Expand Down Expand Up @@ -344,13 +353,16 @@ The script provided by Pablo to resubmit failure jobs in `script/missingFiles.py

## Make the dataset json files

Use `fetch.py` in folder `scripts/` to obtain your samples json files. You can create `$input_list` ,which can be a list of datasets taken from CMS DAS , and create the json contains `dataset_name:[filelist]`. One can specify the local path in that input list for samples not published in CMS DAS.
Use `fetch.py` in the `scripts/` folder to obtain your sample json files. You can create `$input_list`, which can be a list of datasets taken from CMS DAS or a list of dataset names (the campaign then needs to be specified explicitly), and create a json containing `dataset_name:[filelist]`. One can specify a local path in the input list for samples not published in CMS DAS.
`$output_json_name$` is the name of your output samples json file.

The `--whitelist_sites, --blacklist_sites` options are taken into account when fetching a dataset if multiple sites are available


```
## File publish in DAS
## Files published in DAS: input an MC file name list, specify --from_dataset and add the campaign info; if more than one campaign is found, you will be asked to specify one explicitly
python scripts/fetch.py -i $MC_FILE_LIST -o ${output_json_name} --from_dataset --campaign Run3Summer23BPixNanoAODv12
## Files published in DAS: input DAS paths
python fetch.py --input ${input_DAS_list} --output ${output_json_name} (--xrd {prefix_forsite})
## Unpublished case: specify the site via the --xrd prefix
Expand Down Expand Up @@ -493,10 +505,24 @@ python -m BTVNanoCommissioning.utils.compile_jec ${campaign} jec_compiled
e.g. python -m BTVNanoCommissioning.utils.compile_jec Summer23 jec_compiled
```

## Prompt data/MC checks and validation

### Prompt data/MC checks (prompt_dataMC campaign, WIP)

To quickly check the data/MC agreement, run over part of the data/MC files; no SFs or JECs are applied, only the lumimasks.

1. Get the file lists from DAS and use the `scripts/fetch.py` script to obtain the jsons
2. Replace the lumimask name in the `prompt_dataMC` entry of `AK4_parameters.py`, e.g. `sed -i 's/$LUMIMASK_DATAMC/xxx.json/g' AK4_parameters.py`
3. Run over the datasets to obtain the `coffea` files
4. Dump the processed lumi information via `dump_processed.py`, then use `brilcalc` to get the dedicated luminosity info
5. Obtain the data/MC plots
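A self-contained sketch of the lumimask-substitution step, using a dummy stand-in for the parameter file (both the file content and the golden-json name are illustrative assumptions):

```shell
# Create a toy stand-in for the prompt_dataMC lumimask entry.
cat > AK4_parameters_example.py <<'EOF'
lumiMask = "$LUMIMASK_DATAMC"
EOF
# Substitute the placeholder with the actual golden-json name
# ($ is literal mid-pattern in sed's basic regular expressions).
sed -i 's/$LUMIMASK_DATAMC/Cert_Collisions2023_Golden.json/g' AK4_parameters_example.py
cat AK4_parameters_example.py
```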

### Validation workflow


## Plotting code

- data/MC comparisons
### data/MC comparisons
:exclamation: If using a wildcard for the input, do not forget the quotation marks! (see the 2nd example below)

You can specify `-v all` to plot all the variables in the `coffea` file, or use wildcard options (e.g. `-v "*DeepJet*"` for the input variables containing `DeepJet`)
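The reason the quotation marks matter: without them the shell expands the wildcard against local file names before the plotting script ever sees it. A minimal demonstration in a scratch directory (the `.coffea` file name is invented):

```shell
# Unquoted globs are expanded by the shell against matching local files.
mkdir -p /tmp/glob_demo && cd /tmp/glob_demo
touch hists_DeepJetCvL.coffea
echo unquoted: *DeepJet*    # the shell substitutes the matching file name
echo quoted: "*DeepJet*"    # the literal pattern reaches the program
```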
Expand Down Expand Up @@ -544,7 +570,7 @@ options:
</details>
</p>

- data/data, MC/MC comparisons
### data/data, MC/MC comparisons

You can specify `-v all` to plot all the variables in the `coffea` file, or use wildcard options (e.g. `-v "*DeepJet*"` for the input variables containing `DeepJet`)
:exclamation: If using a wildcard for the input, do not forget the quotation marks! (see the 2nd example below)
Expand Down Expand Up @@ -591,6 +617,15 @@ options:
</details>
</p>


### ROCs & efficiency plots

Extract the ROC curves for the different taggers and the efficiencies from the validation workflow

```
python scripts/validation_plot.py -i $INPUT_COFFEA -v $VERSION
```

## Store histograms from coffea file

Use `scripts/make_template.py` to dump 1D/2D histograms from `.coffea` to `TH1D/TH2D` with hist. MC histograms can be reweighted according to the luminosity value given via `--lumi`. You can also merge several files
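The luminosity reweighting boils down to the usual per-event scale factor, weight = xsec × lumi / n_generated — a toy sketch with invented numbers (not values used by the script, which handles this internally):

```shell
# Toy example of luminosity scaling; all numbers are invented.
xsec=831760          # cross section in fb (hypothetical)
lumi=25              # target luminosity in /fb (hypothetical)
n_generated=1000000  # generated MC events (hypothetical)
weight=$(awk -v x="$xsec" -v l="$lumi" -v n="$n_generated" 'BEGIN{printf "%.3f", x*l/n}')
echo "per-event weight: $weight"
```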
Expand Down