feat: prompt dataMC checks (#95)
* feat: prompt dataMC checks

- wf: adding QCD workflow & modify triggered changes
- scripts: improve fetch script with filenames only
- plot: add validation plotter
- doc/util/runner: add prompt_dataMC checks with dummy campaigns

* fix: dtype

* added scripts to run workflows

* change ttsemilep to all plots

* linting with black

* fix: dump processed

* feat: clean up MET

* Add submit script instructions in README.md

---------

Co-authored-by: uttiyasarkar <[email protected]>
Ming-Yan and uttiyasarkar authored Apr 12, 2024
1 parent 09fbbc5 commit 14e654f
Showing 37 changed files with 734 additions and 28,117 deletions.
8 changes: 6 additions & 2 deletions .github/workflows/BTA_workflow.yml
Original file line number Diff line number Diff line change
Expand Up @@ -5,14 +5,18 @@ on:
branches: [ master ]
paths:
- 'src/BTVNanoCommissioning/workflows/*BTA*'
- 'src/BTVNanoCommissioning/helpers/*'
- 'src/BTVNanoCommissioning/helpers/update_branch.py'
- 'src/BTVNanoCommissioning/helpers/func.py'
- 'src/BTVNanoCommissioning/helpers/definitions.py'
- 'src/BTVNanoCommissioning/utils/*'
- '.github/workflows/BTA_workflow.yml'
pull_request_target:
branches: [ master ]
paths:
- 'src/BTVNanoCommissioning/workflows/*BTA*'
- 'src/BTVNanoCommissioning/helpers/*'
- 'src/BTVNanoCommissioning/helpers/update_branch.py'
- 'src/BTVNanoCommissioning/helpers/func.py'
- 'src/BTVNanoCommissioning/helpers/definitions.py'
- 'src/BTVNanoCommissioning/utils/*'
- '.github/workflows/BTA_workflow.yml'
workflow_dispatch:
Expand Down
135 changes: 135 additions & 0 deletions .github/workflows/QCD_workflow.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,135 @@
name: QCD Workflow

on:
push:
branches: [ master ]
paths:
- 'src/BTVNanoCommissioning/workflows/*QCD*'
- 'src/BTVNanoCommissioning/helpers/update_branch.py'
- 'src/BTVNanoCommissioning/helpers/func.py'
- 'src/BTVNanoCommissioning/helpers/definitions.py'
- 'src/BTVNanoCommissioning/utils/*'
- '.github/workflows/QCD_workflow.yml'

pull_request_target:
branches: [ master ]
paths:
- 'src/BTVNanoCommissioning/workflows/*QCD*'
- 'src/BTVNanoCommissioning/helpers/update_branch.py'
- 'src/BTVNanoCommissioning/helpers/func.py'
- 'src/BTVNanoCommissioning/helpers/definitions.py'
- 'src/BTVNanoCommissioning/utils/*'
- '.github/workflows/QCD_workflow.yml'

workflow_dispatch:

jobs:
build:
runs-on: ubuntu-latest
if: ${{ !contains(github.event.head_commit.message, '[skip ci]') }}
strategy:
max-parallel: 4
matrix:
python-version: ["3.10"]

defaults:
run:
shell: "bash -l {0}"

steps:
- uses: actions/checkout@v2
- name: update submodules
env:
SSHKEY: ${{ secrets.GIT_CERN_SSH_PRIVATE }}
run: |
mkdir $HOME/.ssh
echo "$SSHKEY" > $HOME/.ssh/id_rsa
ls -lrt $HOME/.ssh
chmod 600 $HOME/.ssh/id_rsa
echo "HOST *" > ~/.ssh/config
echo "StrictHostKeyChecking no" >> ~/.ssh/config
git submodule update --init --recursive
- uses: cvmfs-contrib/github-action-cvmfs@v2
with:
cvmfs_repositories: 'grid.cern.ch'

- name: Set conda environment
uses: conda-incubator/setup-miniconda@v2
with:
python-version: ${{ matrix.python-version }}
miniforge-variant: Mambaforge
channels: conda-forge,defaults
channel-priority: true
activate-environment: btv_coffea
environment-file: test_env.yml
auto-activate-base: false

- name: Verify environment
run: |
conda info
conda env list
conda list
- name: Set up proxy
# https://awesome-workshop.github.io/gitlab-cms/03-vomsproxy/index.html
# continue-on-error: true
env:
# To generate the secrets use (strip all \n):
# base64 -i ~/.globus/usercert.pem | awk NF=NF RS= OFS=
# base64 -i ~/.globus/userkey.pem | awk NF=NF RS= OFS=
# Cross-check the round trip by appending ``| base64 -d`` and comparing with the input
GRID_USERKEY: ${{ secrets.GRID_USERKEY }}
GRID_USERCERT: ${{ secrets.GRID_USERCERT }}
# Read automatically by voms-proxy-init
X509_VOMS_DIR: /cvmfs/grid.cern.ch/etc/grid-security/vomsdir/
X509_VOMSES: /cvmfs/grid.cern.ch/etc/grid-security/vomses/
X509_DEFAULT_USER_CERT: $HOME/.globus/usercert.pem
X509_DEFAULT_USER_KEY: $HOME/.globus/userkey.pem
run: |
mkdir $HOME/.globus
printf $GRID_USERKEY | base64 -d > $HOME/.globus/userkey.pem
printf $GRID_USERCERT | base64 -d > $HOME/.globus/usercert.pem
# DEBUG: dump decoded cert, cert is public, but don't dump key!
# base64 -i $HOME/.globus/usercert.pem
chmod 400 $HOME/.globus/userkey.pem
openssl rand -out $HOME/.rnd -hex 256
printf "${{secrets.GRID_PASSWORD}}" | voms-proxy-init --voms cms --vomses ${X509_VOMSES} --debug --pwstdin
chmod 755 /usr/share/miniconda3/envs/btv_coffea/etc/grid-security/certificates
- name: Test xrootd
run: |
xrdcp root://eoscms.cern.ch//eos/cms/store/group/phys_btag/nano-commissioning/test_w_dj.root .
- name: Install Repo
run: |
pip install -e .

- name: QCD workflows with correctionlib
run: |
string=$(git log -1 --pretty=format:'%s')
if [[ $string == *"ci:skip array"* ]]; then
opts=$(echo "$opts" | sed 's/--isArray //g')
fi
if [[ $string == *"ci:skip syst"* ]]; then
opts=$(echo "$opts" | sed 's/--isSyst all//g')
elif [[ $string == *"ci:JERC_split"* ]]; then
opts=$(echo "$opts" | sed 's/--isSyst all/--isSyst JERC_split/g')
elif [[ $string == *"ci:weight_only"* ]]; then
opts=$(echo "$opts" | sed 's/--isSyst all/--isSyst weight_only/g')
fi
python runner.py --workflow QCD_sf --json metadata/test_bta_run3.json --executor iterative $opts
# - name: QCD mu workflows with correctionlib
# run: |
# string=$(git log -1 --pretty=format:'%s')
# if [[ $string == *"ci:skip array"* ]]; then
# opts=$(echo "$opts" | sed 's/--isArray //g')
# fi
# if [[ $string == *"ci:skip syst"* ]]; then
# opts=$(echo "$opts" | sed 's/--isSyst all//g')
# elif [[ $string == *"ci:JERC_split"* ]]; then
# opts=$(echo "$opts" | sed 's/--isSyst all/--isSyst JERC_split/g')
# elif [[ $string == *"ci:weight_only"* ]]; then
# opts=$(echo "$opts" | sed 's/--isSyst all/--isSyst weight_only/g')
# fi
# python runner.py --workflow QCD_mu --json metadata/test_bta_run3.json --executor iterative --overwrite $opts
8 changes: 6 additions & 2 deletions .github/workflows/ctag_DY_workflow.yml
Original file line number Diff line number Diff line change
Expand Up @@ -5,14 +5,18 @@ on:
branches: [ master ]
paths:
- 'src/BTVNanoCommissioning/workflows/ctag_*DY*'
- 'src/BTVNanoCommissioning/helpers/*'
- 'src/BTVNanoCommissioning/helpers/update_branch.py'
- 'src/BTVNanoCommissioning/helpers/func.py'
- 'src/BTVNanoCommissioning/helpers/definitions.py'
- 'src/BTVNanoCommissioning/utils/*'
- '.github/workflows/ctag_DY_workflow.yml'
pull_request_target:
branches: [ master ]
paths:
- 'src/BTVNanoCommissioning/workflows/ctag_*DY*'
- 'src/BTVNanoCommissioning/helpers/*'
- 'src/BTVNanoCommissioning/helpers/update_branch.py'
- 'src/BTVNanoCommissioning/helpers/func.py'
- 'src/BTVNanoCommissioning/helpers/definitions.py'
- 'src/BTVNanoCommissioning/utils/*'
- '.github/workflows/ctag_DY_workflow.yml'
workflow_dispatch:
Expand Down
8 changes: 6 additions & 2 deletions .github/workflows/ctag_Wc_workflow.yml
Original file line number Diff line number Diff line change
Expand Up @@ -5,15 +5,19 @@ on:
branches: [ master ]
paths:
- 'src/BTVNanoCommissioning/workflows/ctag_*Wc*'
- 'src/BTVNanoCommissioning/helpers/*'
- 'src/BTVNanoCommissioning/helpers/update_branch.py'
- 'src/BTVNanoCommissioning/helpers/func.py'
- 'src/BTVNanoCommissioning/helpers/definitions.py'
- 'src/BTVNanoCommissioning/utils/*'
- '.github/workflows/ctag_Wc_workflow.yml'

pull_request_target:
branches: [ master ]
paths:
- 'src/BTVNanoCommissioning/workflows/ctag_*Wc*'
- 'src/BTVNanoCommissioning/helpers/*'
- 'src/BTVNanoCommissioning/helpers/update_branch.py'
- 'src/BTVNanoCommissioning/helpers/func.py'
- 'src/BTVNanoCommissioning/helpers/definitions.py'
- 'src/BTVNanoCommissioning/utils/*'
- '.github/workflows/ctag_Wc_workflow.yml'

Expand Down
8 changes: 6 additions & 2 deletions .github/workflows/ctag_ttbar_workflow.yml
Original file line number Diff line number Diff line change
Expand Up @@ -5,14 +5,18 @@ on:
branches: [ master ]
paths:
- 'src/BTVNanoCommissioning/workflows/ctag*tt*'
- 'src/BTVNanoCommissioning/helpers/*'
- 'src/BTVNanoCommissioning/helpers/update_branch.py'
- 'src/BTVNanoCommissioning/helpers/func.py'
- 'src/BTVNanoCommissioning/helpers/definitions.py'
- 'src/BTVNanoCommissioning/utils/*'
- '.github/workflows/ctag_ttbar_workflow.yml'
pull_request_target:
branches: [ master ]
paths:
- 'src/BTVNanoCommissioning/workflows/ctag*tt*'
- 'src/BTVNanoCommissioning/helpers/*'
- 'src/BTVNanoCommissioning/helpers/update_branch.py'
- 'src/BTVNanoCommissioning/helpers/func.py'
- 'src/BTVNanoCommissioning/helpers/definitions.py'
- 'src/BTVNanoCommissioning/utils/*'
- '.github/workflows/ctag_ttbar_workflow.yml'
workflow_dispatch:
Expand Down
8 changes: 6 additions & 2 deletions .github/workflows/ttbar_workflow.yml
Original file line number Diff line number Diff line change
Expand Up @@ -5,14 +5,18 @@ on:
branches: [ master ]
paths:
- 'src/BTVNanoCommissioning/workflows/tt*'
- 'src/BTVNanoCommissioning/helpers/*'
- 'src/BTVNanoCommissioning/helpers/update_branch.py'
- 'src/BTVNanoCommissioning/helpers/func.py'
- 'src/BTVNanoCommissioning/helpers/definitions.py'
- 'src/BTVNanoCommissioning/utils/*'
- '.github/workflows/ttbar_workflow.yml'
pull_request_target:
branches: [ master ]
paths:
- 'src/BTVNanoCommissioning/workflows/tt*'
- 'src/BTVNanoCommissioning/helpers/*'
- 'src/BTVNanoCommissioning/helpers/update_branch.py'
- 'src/BTVNanoCommissioning/helpers/func.py'
- 'src/BTVNanoCommissioning/helpers/definitions.py'
- 'src/BTVNanoCommissioning/utils/*'
- '.github/workflows/ttbar_workflow.yml'
workflow_dispatch:
Expand Down
47 changes: 41 additions & 6 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
[![ctag DY+jets Workflow](https://github.com/cms-btv-pog/BTVNanoCommissioning/actions/workflows/ctag_DY_workflow.yml/badge.svg)](https://github.com/cms-btv-pog/BTVNanoCommissioning/actions/workflows/ctag_DY_workflow.yml)
[![ctag W+c Workflow](https://github.com/cms-btv-pog/BTVNanoCommissioning/actions/workflows/ctag_Wc_workflow.yml/badge.svg)](https://github.com/cms-btv-pog/BTVNanoCommissioning/actions/workflows/ctag_Wc_workflow.yml)
[![BTA Workflow](https://github.com/cms-btv-pog/BTVNanoCommissioning/actions/workflows/BTA_workflow.yml/badge.svg)](https://github.com/cms-btv-pog/BTVNanoCommissioning/actions/workflows/BTA_workflow.yml)
[![QCD Workflow](https://github.com/cms-btv-pog/BTVNanoCommissioning/actions/workflows/QCD_workflow.yml/badge.svg)](https://github.com/cms-btv-pog/BTVNanoCommissioning/actions/workflows/QCD_workflow.yml)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

Repository for Commissioning studies in the BTV POG based on (custom) nanoAOD samples
Expand Down Expand Up @@ -47,6 +48,15 @@ pip install -e .[dev] # for developer
### Other installation options for coffea
See https://coffeateam.github.io/coffea/installation.html

## Quick launch of all tasks

Shell scripts are now provided to launch the runner scripts directly with predefined scaleouts. You can modify and customize the scripts inside the ```scripts/submit``` directory according to your needs. Each script takes its arguments from an ```arguments.txt``` file with four inputs: ```campaign name```, ```year```, ```executor``` and ```luminosity```. To launch a workflow, for example W+c:
```
./ctag_wc.sh arguments.txt
```
Additional scripts are provided to build the directory structure locally and copy it to the remote BTV EOS area [https://btvweb.web.cern.ch/Commissioning/dataMC/](https://btvweb.web.cern.ch/Commissioning/dataMC/).
The plots can then be monitored directly on that webpage.
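A minimal sketch of the ```arguments.txt``` idea — the one-value-per-line layout and the four values below (campaign, year, executor, luminosity) are illustrative assumptions, not taken from this repository; check ```scripts/submit``` for the exact format:

```shell
# Write a sample arguments.txt (hypothetical values, one per line:
# campaign, year, executor, luminosity).
cat > arguments.txt <<'EOF'
Summer23
2023
iterative
27208
EOF
# A submit script such as ctag_wc.sh would then pick them up, e.g.:
set -- $(cat arguments.txt)
echo "campaign=$1 year=$2 executor=$3 lumi=$4"
```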

## Structure

Each workflow can be a separate "processor" file, creating the mapping from NanoAOD to
Expand All @@ -73,7 +83,7 @@ More options for `runner.py`
(default: dummy_samples.json)
--year YEAR Year
--campaign CAMPAIGN Dataset campaign, change the corresponding correction
files{ "Rereco17_94X","Winter22Run3","Summer23","Summer23BPix","Summer22","Summer22EE","2018_UL","2017_UL","2016preVFP_UL","2016postVFP_UL"}
files{ "Rereco17_94X","Winter22Run3","Summer23","Summer23BPix","Summer22","Summer22EE","2018_UL","2017_UL","2016preVFP_UL","2016postVFP_UL","prompt_dataMC"}
--isSyst Run with systematics, all, weight_only(no JERC uncertainties included),JERC_split, None(not extract)
--isArray Output root files
--noHist Not save histogram coffea files
Expand Down Expand Up @@ -203,7 +213,6 @@ python runner.py --workflow valid --json metadata/$json file
</p>
</details>


#### BTA - BTagAnalyzer Ntuple producer

Based on Congqiao's [development](notebooks/BTA_array_producer.ipynb) to produce BTA ntuples based on PFNano.
Expand Down Expand Up @@ -344,13 +353,16 @@ The script provided by Pablo to resubmit failure jobs in `script/missingFiles.py

## Make the dataset json files

Use `fetch.py` in folder `scripts/` to obtain your samples json files. You can create `$input_list` ,which can be a list of datasets taken from CMS DAS , and create the json contains `dataset_name:[filelist]`. One can specify the local path in that input list for samples not published in CMS DAS.
Use `fetch.py` in the `scripts/` folder to obtain your sample json files. You can create `$input_list`, which can be a list of datasets taken from CMS DAS or a list of dataset names (the campaign then needs to be specified explicitly), and create a json containing `dataset_name:[filelist]`. One can specify a local path in the input list for samples not published in CMS DAS.
`$output_json_name$` is the name of your output samples json file.

The `--whitelist_sites, --blacklist_sites` options are taken into account when fetching a dataset if multiple sites are available


```
## File publish in DAS
## Files published in DAS: input an MC file name list, specify --from_dataset and add the campaign info; if more than one campaign is found, you will be asked to specify one explicitly
python scripts/fetch.py -i $MC_FILE_LIST -o ${output_json_name} --from_dataset --campaign Run3Summer23BPixNanoAODv12
## Files published in DAS: input DAS paths
python fetch.py --input ${input_DAS_list} --output ${output_json_name} (--xrd {prefix_forsite})
## Unpublished case: specify the site via the --xrd prefix
Expand Down Expand Up @@ -493,10 +505,24 @@ python -m BTVNanoCommissioning.utils.compile_jec ${campaign} jec_compiled
e.g. python -m BTVNanoCommissioning.utils.compile_jec Summer23 jec_compiled
```

## Prompt data/MC checks and validation

### Prompt data/MC checks (prompt_dataMC campaign, WIP)

To quickly check the data/MC agreement, run over part of the data/MC files; no SFs or JECs are applied, only the lumimasks.

1. Get the file lists from DAS and use the `scripts/fetch.py` script to obtain the jsons
2. Replace the lumimask name in the `prompt_dataMC` entry of `AK4_parameters.py`, e.g. `sed -i 's/$LUMIMASK_DATAMC/xxx.json/g' AK4_parameters.py`
3. Run over the datasets to obtain the `coffea` files
4. Dump the processed lumi information via `dump_processed.py`, then use `brilcalc` to get the dedicated luminosity info
5. Obtain the data/MC plots
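A self-contained sketch of the lumimask-substitution step, using a dummy stand-in for the parameter file (both the file content and the golden-json name are illustrative assumptions):

```shell
# Create a toy stand-in for the prompt_dataMC lumimask entry.
cat > AK4_parameters_example.py <<'EOF'
lumiMask = "$LUMIMASK_DATAMC"
EOF
# Substitute the placeholder with the actual golden-json name
# ($ is literal mid-pattern in sed's basic regular expressions).
sed -i 's/$LUMIMASK_DATAMC/Cert_Collisions2023_Golden.json/g' AK4_parameters_example.py
cat AK4_parameters_example.py
```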

### Validation workflow


## Plotting code

- data/MC comparisons
### data/MC comparisons
:exclamation: If using a wildcard for the input, do not forget the quotation marks! (see the 2nd example below)

You can specify `-v all` to plot all the variables in the `coffea` file, or use wildcard options (e.g. `-v "*DeepJet*"` for the input variables containing `DeepJet`)
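The reason the quotation marks matter: without them the shell expands the wildcard against local file names before the plotting script ever sees it. A minimal demonstration in a scratch directory (the `.coffea` file name is invented):

```shell
# Unquoted globs are expanded by the shell against matching local files.
mkdir -p /tmp/glob_demo && cd /tmp/glob_demo
touch hists_DeepJetCvL.coffea
echo unquoted: *DeepJet*    # the shell substitutes the matching file name
echo quoted: "*DeepJet*"    # the literal pattern reaches the program
```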
Expand Down Expand Up @@ -544,7 +570,7 @@ options:
</details>
</p>

- data/data, MC/MC comparisons
### data/data, MC/MC comparisons

You can specify `-v all` to plot all the variables in the `coffea` file, or use wildcard options (e.g. `-v "*DeepJet*"` for the input variables containing `DeepJet`)
:exclamation: If using a wildcard for the input, do not forget the quotation marks! (see the 2nd example below)
Expand Down Expand Up @@ -591,6 +617,15 @@ options:
</details>
</p>


### ROCs & efficiency plots

Extract the ROC curves for the different taggers and the efficiencies from the validation workflow

```
python scripts/validation_plot.py -i $INPUT_COFFEA -v $VERSION
```

## Store histograms from coffea file

Use `scripts/make_template.py` to dump 1D/2D histograms from `.coffea` to `TH1D/TH2D` with hist. MC histograms can be reweighted according to the luminosity value given via `--lumi`. You can also merge several files
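The luminosity reweighting boils down to the usual per-event scale factor, weight = xsec × lumi / n_generated — a toy sketch with invented numbers (not values used by the script, which handles this internally):

```shell
# Toy example of luminosity scaling; all numbers are invented.
xsec=831760          # cross section in fb (hypothetical)
lumi=25              # target luminosity in /fb (hypothetical)
n_generated=1000000  # generated MC events (hypothetical)
weight=$(awk -v x="$xsec" -v l="$lumi" -v n="$n_generated" 'BEGIN{printf "%.3f", x*l/n}')
echo "per-event weight: $weight"
```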
Expand Down