Confusion in sequence when neither "--data" nor "--mc" is used #34443

Closed
srimanob opened this issue Jul 12, 2021 · 12 comments

Comments

@srimanob
Contributor

We started to spot this issue when neither "--data" nor "--mc" is used. In that case CMSSW normally guesses the workflow type using
https://github.com/srimanob/cmssw/blob/master/Configuration/Applications/python/cmsDriverOptions.py#L214-L235

The issue arises when it guesses a data workflow but nothing (e.g. isData) is set afterwards. For example, when we create a workflow with
cmsDriver.py RECO --conditions 106X_dataRun2_v35 --customise Configuration/DataProcessing/Utils.addMonitoring --datatier NANOAOD --era Run2_2018,run2_nanoAOD_106Xv2 --eventcontent NANOEDMAOD --filein placeholder.root --fileout file:ReReco-Run2018C-JetHT-UL2018_MiniAODv2_NanoAODv9_pilot-00001.root --nThreads 2 --no_exec --python_filename ReReco-Run2018C-JetHT-UL2018_MiniAODv2_NanoAODv9_pilot-00001_0_cfg.py --scenario pp --step NANO
it guesses that this is a data workflow but picks up the MC customization. The issue is not limited to NanoAOD; MiniAOD behaves the same way.
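As a rough illustration of the failure mode described above, here is a minimal, self-contained sketch. It is not the actual cmsDriverOptions.py code: the `Options` class, `guess_mode`, `pick_customisation`, and the "look for SIM in the step, event content, and data tier" heuristic are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Options:
    # Hypothetical stand-in for the parsed cmsDriver options.
    step: str
    eventcontent: str = ""
    datatier: str = ""
    isData: bool = False
    isMC: bool = False


def guess_mode(options):
    """Guess data vs. MC when neither flag was given (assumed heuristic)."""
    if options.isData or options.isMC:
        return "mc" if options.isMC else "data"
    fields = (options.step, options.eventcontent, options.datatier)
    if any("SIM" in field for field in fields):
        options.isMC = True
        return "mc"
    # The behaviour being illustrated: the guess says "data",
    # but options.isData is left untouched.
    return "data"


def pick_customisation(options):
    # Downstream code that trusts the flag rather than the guess.
    return "data customisation" if options.isData else "MC customisation"


opts = Options(step="NANO", eventcontent="NANOEDMAOD", datatier="NANOAOD")
guess = guess_mode(opts)
print(guess, "->", pick_customisation(opts))  # data -> MC customisation
```

In this sketch the guess and the flag disagree, so anything that branches on `isData` later falls through to the MC path.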

The PR #34302 introduces a fix.

However, once isData is set properly, we start to see a failure in the IB tests. The issue comes from the DQM validation sequence, as reported in #34438.
In short, the validation sequence test tries to run
cmsDriver.py step3 --conditions auto:run2_data -s VALIDATION --process DUMMY --eventcontent DQM --datatier DQMIO --customise_commands 'process.Tracer = cms.Service("Tracer")' --filein file:/eos/cms/store/data/Run2018A/EGamma/RAW/v1/000/315/489/00000/004D960A-EA4C-E811-A908-FA163ED1F481.root -n 10 --python_filename cmssw_cfg.py --no_exec
which is a data workflow. However, the MC sequence is picked up, including pileup replay. The config can be dumped, but it does not make sense. The dump step fails when "--data" or "isData" is used.

It seems the test of the DQM sequences should be reviewed first to avoid breaking data workflows. Then we can apply #34302 again, so that Mini/Nano data workflows are configured properly when the type comes from the CMSSW guess.

@cmsbuild
Contributor

A new Issue was created by @srimanob Phat Srimanobhas.

@Dr15Jones, @perrotta, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@qliphy
Contributor

qliphy commented Jul 12, 2021

assign dqm,xpog

@cmsbuild
Contributor

New categories assigned: dqm,xpog

@jfernan2,@kmaeshima,@fgolf,@mariadalfonso,@rvenditti,@andrius-k,@gouskos,@ErnestaP,@ahmad3213 you have been requested to review this Pull request/Issue and eventually sign? Thanks

@jfernan2
Contributor

@srimanob I have been looking into this, but my knowledge of the PdmV configs from cmsDriver is limited.
Nevertheless, after enabling your PR #34302 and adding a print before the processing of the 10 sequences that crash in:
https://github.com/cms-sw/cmssw/blob/master/DQMOffline/Configuration/scripts/cmsswSequenceInfo.py#L530

after scram b runtests_TestDQMOfflineConfiguration_110

I get:

Config file cmssw_cfg.py created
[Sequence(seqname='@miniAODValidation', step='VALIDATION', era='Phase2C11I13T25M9', scenario='', mc=False, data=False, fast=False), Sequence(seqname='@miniAODValidation', step='VALIDATION', era='Phase2C11I13T26M9', scenario='', mc=False, data=False, fast=False), Sequence(seqname='@miniAODValidation', step='VALIDATION', era='Phase2C9', scenario='', mc=False, data=False, fast=False), Sequence(seqname='@miniAODValidation', step='VALIDATION', era='Run2_2016', scenario='', mc=False, data=False, fast=False), Sequence(seqname='@miniAODValidation', step='VALIDATION', era='Run2_2016', scenario='pp', mc=True, data=False, fast=False), Sequence(seqname='@miniAODValidation', step='VALIDATION', era='Run2_2016,run2_miniAOD_80XLegacy', scenario='pp', mc=True, data=False, fast=False), Sequence(seqname='@miniAODValidation', step='VALIDATION', era='Run2_2016_HIPM', scenario='', mc=False, data=False, fast=False), Sequence(seqname='@miniAODValidation', step='VALIDATION', era='Run2_2016_HIPM', scenario='pp', mc=True, data=False, fast=False), Sequence(seqname='@miniAODValidation', step='VALIDATION', era='Run2_2016_trackingLowPU', scenario='', mc=False, data=False, fast=False), Sequence(seqname='@miniAODValidation', step='VALIDATION', era='Run2_2017', scenario='', mc=False, data=False, fast=False)]
Analyzing 10 seqs...

So it looks like there are several sequences with mc=False and data=False at the same time. Is this expected?
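To pick out the entries in question from a printout like the one above, a small filter is enough. This is only a sketch: the `Sequence` field names are copied from the printed output, the four entries are a subset of that list, and the `unflagged` helper is a hypothetical name.

```python
from collections import namedtuple

# Field names taken from the printed Sequence(...) entries.
Sequence = namedtuple(
    "Sequence", ["seqname", "step", "era", "scenario", "mc", "data", "fast"]
)

# A subset of the printed list, enough to show both kinds of entry.
seqs = [
    Sequence("@miniAODValidation", "VALIDATION", "Phase2C9", "", False, False, False),
    Sequence("@miniAODValidation", "VALIDATION", "Run2_2016", "", False, False, False),
    Sequence("@miniAODValidation", "VALIDATION", "Run2_2016", "pp", True, False, False),
    Sequence("@miniAODValidation", "VALIDATION", "Run2_2016_HIPM", "pp", True, False, False),
]


def unflagged(sequences):
    """Return the sequences where neither mc nor data is set."""
    return [s for s in sequences if not s.mc and not s.data]


ambiguous = unflagged(seqs)
for s in ambiguous:
    print(s.era, repr(s.scenario))
```

In the printed list, the entries with neither flag set are exactly those with an empty scenario, while the scenario='pp' entries have mc=True.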

@srimanob
Contributor Author

srimanob commented Jul 13, 2021

Hi @jfernan2
I have no idea why it is not set properly. Do you know how these workflows were selected?

However, the clues I have are:

  1. The input and GT used in the DQM sequence validation are just dummies, since we don't produce physics results with them.
  2. With a validation sequence only, CMSSW guesses that it is a data workflow. However, it picks up the MC sequence because isData is not set after the guess.
  3. miniaodValidation does not work with data. Things currently work only because of the bug in (2).

The fix I propose in #34452 includes:

  1. Properly assigning "isData" when it is a data workflow.
  2. Removing mixing replay when the workflow is data.
  3. Adding miniaodValidation and Phase2 as a "temporary" check for MC. I can disable this once I know how to access the DQM test workflows and can properly define what the workflow should be.
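The three points of the proposed fix could be sketched roughly as follows. This is not the code from #34452: the function name `resolve_mode`, its arguments, the return tuple, and the "SIM in a step name" heuristic are all hypothetical and only illustrate the intended decision flow.

```python
def resolve_mode(steps, era, sequence, is_data=False, is_mc=False):
    """Return (is_data, is_mc, allow_mixing_replay) for a workflow.

    Hypothetical helper: names and heuristics are assumptions.
    """
    if not is_data and not is_mc:
        # Point 3 (temporary check): treat these as MC-only.
        forced_mc = "miniAODValidation" in sequence or era.startswith("Phase2")
        guessed_mc = forced_mc or any("SIM" in s for s in steps)
        # Point 1: record the outcome of the guess in both flags.
        is_mc = guessed_mc
        is_data = not guessed_mc
    # Point 2: never enable mixing replay for a data workflow.
    allow_mixing_replay = not is_data
    return is_data, is_mc, allow_mixing_replay


print(resolve_mode(["NANO"], "Run2_2018", ""))
print(resolve_mode(["VALIDATION"], "Run2_2016", "@miniAODValidation"))
print(resolve_mode(["VALIDATION"], "Phase2C9", ""))
```

Explicit `is_data` or `is_mc` arguments bypass the guess entirely, which mirrors passing "--data" or "--mc" on the command line.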

@jfernan2
Contributor

Your solution seems fine to me, thanks!

I would not consider them selected workflows but rather modules/sequences coming from a full runTheMatrix inspection. They are part of the unit tests under DQMOffline/Configuration/test, which just grab any DQM or Validation module/sequence to check its validity:

https://github.com/cms-sw/cmssw/blob/master/DQMOffline/Configuration/test/runtests.sh#L23

Legacy modules are stripped, and then the full list of sequences is split into chunks to run them:
https://github.com/cms-sw/cmssw/blob/master/DQMOffline/Configuration/test/BuildFile.xml
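The chunking idea can be sketched generically as below. This does not reproduce the actual test scripts: the `chunks` helper and the placeholder sequence names are hypothetical, and the chunk size of 10 is only assumed from the "Analyzing 10 seqs..." line in the earlier output.

```python
def chunks(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder names standing in for the collected sequences.
sequences = [f"seq{i}" for i in range(25)]
batches = list(chunks(sequences, 10))
print([len(b) for b in batches])  # [10, 10, 5]
```

Each batch can then be handed to a separate unit test, so a failure points at a small group of sequences rather than at the whole list.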

@srimanob
Contributor Author

srimanob commented Jul 13, 2021

@cms-sw/dqm-l2

Could you please confirm the statement in
#34452 (comment) ?

For step:

  • VALIDATION => MC
  • VALIDATION + DQM => MC
  • DQM => Data or MC. In this case, running in Data mode with an MC file should be OK?
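The statement to be confirmed can be written down as a small lookup. This is a sketch of the question only, not confirmed behaviour: `expected_modes` is a hypothetical helper, and it assumes the step option is a comma-separated list whose entries may carry a ":sequence" suffix.

```python
def expected_modes(step_option):
    """Map a step string to the workflow modes it is assumed to support."""
    # Keep only the step names, dropping any ":sequence" suffix.
    names = {token.split(":")[0] for token in step_option.split(",")}
    if "VALIDATION" in names:
        return {"mc"}            # VALIDATION and VALIDATION + DQM => MC
    if "DQM" in names:
        return {"data", "mc"}    # DQM alone => Data or MC
    return set()                 # not covered by the statement


for option in ("VALIDATION", "VALIDATION,DQM", "DQM"):
    print(option, "=>", sorted(expected_modes(option)))
```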

Thanks in advance.

@jfernan2
Contributor

+1
Solved by #3445

@mariadalfonso
Contributor

mariadalfonso commented May 18, 2022

type jetmet

@mariadalfonso
Contributor

The explicit --data or --mc flag is only needed by the MET sequence.

I suggest reviewing the MET code so that, like any other object, it doesn't rely on external switches.

@vlimant
Contributor

vlimant commented Oct 10, 2022

+1
we took note of the MET configuration that needs cleaning

@cmsbuild
Contributor

This issue is fully signed and ready to be closed.
