Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix when neither "data" nor "MC" is specified in the cmsDriver. #34452

Merged
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion Configuration/Applications/python/ConfigBuilder.py
Original file line number Diff line number Diff line change
Expand Up @@ -1849,7 +1849,7 @@ def NFI(index):
if self._options.restoreRNDSeeds==False and not self._options.restoreRNDSeeds==True:
self._options.restoreRNDSeeds=True

if not 'DIGI' in self.stepMap and not self._options.fast:
if not 'DIGI' in self.stepMap and not self._options.isData and not self._options.fast:
self.executeAndRemember("process.mix.playback = True")
self.executeAndRemember("process.mix.digitizers = cms.PSet()")
self.executeAndRemember("for a in process.aliases: delattr(process, a)")
Expand Down
7 changes: 7 additions & 0 deletions Configuration/Applications/python/cmsDriverOptions.py
Original file line number Diff line number Diff line change
Expand Up @@ -229,10 +229,17 @@ def OptionsFromItems(items):
options.isMC=True
if 'SIM' in options.datatier:
options.isMC=True
# This is a temporary solution to make validation-only sequence work in IB. One should define workflow properly.
if options.era and 'Phase2' in options.era:
options.isMC=True
if 'miniAODValidation' in options.step:
Copy link
Contributor

@perrotta perrotta Jul 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was counter-intuitive to me that all miniAODValidation must be considered as "MC": of course, we produce miniAOD also from real data.
Then I realized that the miniAODValidationSequence only includes a genParticlesValidation: therefore, the fix here makes sense... but the name of the sequence probably not!
@hatakeyamak @jfernan2 : is there any (hystorical) reason for it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never seen that we use miniAODValidationSequence with data before. In UL, PdmV uses DQM:@rerecoCommon or DQM:@standardDQM+@ExtraHLT

I agree that this makes confusion on what it for MC/Data or both

[1, 2016 UL]
https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/5ad31c9be7cfbf89c62a938d7fa94dfa/configFile

[2, B-Parking]
https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/be67f91f945866144130b3b61e2e2994/configFile

Copy link
Contributor

@makortel makortel Jul 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was counter-intuitive to me that all miniAODValidation must be considered as "MC"

May I ask why? (I'm mostly just curious) I mean, we tend to use the word "validation" for MC and "DQM" for data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was counter-intuitive to me that all miniAODValidation must be considered as "MC"

May I ask why? (I'm mostly just curious) I mean, we tend to use the word "validation" for MC and "DQM" for data.

Ah, ok: that was the key I missed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@makortel @perrotta
If that is the key, the issue may solve easily. However, I see that we also use them together, i.e.
VALIDATION:@phase2Validation+@miniAODValidation,DQM:@phase2+@miniAODDQM in (*)

Can I simply say that if we have VALIDATION step, it will always be MC workflow?
And if we have only DQM, it will be data workflow?

(*)
https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/ab0d87d02b073cc5e7ae3f16e1350bbb/configFile

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I assume we should be OK if DQM-only is used with MC because CMSSW should treat it like data in this way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, we tend to use the word "validation" for MC and "DQM" for data.

I believe this is not true in a practical and general way, as you can see from:

https://github.com/cms-sw/cmssw/blob/master/Configuration/PyReleaseValidation/python/relval_steps.py

where there is a mixture of VALIDATION,DQM and VALIDATION/DQM over data and MC too

What it is true is that the DQM Unit Tests performed on:
https://github.com/cms-sw/cmssw/blob/master/DQMOffline/Configuration/test/runtests.sh#L23
are using a single (real) data input file, to simplify them, so we could switch these tests to isData=True to make them run with Phat's PR

In these tests, since a module with a given label can have different configurations depending on options such as Era,
Scenario, Data vs. MC etc, if multiple configurations for the same name were found, they are listed separately
here and denoted using subscripts. That is why the changes in this PR are failing somehow.

I don't know if the changes in this PR have implications in the standard workflows though

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, we tend to use the word "validation" for MC and "DQM" for data.

I believe this is not true in a practical and general way, as you can see from:

Right, sorry for being imprecise. What I really meant (and what would have been sufficient) is that what we tend to call "validation" in the context of "standard sequences" requires MC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What it is true is that the DQM Unit Tests performed on:
https://github.com/cms-sw/cmssw/blob/master/DQMOffline/Configuration/test/runtests.sh#L23
are using a single (real) data input file, to simplify them, so we could switch these tests to isData=True to make them run with Phat's PR

This may not true. Somehow the test also picks DQM-only for Phase-2 workflow which does not exist (or I miss it somehow). CMSSW already guess that it is data-workflow, but still, fail. See in https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7e50b2/16755/unitTests/failed.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The input file for the tests is always the same, and it is an EGamma 2018 RAW data file downloaded at the beginning of the script to avoid delays from AAA:
--infile file:///data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_12_0_X_2021-07-12-2300/tmp/004D960A-EA4C-E811-A908-FA163ED1F481.root

options.isMC=True
#
if options.isMC:
print('We have determined that this is simulation (if not, rerun cmsDriver.py with --data)')
else:
print('We have determined that this is real data (if not, rerun cmsDriver.py with --mc)')
options.isData=True

if options.profile:
if options.profile and options.prefix:
Expand Down