
Developers overriding production location for retrieving conditions in configurations #27393

Closed
davidlange6 opened this issue Jun 28, 2019 · 99 comments · Fixed by #36408

@davidlange6

These break the centrally determined location for retrieving conditions (which can change, even if it rarely does). All of these should rely on the global tag to determine where the conditions are to be taken from (or they should not be loaded into production workflows):

CondDB.connect = cms.string("frontier://FrontierProd/CMS_CONDITIONS")

self.process.loadRecoTauTagMVAsFromPrepDB.connect = cms.string(conditionDB)

CondDB.connect = cms.string("frontier://FrontierProd/CMS_CONDITIONS")
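
For contrast, a minimal sketch of the GT-driven pattern that these lines bypass, mirroring the standard cmsDriver-generated setup (the GT alias is illustrative):

import FWCore.ParameterSet.Config as cms
from Configuration.AlCa.GlobalTag import GlobalTag

process = cms.Process("TEST")
process.load('Configuration.StandardSequences.FrontierConditions_GlobalTag_cff')
# the Global Tag alone decides which tags are read and from where;
# no per-module connect string is hard-coded
process.GlobalTag = GlobalTag(process.GlobalTag, 'auto:run2_data', '')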

@cmsbuild commented Jun 28, 2019

A new Issue was created by @davidlange6 David Lange.

@davidlange6, @Dr15Jones, @smuzaffar, @fabiocos, @kpedro88 can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@davidlange6

assign l1,db,reco

@cmsbuild

New categories assigned: db,l1

@ggovi,@benkrikler,@rekovic you have been requested to review this Pull request/Issue and eventually sign? Thanks

@davidlange6

assign reconstruction

@cmsbuild

New categories assigned: reconstruction

@slava77,@perrotta you have been requested to review this Pull request/Issue and eventually sign? Thanks

@ggovi commented Jun 28, 2019

My personal opinion is that a cfg file containing a condition customization (defined with one or more toGet statements) should NEVER be invoked (directly or included) in a production workflow, since:

  • it overrides the record/tag mapping provided by the official Global Tag, referencing tags that may stay outside central control (in terms of testing and validation)
  • it potentially breaks reproducibility

For this reason, I think that the first measure required is to find out the reason for each customisation concerned, and remove them from the production workflows.
This will potentially imply:

  • replacing some tags in existing GTs (?)
  • creating additional GTs (?)

I think AlCa people should be involved in this review procedure
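
For concreteness, a sketch of the kind of condition customization being discussed, with hypothetical record and tag names; it overrides whatever the Global Tag maps for that record:

import FWCore.ParameterSet.Config as cms

process = cms.Process("TEST")
process.load('Configuration.StandardSequences.FrontierConditions_GlobalTag_cff')
# hypothetical override: this record is now resolved via MyPrivateTag_v1,
# possibly from a different DB, instead of the tag chosen by the GT
process.GlobalTag.toGet = cms.VPSet(
    cms.PSet(
        record = cms.string('TauTagMVAComputerRcd'),
        tag = cms.string('MyPrivateTag_v1'),
        connect = cms.string('frontier://FrontierPrep/CMS_CONDITIONS')
    )
)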

@slava77 commented Jun 28, 2019

for cmssw/RecoTauTag/RecoTau/python/tools/runTauIdMVA.py
@mbluj @steggema @swozniewski please comment and perhaps follow up with a resolution

@fabiocos

for the L1Trigger instances: @rekovic could you please address this issue?

@ggovi commented Jun 28, 2019

Can we add alca to this loop?

@kpedro88

assign alca

@cmsbuild

New categories assigned: alca

@christopheralanwest,@franzoni,@tlampen,@pohsun,@tocheng you have been requested to review this Pull request/Issue and eventually sign? Thanks

@mbluj commented Jun 28, 2019

Hello,
comments on

self.process.loadRecoTauTagMVAsFromPrepDB.connect = cms.string(conditionDB)

  1. TauID payloads, i.e. BDT training files and pt-dependent WP definitions, are mapped via the python configuration file RecoTauTag/Configuration/python/loadRecoTauTagMVAsFromPrepDB_cfi.py rather than via a GlobalTag. In fact the payloads do not define what one could call "conditions", but rather a version of a given tau identification algorithm.
  2. This particular tool (runTauIdMVA.py) is used to add new tauIDs to the process object and then potentially to a given workflow - PAT/MiniAOD or post-PAT/MiniAOD. Until 10_6_X the tool was used only by final users to produce their analysis ntuples on top of MiniAOD. Since 10_6_X the tool has been used to add one particular tauID to MiniAOD (namely DeepTauID).
  3. As it is handy to maintain the definition of tauIDs in one place, it is being considered to use the tool also to define upgraded tauIDs for nanoAOD.
  4. Concerning retrieving conditions in #L41 of the tool: it is inactive. It can be activated only if a nonempty connection address is defined in #L31. The reason for this implementation is to make testing payloads from the preparation DB simpler (see the sketch after this list).
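
A minimal sketch of the activation logic described in point 4, with illustrative variable names rather than the exact code of runTauIdMVA.py (in the tool the process is reached via self.process):

import FWCore.ParameterSet.Config as cms

process = cms.Process("TEST")
process.load('RecoTauTag.Configuration.loadRecoTauTagMVAsFromPrepDB_cfi')

# an empty string keeps the override inactive and the default mapping in place;
# a nonempty prep-DB address activates it, intended for payload tests only
conditionDB = ""  # e.g. "frontier://FrontierPrep/CMS_CONDITIONS"

if conditionDB != "":
    process.loadRecoTauTagMVAsFromPrepDB.connect = cms.string(conditionDB)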

@slava77 commented Jul 1, 2019

+1

The only case affecting reco is clarified in #27393 (comment).
This looks like a reasonable use case for private tests, which does not affect the production setup.

@davidlange6

These are the payloads that were needed to run digi+reco without frontier; this contradicts the conclusion of @slava77 and @mbluj:

https://gitlab.cern.ch/hep-benchmarks/hep-workloads/blob/cae5bd13ac1aaf3df01f224db642f2c00df26d93/cms/reco/cms-reco/generate_GlobalTag.sh

@slava77 commented Jul 2, 2019

> generate_GlobalTag.sh

Please elaborate on what's in this file.
Thank you.

@davidlange6

All the payloads not in the GT without which a digi-reco-miniaod workflow will not run.

@davidlange6

Which is not to say whether or not they are actually read and used, but rather that our applications depend on them being present in order to function - perhaps for purely technical reasons.

@davidlange6

(sorry, didn't mean to close this)

@slava77 commented Jul 2, 2019

I think that the origin is in

CondDBTauConnection = CondDB.clone( connect = cms.string( 'frontier://FrontierProd/CMS_CONDITIONS' ) )

not in the

self.process.loadRecoTauTagMVAsFromPrepDB.connect = cms.string(conditionDB)

@katilp commented Jan 26, 2024

If these changes are expected to affect the output, we would like to have them as soon as possible for testing and eventual reprocessing of the example PFNano datasets that will go into the open data release.
Changing the release has a big impact on the work to be done as well, but we will need to do much of that work for a new GT in any case.

@mmusich commented Jan 26, 2024

For my edification, why is CMS releasing open data related to a custom workflow (requiring unsupported conditions)?

@katilp commented Jan 26, 2024

> For my edification, why is CMS releasing open data related to a custom workflow (requiring unsupported conditions)?

Producing NanoAOD enriched with the PF candidates is of major interest to the open data user community; that is why we are providing this example. We do not use unsupported conditions on purpose. We are more than happy to use supported conditions; please let us know how to modify the workflow or point us to the relevant documentation. Thank you, your help will be appreciated 🙂!

@mbluj commented Jan 26, 2024

@katilp Executing your cmsDriver command I got the following error:

ImportError: No module named PFNano.pfnano_cff

And indeed there is no PhysicsTools/PFNano package in 10_6_30, nor the customization cff therein. Could you please provide a full installation recipe, for copy and paste, to reproduce the issue?

In parallel I checked that this configuration without the customization works without problems. This very probably means that the issue is caused by the customization.

I will get back to it at the beginning of next week, but I am not sure if I will have time on Monday due to other duties.

@katilp commented Jan 26, 2024

> For my edification, why is CMS releasing open data related to a custom workflow (requiring unsupported conditions)?
>
> Producing NanoAOD enriched with the PF candidates is of major interest to the open data user community; that is why we are providing this example. We do not use unsupported conditions on purpose. We are more than happy to use supported conditions; please let us know how to modify the workflow or point us to the relevant documentation. Thank you, your help will be appreciated 🙂!

Also please note that this error appears without any customization and has nothing to do with open data, other than open data preparations helping CMS to improve the codebase ❤️

@mmusich commented Jan 26, 2024

> Also please note that this error appears without any customization and has nothing to do with open data, other than open data preparations helping CMS to improve the codebase

This seems in contradiction with #27393 (comment).

@mbluj commented Jan 26, 2024

> For my edification, why is CMS releasing open data related to a custom workflow (requiring unsupported conditions)?
>
> Producing NanoAOD enriched with the PF candidates is of major interest to the open data user community; that is why we are providing this example. We do not use unsupported conditions on purpose. We are more than happy to use supported conditions; please let us know how to modify the workflow or point us to the relevant documentation. Thank you, your help will be appreciated 🙂!
>
> Also please note that this error appears without any customization and has nothing to do with open data, other than open data preparations helping CMS to improve the codebase ❤️

As written in the comment above, I was not able to reproduce the issue w/o the customization, i.e. executing the following:

cmsrel CMSSW_10_6_30
cd CMSSW_10_6_30/src
cmsenv
cmsDriver.py data_2016UL_OpenData --data --eventcontent NANOAODSIM --datatier NANOAODSIM --step NANO --conditions 106X_dataRun2_v37   --era Run2_2016,run2_nanoAOD_106Xv2 --customise_commands="process.add_(cms.Service('InitRootHandlers', EnableIMT = cms.untracked.bool(False)))" --nThreads 4 -n 100 --filein /store/data/Run2016H/JetHT/MINIAOD/UL2016_MiniAODv2-v2/130000/676E37D2-044C-D346-92D9-A127A55FD279.root --fileout file:nano_data2016_nopf.root  --no_exec
voms-proxy-init --voms cms
cmsRun data_2016UL_OpenData_NANO.py

The cmsDriver command with customization does not work for me.

@katilp commented Jan 26, 2024

> @katilp Executing your cmsDriver command I got the following error:
>
> ImportError: No module named PFNano.pfnano_cff
>
> And indeed there is no PhysicsTools/PFNano package in 10_6_30, nor the customization cff therein. Could you please provide a full installation recipe, for copy and paste, to reproduce the issue?
>
> In parallel I checked that this configuration without the customization works without problems. This very probably means that the issue is caused by the customization.
>
> I will get back to it at the beginning of next week, but I am not sure if I will have time on Monday due to other duties.

Thank you!

cmsrel CMSSW_10_6_30
cd CMSSW_10_6_30/src

git clone https://github.com/cms-opendata-analyses/PFNanoProducerTool.git PhysicsTools/PFNano
scram b
cd PhysicsTools/PFNano/
cmsenv

cmsRun <config_below>

For the sake of brevity, removing the customization (updated: bringing back two default lines for customization):

import FWCore.ParameterSet.Config as cms

from Configuration.Eras.Era_Run2_2016_cff import Run2_2016
from Configuration.Eras.Modifier_run2_nanoAOD_106Xv2_cff import run2_nanoAOD_106Xv2

process = cms.Process('NANO',Run2_2016,run2_nanoAOD_106Xv2)

# import of standard configurations
process.load('Configuration.StandardSequences.Services_cff')
process.load('SimGeneral.HepPDTESSource.pythiapdt_cfi')
process.load('FWCore.MessageService.MessageLogger_cfi')
process.load('Configuration.EventContent.EventContent_cff')
process.load('Configuration.StandardSequences.GeometryRecoDB_cff')
process.load('Configuration.StandardSequences.MagneticField_AutoFromDBCurrent_cff')
process.load('PhysicsTools.NanoAOD.nano_cff')
process.load('Configuration.StandardSequences.EndOfProcess_cff')
process.load('Configuration.StandardSequences.FrontierConditions_GlobalTag_cff')

process.maxEvents = cms.untracked.PSet(
    input = cms.untracked.int32(100)
)

# Input source
process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring('root://eospublic.cern.ch//eos/opendata/cms/Run2016G/JetHT/MINIAOD/UL2016_MiniAODv2-v2/130000/35017A26-8C9D-204D-92B6-3ABFBBD4ADF3.root'),
    secondaryFileNames = cms.untracked.vstring()
)

process.options = cms.untracked.PSet(

)

# Production Info
process.configurationMetadata = cms.untracked.PSet(
    annotation = cms.untracked.string('nano_data_2016_UL nevts:100'),
    name = cms.untracked.string('Applications'),
    version = cms.untracked.string('$Revision: 1.19 $')
)

# Output definition

process.NANOAODSIMoutput = cms.OutputModule("NanoAODOutputModule",
    compressionAlgorithm = cms.untracked.string('LZMA'),
    compressionLevel = cms.untracked.int32(9),
    dataset = cms.untracked.PSet(
        dataTier = cms.untracked.string('NANOAODSIM'),
        filterName = cms.untracked.string('')
    ),
    fileName = cms.untracked.string('file:nano_data2016.root'),
    outputCommands = process.NANOAODSIMEventContent.outputCommands
)

# Additional output definition

# Other statements
from Configuration.AlCa.GlobalTag import GlobalTag
#process.GlobalTag = GlobalTag(process.GlobalTag, '106X_dataRun2_v37', '')
process.GlobalTag.connect = cms.string('sqlite_file:/cvmfs/cms-opendata-conddb.cern.ch/106X_dataRun2_v37.db')
process.GlobalTag.globaltag = '106X_dataRun2_v37'

# Path and EndPath definitions
process.nanoAOD_step = cms.Path(process.nanoSequence)
process.endjob_step = cms.EndPath(process.endOfProcess)
process.NANOAODSIMoutput_step = cms.EndPath(process.NANOAODSIMoutput)

# Schedule definition
process.schedule = cms.Schedule(process.nanoAOD_step,process.endjob_step,process.NANOAODSIMoutput_step)
from PhysicsTools.PatAlgos.tools.helpers import associatePatAlgosToolsTask
associatePatAlgosToolsTask(process)

#Setup FWK for multithreaded
process.options.numberOfThreads=cms.untracked.uint32(4)
process.options.numberOfStreams=cms.untracked.uint32(0)
process.options.numberOfConcurrentLuminosityBlocks=cms.untracked.uint32(1)

# customisation of the process.
# Automatic addition of the customisation function from PhysicsTools.NanoAOD.nano_cff
from PhysicsTools.NanoAOD.nano_cff import nanoAOD_customizeData

#call to customisation function nanoAOD_customizeData imported from PhysicsTools.NanoAOD.nano_cff
process = nanoAOD_customizeData(process)

# End of customisation functions

# Customisation from command line

process.add_(cms.Service('InitRootHandlers', EnableIMT = cms.untracked.bool(False)))
# Add early deletion of temporary data products to reduce peak memory need
from Configuration.StandardSequences.earlyDeleteSettings_cff import customiseEarlyDelete
process = customiseEarlyDelete(process)
# End adding early deletion

Note that it will not fail if a frontier connection is available, only when there is none (to come back to the original question).

@mbluj commented Jan 26, 2024

Ah, OK. How can I remove the connection to frontier w/o losing the connection to the GT?

@mbluj commented Jan 26, 2024

> Ah, OK. How can I remove the connection to frontier w/o losing the connection to the GT?

I see it in your cfg file. I will test it later, but now I must run.

@mbluj commented Jan 26, 2024

With the configuration copied from above, which connects to conditions in a sqlite file and gives a python dump like this:

>>> process.GlobalTag
cms.ESSource("PoolDBESSource",
    DBParameters = cms.PSet(
        authenticationPath = cms.untracked.string(''),
        authenticationSystem = cms.untracked.int32(0),
        messageLevel = cms.untracked.int32(0),
        security = cms.untracked.string('')
    ),
    DumpStat = cms.untracked.bool(False),
    ReconnectEachRun = cms.untracked.bool(False),
    RefreshAlways = cms.untracked.bool(False),
    RefreshEachRun = cms.untracked.bool(False),
    RefreshOpenIOVs = cms.untracked.bool(False),
    connect = cms.string('sqlite_file:/cvmfs/cms-opendata-conddb.cern.ch/106X_dataRun2_v37.db'),
    globaltag = cms.string('106X_dataRun2_v37'),
    pfnPostfix = cms.untracked.string(''),
    pfnPrefix = cms.untracked.string(''),
    snapshotTime = cms.string(''),
    toGet = cms.VPSet()
)

it still works for me...

@mmusich commented Jan 26, 2024

For the sake of honesty,

> For the sake of brevity, removing the customization

this is not exactly removing the customization: you are still checking out a package that is not available in the release via

git clone https://github.com/cms-opendata-analyses/PFNanoProducerTool.git PhysicsTools/PFNano

In a self-contained CMSSW_10_6_30 (which is what is normally accepted as centrally supported) the configuration above doesn't run.

@katilp commented Jan 26, 2024

> Ah, OK. How can I remove the connection to frontier w/o losing the connection to the GT?
>
> I see it in your cfg file. I will test it later, but now I must run.

No, connecting to the /cvmfs area for condition data is not enough; it does not cut the frontier connection. That's why this goes unobserved, and it only fails when there is no frontier connection at all, i.e. in the CMS open data VM, or in the CMS open data docker container if one explicitly removes the frontier connection, see https://cms-opendata-releaseguide.docs.cern.ch/computing_environment/containers/#testing-without-frontier-connection
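
To serve those extra payloads locally as well, a hedged sketch of the redirection, to be appended at the end of the configuration above; it assumes (and this is exactly the catch) that the needed tau payloads were also dumped into the sqlite snapshot, which is not the case by default:

# hypothetical: point the non-GT tau payload source at the local sqlite snapshot,
# so that no frontier fallback is attempted; works only if the payloads are in the file
if hasattr(process, 'loadRecoTauTagMVAsFromPrepDB'):
    process.loadRecoTauTagMVAsFromPrepDB.connect = cms.string(
        'sqlite_file:/cvmfs/cms-opendata-conddb.cern.ch/106X_dataRun2_v37.db')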

@katilp commented Jan 26, 2024

> For the sake of honesty,
>
> > For the sake of brevity, removing the customization
>
> this is not exactly removing the customization: you are still checking out a package that is not available in the release via
>
> git clone https://github.com/cms-opendata-analyses/PFNanoProducerTool.git PhysicsTools/PFNano
>
> In a self-contained CMSSW_10_6_30 (which is what is normally accepted as centrally supported) the configuration above doesn't run.

Yes, of course, this is what we provide for open data users. We do not point them to anything centrally supported but provide them examples of how they can use CMS open data. We believe that this is an issue independent from the following:

> With the configuration copied from above, which connects to conditions in a sqlite file and gives a python dump like this: [GlobalTag dump quoted above]
>
> it still works for me...

Yes, it will work if you do not cut the frontier connection. This is the issue. It goes unobserved.

@katilp commented Jan 26, 2024

> [quotes the whole exchange above]

If you cannot cut the frontier connection, take the RecoTauTag package, add prints to this file locally, and you will see it ending up there. And that's all that I'm trying to say.

@mmusich commented Jan 26, 2024

> If you cannot cut the frontier connection, take the RecoTauTag package, add prints to this file locally, and you will see it ending up there. And that's all that I'm trying to say.

I can indeed make the process crash by short-circuiting this via this recipe:

cmsrel CMSSW_10_6_30
cd CMSSW_10_6_30/src
cmsenv
cmsDriver.py data_2016UL_OpenData --data --eventcontent NANOAODSIM --datatier NANOAODSIM --step NANO --conditions 106X_dataRun2_v37 --era Run2_2016,run2_nanoAOD_106Xv2 --customise_commands="process.add_(cms.Service('InitRootHandlers', EnableIMT = cms.untracked.bool(False)));delattr(process, 'loadRecoTauTagMVAsFromPrepDB')" --nThreads 4 -n 100 --filein /store/data/Run2016H/JetHT/MINIAOD/UL2016_MiniAODv2-v2/130000/676E37D2-044C-D346-92D9-A127A55FD279.root --fileout file:nano_data2016_nopf.root --no_exec
voms-proxy-init --voms cms
cmsRun data_2016UL_OpenData_NANO.py

which is independent of the PFNano customization.
The issue persists in CMSSW_14_0_0_pre2, so it is indeed not fully solved even in recent (pre-)releases.

@hqucms commented Jan 29, 2024

I did some more investigation for CMSSW_14_0_0_pre2 based on the recipe from @mmusich. It turns out that after changing two things I can get it to work:

  1. Removing these tasks from patTauMVAIDsTask. It seems that they are not used anyhow, as the output NANO content remains unchanged after removing them. Maybe @mbluj can confirm whether they are indeed unused?
  2. Switching the GT from 106X_dataRun2_v37 to a more recent one (I just used auto:run2_data). It seems that some of the MVA tags for boosted taus are only included since 113X GTs.

For the open NANO release, I suppose the easiest way is to gather a list of the tags needed but not in the GT, and just dump/add them into a sqlite file? I dumped a list of the tags being loaded manually in the 106X workflow. Probably not all of them are strictly needed, but having all of them should make things work w/o connecting to Frontier (see the sketch below).

tags.txt
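
If the sqlite route is taken, a sketch of how such tags could be wired in on top of the NANO configuration; the record/tag pair shown is one example from this thread, the sqlite file name is hypothetical, and the full list would come from tags.txt:

import FWCore.ParameterSet.Config as cms

# in the real workflow this is the existing NANO process
process = cms.Process("NANO")
process.load('Configuration.StandardSequences.FrontierConditions_GlobalTag_cff')

extra_tau_tags = [
    # (record, tag) pairs to be filled in from a dump like tags.txt
    ('GBRWrapperRcd', 'RecoTauTag_antiElectronMVA6v3_noeveto_gbr_NoEleMatch_woGwoGSF_BL'),
]
# route each tag missing from the GT through a local sqlite snapshot
for record, tag in extra_tau_tags:
    process.GlobalTag.toGet.append(cms.PSet(
        record = cms.string(record),
        tag = cms.string(tag),
        connect = cms.string('sqlite_file:extra_tau_payloads.db')
    ))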

@vlimant commented Jan 29, 2024

> 2. Switching the GT from 106X_dataRun2_v37 to a more recent one (I just used auto:run2_data). It seems that some of the MVA tags for boosted taus are only included since 113X GTs.

What GT have you used instead of auto:run2_data?

@mbluj commented Jan 29, 2024

> [quotes @hqucms's investigation above]

Thanks @hqucms!
I think you are right about the tasks to be removed. It is actually what I mentioned earlier here (or in another parallel thread): the whole content of taus_updatedMVAIds_cff.py should be reviewed (I suppose it is not needed anymore). I plan to do it in the next few days and prepare a PR to master (and backports if needed).
I also agree that adding the missing payloads to the GT (thanks for the list!) is the quickest fix for open-data workflows, as it does not require a new CMSSW release.

@katilp commented Jan 29, 2024

> [quotes @hqucms's investigation and @mbluj's reply above]

Thanks! For open data, we would prefer the cleanest solution, i.e. a new release with the fixes and a new GT, if that can happen very shortly. It will be more work for us now to change already prepared material, but in the future it will avoid patches and additional explanations in the CMS open data tutorials and guides. What would be the estimated timescale?

@hqucms commented Jan 29, 2024

> 2. Switching the GT from 106X_dataRun2_v37 to a more recent one (I just used auto:run2_data). It seems that some of the MVA tags for boosted taus are only included since 113X GTs.
>
> What GT have you used instead of auto:run2_data?

@vlimant auto:run2_data points to 133X_dataRun2_v2 in CMSSW_14_0_0_pre2. It seems that the tags missing in 106X_dataRun2_v37 were introduced in ~113X (e.g., https://cms-conddb.cern.ch/cmsDbBrowser/search/Prod/RecoTauTag_antiElectronMVA6v3_noeveto_gbr_NoEleMatch_woGwoGSF_BL).
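
For reference, the concrete GT behind an auto: alias can be checked directly in a release area via the autoCond map that ships with CMSSW:

from Configuration.AlCa.autoCond import autoCond

# prints the GT that 'auto:run2_data' resolves to in the current release,
# e.g. 133X_dataRun2_v2 in CMSSW_14_0_0_pre2
print(autoCond['run2_data'])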

@hqucms commented Jan 29, 2024

> [quotes the exchange above]

I think if we create a dedicated sqlite file or make a new GT to include the missing tags, then no change is needed in the release. And since open data will be using a sqlite file anyhow, it might be much easier to just add the missing tags to the sqlite file, rather than making a new GT. Maybe the AlCa/DB group can comment on this?

@katilp commented Jan 29, 2024

> [quotes the exchange above]

I think they commented in https://cms-talk.web.cern.ch/t/condition-database-access-outside-of-gt-for-nano-production/33715/5
In any case, a change is needed in the code, either to remove the tasks or to remove the frontier connection, and from the open data point of view we would prefer it clean. Open data will be using this release for years to come.

@mbluj commented Feb 1, 2024

Hello, sorry for the delay in answering, but I was taken up by other commitments. So, to summarize what needs to be done:

  1. Removal of tauIDs with payloads not in the GT from official workflows:
    it is quite straightforward in master, but requires more work in the UL release series (10_6), where nothing in this direction has been done as far as I remember.
  2. I expect that cleaning in the UL/10_6 releases will anyway require a GT update, as I suppose the corresponding GTs do not contain even the minimal required set of payloads - to be checked.
  3. What should be done with the intermediate release series (>10_6 & <14_0)? Do we want a backport of the cleaning to all of them? I would like to avoid it if not strictly necessary.

About the timescale: I have other things on my plate (e.g. some L1 stuff for 2024 & phase-2), but I can reorder this if necessary. I suppose that cleaning master will take 1-2 days, and the backport to 10_6 a few additional days. As concerns the GT update, I have no experience with it, so I either have to learn how to do it or someone else should do the job - any help will be appreciated.

@perrotta commented Feb 1, 2024

Thank you @mbluj

If I understand it correctly we should:

  • Remove tauIDs with payloads not in the GT from the master
  • Add the missing payloads to the GTs for UL/10_6 (AlCa can do it)

If you remove the tauIDs also from UL/10_6 then no update of the GTs is needed, but as it will probably require modifying a closed release in a non-trivial way, this should probably be avoided.

For the intermediate releases I would check one by one: probably we can avoid acting on all of them, but it depends on what is intended to be done with them.

@mbluj commented Feb 1, 2024

> Thank you @mbluj
>
> If I understand it correctly we should:
>
> • Remove tauIDs with payloads not in the GT from the master

Correct. As far as I understand, the problem touches only NanoAOD workflows.

> • Add the missing payloads to the GTs for UL/10_6 (AlCa can do it)

Yes, but the number of payloads to add is potentially bigger compared to what is in the GT for master and other release series.

> If you remove the tauIDs also from UL/10_6 then no update of the GTs is needed, but as it will probably require modifying a closed release in a non-trivial way, this should probably be avoided.

It should be checked. In principle a "full cleaning" can require changes in the AOD/RECO, miniAOD and NanoAOD sequences, but I suppose that the changes will not affect the data content at any of the datatiers. The only effect could be the removal of era-dependent modifications for compatibility with old (pre-UL) samples. Anyway, if I understand it correctly, the idea is to update only the NanoAOD sequences, as it is not expected to produce new UL-like samples other than NanoAOD for OpenData purposes, right?

> For the intermediate releases I would check one by one: probably we can avoid acting on all of them, but it depends on what is intended to be done with them.

OK. Changes in intermediate releases newer than 11_3 (if I am correct) will be similar to those in master, while for older ones similar to those for 10_6. But even trivial backporting and testing (sometimes creating a GT) for a number of release series is already an additional burden.

@jmhogan commented Feb 26, 2024

Pinging this thread along with #43797 (@mbluj)

In the Open Data context, DPOA's preferred solution is cleaning these IDs from UL/10_6 so that a new release can be used without a new (bigger) GT.

Nano sequences are very likely to be the most used in Open Data, but we do provide instructions on how users can produce their own MC, which follows the full sequence. It would be ideal to clean this out fully. But the Nano sequences have the highest priority. We can test where in the full production chain it fails if needed.

@vlimant commented May 13, 2024

please close

done with #44685 until further notice
