Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Covariance Version Conflict between HLT Module and miniAOD Module #43861

Closed
brallmond opened this issue Feb 3, 2024 · 23 comments
Closed

Covariance Version Conflict between HLT Module and miniAOD Module #43861

brallmond opened this issue Feb 3, 2024 · 23 comments

Comments

@brallmond
Copy link
Contributor

By trying to use new HLT paths made in my user area to generate small MiniAOD file for DQM testing, I found what I believed was an issue between an HLT module and later MiniAOD modules. I sent a plea and description of my problem to Michal Bluj (observing this issue) and he describes the situation as the following:

  • DeepBoostedJetTagInfoProducer [1] builds internally pat::PackedCandidates from reco::PFCandidates when run with "use_hlt_features" on (see [1] lines 797-876). The PackedCandidates are built with a hardcoded version of track covariance packing (see l. 829 and calls of setTrackProperties(track, qual, cov_ver), e.g. in ll.845,847). This version is hardcoded to be equal to 0;
  • At offline, within the miniAOD sequence, there are three producers of pat::PackedCandidates (pluginName/producerName): PATPackedCandidateProducer/packedPFCandidates, PATLostTracks/lostTracks and PATTracksToPackedCandidates/hiPixelTracks. All of them have configurable version of track covariance, which is set to 1 for Run-3;
  • When the two settings are adjusted, i.e. either value hardcoded in DeepBoostedJetTagInfoProducer is set to 1 or value used by miniAOD modules set to 0 [2], the configuration works fine (in the sense that it does not throw an exception - tested with full file of ~1.5k of events).
  • What I do not understand well is the fact how and why building (transient) pat::PackedCandidates internally in DeepBoostedJetTagInfoProducer interferes with production of (persistent) pat::PackedCandidates by miniAOD modules. I checked that it also happens with a workflow w/o DeepBoostedJetTagInfoProducer with enabled "use_hlt_features" when covariance versions are different in different miniAOD modules (1 for packedPFCandidates and 0 for lostTracks). This can suggest that covariance packing schema is a kind of global, which in my opinion is a bug.
  • [1] RecoBTag/FeatureTools/plugins/DeepBoostedJetTagInfoProducer.cc

Here is a set of commands to reproduce the issue.

Clean cmssw release.

cmsrel CMSSW_14_0_0_pre2
cd CMSSW_14_0_0_pre2/src
cmsenv
git cms-addpkg HLTrigger/Configurations

Download user cff and place in configurations

hltGetConfiguration /users/ballmond/PNet/BaseV4GRun_w_PNet/V13 \
--globaltag auto:phase1_2024_realistic --mc --unprescale --max-events 10 \
--input /store/mc/Run3Winter24Digi/VBFHToTauTau_M125_TuneCP5_13p6TeV_powheg-pythia8/GEN-SIM-RAW/133X_mcRun3_2024_realistic_v8-v2/2560000/000af33b-2a68-4c78-b544-e1c0e3f4f576.root\
--cff > HLT_PNet_cff.py

cp HLT_PNet_cff.py HLTrigger/Configuration/python

Run the cmsDriver command to take a RAW file to MiniAOD level, using the user cff (It's possible you will need to download the file locally first to ensure that it can be opened promptly. It is from a recent VBFHTauTau dataset to ensure the PNet paths fire)

cmsDriver.py step2 -s RAW2DIGI,L1Reco,HLT:PNet,RECO,PAT \
--eventcontent MINIAOD \
--conditions auto:phase1_2024_realistic \
--era Run3 --geometry DB:Extended \
--process reMINI --filein /store/mc/Run3Winter24Digi/VBFHToTauTau_M125_TuneCP5_13p6TeV_powheg-pythia8/GEN-SIM-RAW/133X_mcRun3_2024_realistic_v8-v2/2560000/000af33b-2a68-4c78-b544-e1c0e3f4f576.root \
-n 5 --mc

The error only appears if a Pnet trigger fires, so be wary if you decide to use a different file for some reason. Here is the error:

Begin processing the 2nd record. Run 1, Event 844217, LumiSection 1284 on stream 0 at 03-Feb-2024 18:59:15.805 CET
----- Begin Fatal Exception 03-Feb-2024 18:59:24 CET-----------------------
An exception of category 'UnimplementedFeature' occurred while
   [0] Processing  Event run: 1 lumi: 1284 event: 844217 stream: 0
   [1] Running path 'HLT_DoublePNetTauhPFJet30_Medium_L2NN_eta2p3_v1'
   [2] Calling method for module DeepBoostedJetTagInfoProducer/'hltParticleNetJetTagInfos'
Exception Message:
Attempting to load multiple covariance version in same process. This is not supported.
----- End Fatal Exception -------------------------------------------------

As I understood, this is some bug with an apparently global variable that shouldn't be global.
For now, a way to circumvent the error is to add the following lines to the cmsDriver config file (step2_RAW2DIGI_L1Reco_HLT_RECO_PAT.py)

cov_ver = 0
process.packedPFCandidates.covarianceVersion = cov_ver
process.lostTracks.covarianceVersion = cov_ver
process.hiPixelTracks.covarianceVersion = cov_ver

But ideally we would like to resolve this issue now so it doesn't appear later. Thanks for looking into this.

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 3, 2024

cms-bot internal usage

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 3, 2024

A new Issue was created by @brallmond Braden Allmond.

@makortel, @smuzaffar, @rappoccio, @sextonkennedy, @Dr15Jones, @antoniovilela can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@slava77
Copy link
Contributor

slava77 commented Feb 5, 2024

  • This can suggest that covariance packing schema is a kind of global

this is by design.

@mbluj
Copy link
Contributor

mbluj commented Feb 5, 2024

  • This can suggest that covariance packing schema is a kind of global

this is by design.

Thanks @slava77 for explanation.
It means that version of track covaraince packing has to be made configurable also in RecoBTag/FeatureTools/plugins/DeepBoostedJetTagInfoProducer.cc to avoid failures of workflows containing both HLT and PAT/miniAOD. But, this introduces dependence of setups of HLT and PAT/miniAOD modules...

@makortel
Copy link
Contributor

makortel commented Feb 5, 2024

assign reconstruction, xpog, hlt

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 5, 2024

New categories assigned: reconstruction,xpog,hlt

@Martin-Grunewald,@mmusich,@vlimant,@hqucms,@jfernan2,@mandrenguyen you have been requested to review this Pull request/Issue and eventually sign? Thanks

@Martin-Grunewald
Copy link
Contributor

Martin-Grunewald commented Feb 5, 2024

Sorry, while running HLT+RECO/PAT/AOD in one cmsRun job is technically possible, it has never been validated (cross-stalk / overwriting of ES modules loaded by HLT v RECO). Thus I guess a safer solution is to split your cmsRun job into separate steps. If we want to make it work, then more action needs to be taken, including solving this specific covariance issue!

-s RAW2DIGI,L1Reco,HLT:PNet,RECO,PAT 

@brallmond
Copy link
Contributor Author

So as I understand, it's not a bug in the sense that no jobs are ever executed in that way in practice?

I am not experienced in how steps are normally split, so I naively combined the steps assuming that was what this twiki was alluding to. If the standard case is to re-emulate the HLT separately (e.g. cmsRun my_hlt_conf_with_full_output.py), and then to give the full_output.root file to cmsDriver, starting from -s RECO, PAT, I would ask if the twiki could be updated to make such steps explicit for the benefit of those with less experience :)

@mmusich
Copy link
Contributor

mmusich commented Feb 5, 2024

So as I understand, it's not a bug in the sense that no jobs are ever executed in that way in practice?

In practice yes, in all production workflows (data and MC).

if the twiki could be updated to make such steps explicit for the benefit of those with less experience :)

I think the twiki never alludes to combine all steps in one, but rather to replace the user defined menu onto an existing (separate) HLT step. We could add an explicit reminder but that's outside of the scope of that section.

@Martin-Grunewald
Copy link
Contributor

Added this text:

Note: while running HLT plus RECO/PAT/AOD in the same cmsDriver command technically works, it has not been validated for Physics usage. The issue here is the loading of possibly confliciting ES module configurations by HLT and by RECO, where one overwrites the other. 

@mandrenguyen
Copy link
Contributor

+reconstruction
Unless we decide to support running HLT + RECO in the same step, this is not an issue that will be addressed.

@mbluj
Copy link
Contributor

mbluj commented Feb 8, 2024

I would really make configurable a version of the track covariance packing in DeepBoostedJetTagInfoProducer module. It will make visible the fact that the covariance packing is used in the HLT mode and make it easier to handle in case HLT + RECO run in the same step or a HLT-like configuration of DeepBoostedJetTagInfoProducer is used at RECO/Mini. Default value will be as current hard-coded one, so not impact on HLT configuration.
I can prepare PR if you agree.

@Martin-Grunewald
Copy link
Contributor

+reconstruction Unless we decide to support running HLT + RECO in the same step, this is not an issue that will be addressed.

Sorry, no, it does not work this way! We have tests running HLT+RECO towards making it work properly, so this error is going backward on that effort in that it even technically would no longer work!

@mbluj
Copy link
Contributor

mbluj commented Feb 8, 2024

+reconstruction Unless we decide to support running HLT + RECO in the same step, this is not an issue that will be addressed.

Sorry, no, it does not work this way! We have tests running HLT+RECO towards making it work properly, so this error is going backward on that effort in that it even technically would no longer work!

@Martin-Grunewald Technically speaking issue is cased by HLT and MiniAOD in the same workflow, so HLT + RECO without miniAOD should work.
But, I agree that it should be fixed and the first step towards it is introduction of configurable argument to DeepBoostedJetTagInfoProducer. I will do this.
Question, to be addresses during review of the proposed fix, is which version of the argument should be default - 0 which is currently hard-coded or 1 used at offline. The former keeps backward compatibility, while the latter is what was actually intended by developers, i.e. to be close to what is used offline. I have not idea if there is a real difference between the two in HLT response.

@slava77
Copy link
Contributor

slava77 commented Feb 8, 2024

PackedCandidate interface does not support a switch between variants during runtime; there is a call_once.

@mbluj
Copy link
Contributor

mbluj commented Feb 8, 2024

PackedCandidate interface does not support a switch between variants during runtime; there is a call_once.

Yes, true. But, variant can be at least set consistently for all modules using PackedCandidates which is not possible when one of the modules has it hard-coded. It is what I want to change. Maybe deeper changes can be implemented later, if really needed, but this will have some wider implications.

@mmusich
Copy link
Contributor

mmusich commented Feb 9, 2024

for the record, there's further discussion ongoing at #43917

@mmusich
Copy link
Contributor

mmusich commented Feb 15, 2024

The HLT configurations part is being followed up at https://its.cern.ch/jira/browse/CMSHLT-3041.

For the record now that #43917 is merged in IBs, I did the following test:

cd CMSSW_14_1_X_2024-02-14-2300/src/
cmsenv
git cms-addpkg HLTrigger/Configuration
hltGetConfiguration /users/ballmond/PNet/BaseV4GRun_w_PNet/V13 --globaltag auto:phase1_2024_realistic --mc --unprescale --max-events 10 --input /store/mc/Run3Winter24Digi/VBFHToTauTau_M125_TuneCP5_13p6TeV_powheg-pythia8/GEN-SIM-RAW/133X_mcRun3_2024_realistic_v8-v2/2560000/000af33b-2a68-4c78-b544-e1c0e3f4f576.root --cff > HLT_PNet_cff.py
# edit HLT_PNet_cff.py such that the diff looks like [1]
cp HLT_PNet_cff.py HLTrigger/Configuration/python
scram b -j 20
cmsDriver.py step2 -s RAW2DIGI,L1Reco,HLT:PNet,RECO,PAT --eventcontent MINIAOD --conditions auto:phase1_2024_realistic --era Run3 --geometry DB:Extended --process reMINI --filein /store/mc/Run3Winter24Digi/VBFHToTauTau_M125_TuneCP5_13p6TeV_powheg-pythia8/GEN-SIM-RAW/133X_mcRun3_2024_realistic_v8-v2/2560000/000af33b-2a68-4c78-b544-e1c0e3f4f576.root -n 5 --mc

and the process runs fine (at least without technically crashing, I can't vouch for the physics content of the output files though, see above discussion).

[1]

--- HLT_PNet_cff.py	2024-02-15 19:14:24.141820661 +0100
+++ HLT_PNet_cff_mod.py	2024-02-15 18:58:22.758950456 +0100
@@ -58980,7 +58980,8 @@
     quality_value_map = cms.InputTag( "" ),
     trkPt_value_map = cms.InputTag( "" ),
     trkEta_value_map = cms.InputTag( "" ),
-    trkPhi_value_map = cms.InputTag( "" )
+    trkPhi_value_map = cms.InputTag( "" ),
+    covarianceVersion = cms.int32(1),
 )
 fragment.hltParticleNetONNXJetTags = cms.EDProducer( "BoostedJetONNXJetTagsProducer",
     src = cms.InputTag( "hltParticleNetJetTagInfos" ),
@@ -65079,7 +65080,8 @@
     quality_value_map = cms.InputTag( "" ),
     trkPt_value_map = cms.InputTag( "" ),
     trkEta_value_map = cms.InputTag( "" ),
-    trkPhi_value_map = cms.InputTag( "" )
+    trkPhi_value_map = cms.InputTag( "" ),
+    covarianceVersion = cms.int32(1),
 )
 fragment.hltParticleNetONNXJetTagsAK8 = cms.EDProducer( "BoostedJetONNXJetTagsProducer",
     src = cms.InputTag( "hltParticleNetJetTagsInfosAK8" ),

Once https://its.cern.ch/jira/browse/CMSHLT-3041 is integrated we'll sign for HLT.

@mmusich
Copy link
Contributor

mmusich commented Mar 5, 2024

+hlt

@mmusich
Copy link
Contributor

mmusich commented Mar 20, 2024

@cms-sw/xpog-l2 is this issue still relevant on your side?
If not can you please sign?

@hqucms
Copy link
Contributor

hqucms commented Mar 20, 2024

+xpog

@cmsbuild
Copy link
Contributor

This issue is fully signed and ready to be closed.

@makortel
Copy link
Contributor

@cmsbuild, please close

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants