Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime error from BeamOnlineMonitor on Run-2 data #39948

Closed
missirol opened this issue Nov 1, 2022 · 12 comments · Fixed by #40716
Closed

runtime error from BeamOnlineMonitor on Run-2 data #39948

missirol opened this issue Nov 1, 2022 · 12 comments · Fixed by #40716

Comments

@missirol
Copy link
Contributor

missirol commented Nov 1, 2022

An instance of the BeamOnlineMonitor plugin runs online at HLT (it is normally not used when running the HLT offline).

Trying to run on 2018 HIon data [1], one encounters a runtime error from this plugin [2].

The error originates from BeamSpotOnlineObjects::startTime(), because stringParams_ is somehow empty:
https://github.com/cms-sw/cmssw/blob/CMSSW_12_5_1/DQM/BeamMonitor/plugins/OnlineBeamMonitor.cc#L157
https://github.com/cms-sw/cmssw/blob/CMSSW_12_5_1/CondFormats/BeamSpotObjects/src/BeamSpotOnlineObjects.cc#L70
https://github.com/cms-sw/cmssw/blob/CMSSW_12_5_1/CondFormats/BeamSpotObjects/src/BeamSpotOnlineObjects.cc#L23

The reproducer in [1] uses Run-2 data; afaiu, this means that the plugin tries to access the first payload/IOV of the following tag
https://cms-conddb.cern.ch/cmsDbBrowser/list/Prod/tags/BeamSpotOnlineHLT

Using Run-3 data, there is no runtime error.

This use case might be irrelevant for production (e.g. online operations); still, it would be useful to know why it happens, and whether or not a fix/improvement is possible.

FYI: @francescobrivio @silviodonato @cms-sw/hlt-l2


[1] Reproducer in CMSSW_12_5_1 (most likely, this issue is not specific to a particular release, and occurs in any recent 12_{4,5,6}_* release):

#!/bin/bash

# Run-2 HIon data
EDMINPUT=root://eoscms.cern.ch//eos/cms/store/group/dpg_trigger/comm_trigger/TriggerStudiesGroup/STORM/RAW/HIRun2018A_HIHardProbes_run326479/0E2CC5D5-9D87-7348-9219-B00CD718C847.root

# # Run-3 pp data
# EDMINPUT=root://eoscms.cern.ch//eos/cms/store/group/dpg_trigger/comm_trigger/TriggerStudiesGroup/STORM/RAW/Run2022B_HLTPhysics0_run355558/cd851cf4-0fca-4d76-b80e-1d33e1371929.root

hltGetConfiguration \
   /users/missirol/test/dev/CMSSW_12_4_0/CMSHLT_2515/Test02/HLT/V3 \
   --globaltag 124X_dataRun3_HLT_v4 \
   --data \
   --type HIon \
   --unprescale \
   --output all \
   --max-events 10 \
   --eras Run3 --l1-emulator Full --l1 L1Menu_CollisionsHeavyIons2022_v1_1_0-d1_xml \
   --input "${EDMINPUT}" \
   > hlt.py

cat <<@EOF >> hlt.py

from FWCore.ParameterSet.MassReplace import massSearchReplaceAnyInputTag
massSearchReplaceAnyInputTag(process.SimL1Emulator, 'rawDataCollector', 'rawDataRepacker', False, True)

process.options.numberOfThreads = 1
process.options.numberOfStreams = 0

# uncomment the following line to avoid a runtime error
#del process.hltOnlineBeamMonitor
@EOF

cmsRun hlt.py &> hlt.log

[2]

----- Begin Fatal Exception 01-Nov-2022 12:25:45 CET-----------------------
An exception of category 'StdException' occurred while
   [0] Processing global begin LuminosityBlock run: 326479 luminosityBlock: 7
   [1] Calling method for module OnlineBeamMonitor/'hltOnlineBeamMonitor'
Exception Message:
A std::exception was thrown.
Parameter with index 0 is out of range.
----- End Fatal Exception -------------------------------------------------
@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 1, 2022

A new Issue was created by @missirol Marino Missiroli.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@mmusich
Copy link
Contributor

mmusich commented Nov 1, 2022

@missirol
this happens because the first IoV of BeamSpotOnlineHLT was likely written before the CondFormat was changed in PR #35338 and therefore you are trying to resolve a data member that doesn't exist (quick reminder schema evolution is still not supported for condition data formats)
I see two possibilities:

  • change the code BeamOnlineMonitor in order to deal with the fact that payloads written in the "old" style might exist (e.g. as done here (even though seems an overkill for something not meant for production)

if (foo <= parameters::maxPVs) {
const auto output = try_<int, std::out_of_range>(std::bind(myIntFunctor, param), print_error);
edm::LogInfo("BeamSpotOnline_PayloadInspector") << theLabel.c_str() << " : " << output << std::endl;
h2_ExtraBSParameters->SetBinContent(1, yBin, output);
} else if (foo <= parameters::rmsErrorPV) {
const auto output = try_<float, std::out_of_range>(std::bind(myFloatFunctor, param), print_error);
edm::LogInfo("BeamSpotOnline_PayloadInspector") << theLabel.c_str() << " : " << output << std::endl;
h2_ExtraBSParameters->SetBinContent(1, yBin, output);
} else if (foo <= parameters::endTimeStamp) {
const auto output = try_<cond::Time_t, std::out_of_range>(std::bind(myTimeFunctor, param), print_error);
edm::LogInfo("BeamSpotOnline_PayloadInspector") << theLabel.c_str() << " : " << output << std::endl;
h2_ExtraBSParameters->SetBinContent(1, yBin, output);
//} else if( foo <=parameters::lumiRange){
// const auto output = try_<std::string,std::out_of_range>(std::bind(myStringFunctor, param), print_error);
//edm::LogInfo("BeamSpotOnline_PayloadInspector") << theLabel.c_str() << " : " << output << std::endl;
//h2_ExtraBSParameters->SetBinContent(1, yBin, output);
}

  • supply your workflow with an ad-hoc tag that covers the Run 2 data with a payload with the newer format @cms-sw/alca-l2

having said that I really hope you are not expecting it to have any physical meaning as these old payload were just for tests (see https://cern.ch/go/7Tng)

@makortel
Copy link
Contributor

makortel commented Nov 1, 2022

assign hlt,alca,db

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 1, 2022

New categories assigned: hlt,db,alca

@yuanchao,@missirol,@ggovi,@francescobrivio,@francescobrivio,@malbouis,@malbouis,@saumyaphor4252,@saumyaphor4252,@tvami,@tvami,@Martin-Grunewald,@ChrisMisan you have been requested to review this Pull request/Issue and eventually sign? Thanks

@missirol
Copy link
Contributor Author

missirol commented Nov 1, 2022

Thanks for all the info, @mmusich.

Indeed this was just a technical test, maybe not even a use case for anybody else, so I'd not ask changes or new tags.

I did miss that those tags had placeholder values for their first IOV. Iiuc, in this case the online-beamspot producer falls back on the Run-2 tag, so at least the beamspot used in the HLT reco should be okay physics-wise even in this case (but this remains just a technical check).

For this plugin, maybe one could use a simplified version of the solution in BeamSpotOnline_PayloadInspector (basically, moving the filling of beamSpotsMap_ inside a try/catch block); I could look into that if that's useful.

I can close this ticket unless there are other comments.

@mmusich
Copy link
Contributor

mmusich commented Nov 1, 2022

For this plugin, maybe one could use a simplified version of the solution in BeamSpotOnline_PayloadInspector (basically, moving the filling of beamSpotsMap_ inside a try/catch block); I could look into that if that's useful.

I'd prefer if that's done by the BeamSpot team, let's keep the ticket open and close it once we have a solution.

@francescobrivio
Copy link
Contributor

I'd prefer if that's done by the BeamSpot team, let's keep the ticket open and close it once we have a solution.

Noted! yes please keep this ticket open.

@gennai
Copy link
Contributor

gennai commented Feb 3, 2023

@missirol this happens because the first IoV of BeamSpotOnlineHLT was likely written before the CondFormat was changed in PR #35338 and therefore you are trying to resolve a data member that doesn't exist (quick reminder schema evolution is still not supported for condition data formats) I see two possibilities:

  • change the code BeamOnlineMonitor in order to deal with the fact that payloads written in the "old" style might exist (e.g. as done here (even though seems an overkill for something not meant for production)

if (foo <= parameters::maxPVs) {
const auto output = try_<int, std::out_of_range>(std::bind(myIntFunctor, param), print_error);
edm::LogInfo("BeamSpotOnline_PayloadInspector") << theLabel.c_str() << " : " << output << std::endl;
h2_ExtraBSParameters->SetBinContent(1, yBin, output);
} else if (foo <= parameters::rmsErrorPV) {
const auto output = try_<float, std::out_of_range>(std::bind(myFloatFunctor, param), print_error);
edm::LogInfo("BeamSpotOnline_PayloadInspector") << theLabel.c_str() << " : " << output << std::endl;
h2_ExtraBSParameters->SetBinContent(1, yBin, output);
} else if (foo <= parameters::endTimeStamp) {
const auto output = try_<cond::Time_t, std::out_of_range>(std::bind(myTimeFunctor, param), print_error);
edm::LogInfo("BeamSpotOnline_PayloadInspector") << theLabel.c_str() << " : " << output << std::endl;
h2_ExtraBSParameters->SetBinContent(1, yBin, output);
//} else if( foo <=parameters::lumiRange){
// const auto output = try_<std::string,std::out_of_range>(std::bind(myStringFunctor, param), print_error);
//edm::LogInfo("BeamSpotOnline_PayloadInspector") << theLabel.c_str() << " : " << output << std::endl;
//h2_ExtraBSParameters->SetBinContent(1, yBin, output);
}

  • supply your workflow with an ad-hoc tag that covers the Run 2 data with a payload with the newer format @cms-sw/alca-l2

having said that I really hope you are not expecting it to have any physical meaning as these old payload were just for tests (see https://cern.ch/go/7Tng)

Actually we discussed it times ago with Giacomo Govi, the idea was to clean up the old test tags from the DB to avoid this to happen. Anyway I think for the client that makes the plot the try/catch option is doable, as this code also runs on DQM, I have to think if this may mask some possible issues

@mmusich
Copy link
Contributor

mmusich commented Feb 3, 2023

Actually we discussed it times ago with Giacomo Govi, the idea was to clean up the old test tags from the DB to avoid this to happen.

mmh, that's a very bad idea, because the tag in question BeamSpotOnlineHLT is not a test tag, but actually a tag used in plenty of production HLT Global Tags.

@francescobrivio
Copy link
Contributor

I agree with Marco, the cleanup isn't a viable way.
I'm trying to implement a try/catch approach similar to the PI pluging linked by Marco above.

@gennai
Copy link
Contributor

gennai commented Feb 3, 2023

you do not remove the tag but the payload uploaded for tests, so that only the latest version of the conformat is populated. Anyway I do not want to raise more entropy, let's concentrate on the try/catch option

@mmusich
Copy link
Contributor

mmusich commented Feb 3, 2023

you do not remove the tag but the payload uploaded for tests

since you can't overwrite in the past of the HLT FCSR with the conditions uploader by policy, that would mean messing with the master copy of production database manually, which is a risky operation and I think not really suitable in this situation.

let's concentrate on the try/catch option

yes, please do that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants