Add wfs with HLT as separate step #37603
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37603/29370
A new Pull Request was created by @kskovpen for master. It involves the following packages:
@jordan-martins, @bbilin, @wajidalikhan, @cmsbuild, @AdrianoDee, @srimanob, @kskovpen can you please review it and eventually sign? Thanks. cms-bot commands are listed here
test parameters:
please test
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37603/29371
Pull request #37603 was updated. @jordan-martins, @bbilin, @wajidalikhan, @cmsbuild, @AdrianoDee, @srimanob, @kskovpen can you please check and sign again.
test parameters:
please test
@kskovpen could you please quote the error you get in EcalDQMonitorClient:ecalMonitorClient and a recipe to reproduce it?
'-n':'10',
'--eventcontent':'FEVTDEBUGHLT',
'--geometry' : geom,
'--outputCommands' : '"drop *_*_*_GEN,drop *_*_*_DIGI2RAW"'
Thanks @jfernan2 ! You can reproduce the DQM crash by replacing the drop statements here with:
"drop *_*_*_GEN,drop *_*_*_SIM,drop *_*_*_DIGI2RAW"
and running 12424.0. The error message at the last step is:
A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.
Sun Apr 17 20:58:01 CEST 2022
Thread 2 (Thread 0x7f071393e700 (LWP 1782) "cmsRun"):
#0 0x00007f073c5e41d9 in waitpid () from /lib64/libpthread.so.0
#1 0x00007f07362665e7 in edm::service::cmssw_stacktrace_fork() () from /cvmfs/cms.cern.ch/slc7_amd64_gcc11/cms/cmssw/CMSSW_12_3_0/lib/slc7_amd64_gcc11/pluginFWCoreServicesPlugins.so
#2 0x00007f073626712a in edm::service::InitRootHandlers::stacktraceHelperThread() () from /cvmfs/cms.cern.ch/slc7_amd64_gcc11/cms/cmssw/CMSSW_12_3_0/lib/slc7_amd64_gcc11/pluginFWCoreServicesPlugins.so
#3 0x00007f073cbe2bf4 in std::execute_native_thread_routine (__p=0x7f07323e4600) at ../../../../../libstdc++-v3/src/c++11/thread.cc:82
#4 0x00007f073c5dcea5 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f073c305b0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f073a458540 (LWP 1577) "cmsRun"):
#0 0x00007f073c2faddd in poll () from /lib64/libc.so.6
#1 0x00007f073626689f in full_read.constprop () from /cvmfs/cms.cern.ch/slc7_amd64_gcc11/cms/cmssw/CMSSW_12_3_0/lib/slc7_amd64_gcc11/pluginFWCoreServicesPlugins.so
#2 0x00007f07362671fc in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms.cern.ch/slc7_amd64_gcc11/cms/cmssw/CMSSW_12_3_0/lib/slc7_amd64_gcc11/pluginFWCoreServicesPlugins.so
#3 0x00007f0736269a3b in sig_dostack_then_abort () from /cvmfs/cms.cern.ch/slc7_amd64_gcc11/cms/cmssw/CMSSW_12_3_0/lib/slc7_amd64_gcc11/pluginFWCoreServicesPlugins.so
#4 <signal handler called>
#5 0x00007f06ee55dcbc in EcalCondObjectContainer::find(unsigned int) const () from /cvmfs/cms.cern.ch/slc7_amd64_gcc11/cms/cmssw/CMSSW_12_3_0/lib/slc7_amd64_gcc11/libDQMEcalMonitorClient.so
#6 0x00007f06ee55ba65 in ecaldqm::IntegrityClient::producePlots(ecaldqm::DQWorkerClient::ProcessType) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc11/cms/cmssw/CMSSW_12_3_0/lib/slc7_amd64_gcc11/libDQMEcalMonitorClient.so
#7 0x00007f06ee5bb5b4 in EcalDQMonitorClient::runWorkers(dqm::implementation::IGetter&, ecaldqm::DQWorkerClient::ProcessType) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc11/cms/cmssw/CMSSW_12_3_0/lib/slc7_amd64_gcc11/pluginDQMEcalMonitorClientPlugins.so
#8 0x00007f06ee5bbf8d in EcalDQMonitorClient::dqmEndJob(dqm::implementation::IBooker&, dqm::implementation::IGetter&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc11/cms/cmssw/CMSSW_12_3_0/lib/slc7_amd64_gcc11/pluginDQMEcalMonitorClientPlugins.so
#9 0x00007f06ee5bea34 in non-virtual thunk to DQMEDHarvester::endProcessBlockProduce(edm::ProcessBlock&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc11/cms/cmssw/CMSSW_12_3_0/lib/slc7_amd64_gcc11/pluginDQMEcalMonitorClientPlugins.so
#10 0x00007f073ed9f1e0 in edm::one::EDProducerBase::doEndProcessBlock(edm::ProcessBlockPrincipal const&, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc11/cms/cmssw/CMSSW_12_3_0/lib/slc7_amd64_gcc11/libFWCoreFramework.so
#11 0x00007f073ed88a80 in edm::WorkerT<edm::one::EDProducerBase>::implDoEndProcessBlock(edm::ProcessBlockPrincipal const&, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc11/cms/cmssw/CMSSW_12_3_0/lib/slc7_amd64_gcc11/libFWCoreFramework.so
#12 0x00007f073ec95567 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3> >(edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3> >(edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3>::Context const*)::{lambda()#1}) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc11/cms/cmssw/CMSSW_12_3_0/lib/slc7_amd64_gcc11/libFWCoreFramework.so
#13 0x00007f073ec95960 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3>::Context const*) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc11/cms/cmssw/CMSSW_12_3_0/lib/slc7_amd64_gcc11/libFWCoreFramework.so
#14 0x00007f073ec95f0a in void edm::SerialTaskQueueChain::actionToRun<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3> >::execute()::{lambda()#1}&>(edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3> >::execute()::{lambda()#1}&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc11/cms/cmssw/CMSSW_12_3_0/lib/slc7_amd64_gcc11/libFWCoreFramework.so
#15 0x00007f073ec95fe1 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3> >::execute()::{lambda()#1}&>(tbb::detail::d1::task_group&, edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3> >::execute()::{lambda()#1}&)::{lambda()#1}>::execute() () from /cvmfs/cms.cern.ch/slc7_amd64_gcc11/cms/cmssw/CMSSW_12_3_0/lib/slc7_amd64_gcc11/libFWCoreFramework.so
#16 0x00007f073eee6055 in tbb::detail::d1::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc11/cms/cmssw/CMSSW_12_3_0/lib/slc7_amd64_gcc11/libFWCoreConcurrency.so
#17 0x00007f073d444a59 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=0x7f06b7054c00, this=0x7f0738eafe00) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_0-slc7_amd64_gcc11/build/CMSSW_12_3_0-build/BUILD/slc7_amd64_gcc11/external/tbb/v2021.4.0-0929d4245541a9360696e439234c1bfc/tbb-v2021.4.0/src/tbb/task_dispatcher.h:322
#18 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=, this=0x7f0738eafe00) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_0-slc7_amd64_gcc11/build/CMSSW_12_3_0-build/BUILD/slc7_amd64_gcc11/external/tbb/v2021.4.0-0929d4245541a9360696e439234c1bfc/tbb-v2021.4.0/src/tbb/task_dispatcher.h:463
#19 tbb::detail::r1::task_dispatcher::execute_and_wait (t=, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_0-slc7_amd64_gcc11/build/CMSSW_12_3_0-build/BUILD/slc7_amd64_gcc11/external/tbb/v2021.4.0-0929d4245541a9360696e439234c1bfc/tbb-v2021.4.0/src/tbb/task_dispatcher.cpp:168
#20 0x00007f073ec624c3 in edm::EventProcessor::endProcessBlock(bool, bool) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc11/cms/cmssw/CMSSW_12_3_0/lib/slc7_amd64_gcc11/libFWCoreFramework.so
#21 0x00007f073ec667f9 in edm::EventProcessor::runToCompletion() () from /cvmfs/cms.cern.ch/slc7_amd64_gcc11/cms/cmssw/CMSSW_12_3_0/lib/slc7_amd64_gcc11/libFWCoreFramework.so
#22 0x000000000040a18d in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const ()
#23 0x00007f073d432898 in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_0-slc7_amd64_gcc11/build/CMSSW_12_3_0-build/BUILD/slc7_amd64_gcc11/external/tbb/v2021.4.0-0929d4245541a9360696e439234c1bfc/tbb-v2021.4.0/src/tbb/arena.cpp:698
#24 0x000000000040afd9 in main::{lambda()#1}::operator()() const ()
#25 0x00000000004096fc in main ()
Current Modules:
Module: EcalDQMonitorClient:ecalMonitorClient (crashed)
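The reproduction recipe above differs from the PR only in which process names have their products dropped. A minimal Python sketch of how the two `--outputCommands` values relate; `drop_commands` is a hypothetical helper introduced here for illustration, not part of the CMSSW configuration code:

```python
def drop_commands(process_names):
    """Build a quoted, comma-separated list of drop statements,
    one per process name (hypothetical helper, for illustration only)."""
    statements = [f"drop *_*_*_{name}" for name in process_names]
    return '"' + ",".join(statements) + '"'


# Value used in the PR: GEN and DIGI2RAW products are dropped.
pr_value = drop_commands(["GEN", "DIGI2RAW"])

# Value that reproduces the crash reported above: SIM is dropped as well.
crash_value = drop_commands(["GEN", "SIM", "DIGI2RAW"])

print(pr_value)
print(crash_value)
```

The only difference between the working and the crashing configuration is the extra `drop *_*_*_SIM` statement.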
@@ -20,6 +20,10 @@
'2021PU',
'2021Design',
'2021DesignPU',
'2021HLT',
We should not change the order here. A new workflow should go at the end, i.e. after 2024.
OK, will append it to the end.
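One plausible reason for appending rather than inserting is that workflow numbers may be derived from a key's position in the list. The sketch below assumes such a position-based scheme; the function name, the start and step values, and the shortened key list are all hypothetical and do not reproduce the actual CMSSW numbering code:

```python
def number_workflows(keys, start=10000, step=200):
    """Assign a number to each key from its list position
    (assumed scheme, illustrative values)."""
    return {key: start + step * index for index, key in enumerate(keys)}


# Shortened, illustrative key list.
original = ["2021", "2021PU", "2021Design", "2021DesignPU", "2023", "2024"]

# Variant 1: new key inserted in the middle, as in the first version of the diff.
inserted = original[:4] + ["2021HLT"] + original[4:]
# Variant 2: new key appended at the end, as requested in the review.
appended = original + ["2021HLT"]

before = number_workflows(original)
after_insert = number_workflows(inserted)
after_append = number_workflows(appended)

# Keys whose numbers change when the new key is inserted in the middle.
shifted = [key for key in original if before[key] != after_insert[key]]
print(shifted)
```

Under this assumption, inserting the key renumbers every later entry, while appending leaves all existing numbers untouched.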
@kskovpen
Thanks @srimanob. I also thought that defining a full batch of new wfs would probably be overkill. Anyhow, I can put it in the offset wfs.
-1 Failed Tests: RelVals The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
You can see more details here: RelVals
OK, epic fail. I will create a few offset wfs.
Update: instead of creating a bunch of alternative wfs, I added one test wf (11634.601) in which the HLT step is separated out from DIGI.
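Separating HLT from DIGI amounts to splitting one step sequence into two. A rough Python sketch of that split follows; `split_hlt` is a hypothetical helper and the sequence string is an illustrative assumption, not taken from the PR:

```python
def split_hlt(sequence):
    """Split a comma-separated step sequence into (non-HLT part, HLT part).

    Each stage may carry an option after a colon, e.g. 'HLT:@relval2021'.
    Hypothetical helper, for illustration only.
    """
    stages = sequence.split(",")
    hlt = [s for s in stages if s.split(":")[0] == "HLT"]
    rest = [s for s in stages if s.split(":")[0] != "HLT"]
    return ",".join(rest), ",".join(hlt)


# Illustrative combined sequence (assumption, not from the PR).
combined = "DIGI:pdigi_valid,L1,DIGI2RAW,HLT:@relval2021"
digi_step, hlt_step = split_hlt(combined)

print(digi_step)
print(hlt_step)
```

The second sequence would then run as its own step, reading the output of the first.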
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37603/29416 ERROR: Unable to merge PR. See log https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37603/29416/cms-checkout-topic.log
@@ -3486,7 +3486,7 @@ def gen2021HiMix(fragment,howMuch):
defaultDataSets['2026D49']='CMSSW_12_0_0_pre4-113X_mcRun4_realistic_v7_2026D49noPU-v'
defaultDataSets['2026D76']='CMSSW_12_0_0_pre4-113X_mcRun4_realistic_v7_2026D76noPU-v'
defaultDataSets['2026D77']='CMSSW_12_1_0_pre2-113X_mcRun4_realistic_v7_2026D77noPU-v'
defaultDataSets['2026D88']='CMSSW_12_3_0_pre5-123X_mcRun4_realistic_v4_2026D88noPU-v'
#defaultDataSets['2026D88']='CMSSW_12_2_0_pre3-122X_mcRun4_realistic_v4_2026D88noPU-v'
Why do you need to disable this?
@@ -171,6 +171,7 @@ def condition(self, fragment, stepList, key, hasHarvest):
'GenSimHLBeamSpotHGCALCloseBy',
'Digi',
'DigiTrigger',
'HLT',
HLT seems to be a very common name. Can it be more specific, e.g. HLTRun3?
I see that Run3 FS is removed. Could you try with a clean IB release, i.e. the most recent one, CMSSW_12_4_X_2022-04-20-1100?
I am going to submit another PR. This one clashed with some specific 12_3_0 code. Closing this one.
PR description:
Introduce an additional set of Run 3 wfs where the HLT step is separated out from DIGI, following the studies mentioned in #37564. As suggested in that discussion, the GEN and DIGI2RAW collections are dropped from the output file at the HLT step. Dropping the SIM collections as well would make sense, but the HARVESTING step then crashes in EcalDQMonitorClient:ecalMonitorClient. Maybe @cms-sw/dqm-l2 has an idea why this happens.
PR validation:
Ran the new wfs.
if this PR is a backport please specify the original PR and why you need to backport that PR:
Not a backport.