Add PilotBeam data in relval matrix tests #36133

Merged: 1 commit merged into master on Nov 19, 2021

francescobrivio
Contributor

@francescobrivio francescobrivio commented Nov 15, 2021

PR description:

This PR adds a few workflows to the relval matrix to run on PilotBeams2021 data. Specifically:

  • Added 136.899 for standard cosmics processing with CRAFT2021 data
  • 138.1/138.2/138.3 remain unchanged (apart from a minor update to their names) and are:
    • 138.1 prompt wf on cosmics
    • 138.2 express wf on cosmics
    • 138.3 splashes for 2021 data
  • Newly introduced 138.4/138.5 are the prompt and express workflows (respectively) on 2021 collision data (MinBias; is that OK?)
  • Newly introduced 139.00X series (as suggested by @bbilin) for standard (whatever that means) pp processing:
    • 139.001 for MinBias
    • 139.002 for ZeroBias
    • 139.003 for HLTPhysics
    • 139.004 for NoBPTX

Important Notes:

PR validation:

Validation for this PR can be run with:
runTheMatrix.py -l 136.897,136.899,138.1,138.2,138.3,138.4,138.5,139.001,139.002,139.003,139.004 -j8 --ibeos
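The validation command packs all workflow IDs into a single comma-separated -l argument. The minimal Python sketch below assembles that command from a list, which makes it easy to rerun only a subset. The helper matrix_command is hypothetical (not part of CMSSW), and our reading of the flags (-l selects workflows, -j8 runs up to eight processes in parallel, --ibeos takes inputs from the IB EOS area) is an assumption, not runTheMatrix.py documentation.

```python
# Hypothetical helper (not part of CMSSW): rebuild the validation command
# from the workflow IDs listed in the PR description.
WORKFLOWS = ["136.897", "136.899", "138.1", "138.2", "138.3", "138.4",
             "138.5", "139.001", "139.002", "139.003", "139.004"]


def matrix_command(workflows, jobs=8, ibeos=True):
    """Return a runTheMatrix.py command line as a single string."""
    # IDs are kept as strings so they are printed exactly as written.
    parts = ["runTheMatrix.py", "-l", ",".join(workflows), f"-j{jobs}"]
    if ibeos:
        parts.append("--ibeos")
    return " ".join(parts)


if __name__ == "__main__":
    print(matrix_command(WORKFLOWS))
```

With the full list it reproduces the command above; passing, for example, only the 139.00X IDs gives a quicker partial check.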

Backport:

N/A

@francescobrivio
Contributor Author

francescobrivio commented Nov 15, 2021

Issue with 139.001: (see below)

%MSG-e SiStripBadModuleFedErrService:   SiStripBadComponentInfo:siStripBadComponentInfo@endProcessBlock  15-Nov-2021 17:01:56 CET post-events
Could not find SiStrip/ReadoutView
%MSG
%MSG-w SiStripBadModuleFedErrService:   SiStripBadComponentInfo:siStripBadComponentInfo@endProcessBlock  15-Nov-2021 17:01:56 CET post-events
Empty bad channel map from FED errors
%MSG


A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

Mon Nov 15 17:02:01 CET 2021
Thread 2 (Thread 0x7ff096583700 (LWP 12306) "cmsRun"):
#0  0x00007ff0b60521d9 in waitpid () from /lib64/libpthread.so.0
#1  0x00007ff0afece5a7 in edm::service::cmssw_stacktrace_fork() () from /cvmfs/cms-ib.cern.ch/nweek-02707/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-14-0000/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#2  0x00007ff0afecf23a in edm::service::InitRootHandlers::stacktraceHelperThread() () from /cvmfs/cms-ib.cern.ch/nweek-02707/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-14-0000/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#3  0x00007ff0b664baf0 in std::execute_native_thread_routine (__p=0x7ff0970b70e0) at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#4  0x00007ff0b604aea5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007ff0b5d73b0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7ff0b3ece540 (LWP 12103) "cmsRun"):
#0  0x00007ff0b5d68ddd in poll () from /lib64/libc.so.6
#1  0x00007ff0afece9d7 in full_read.constprop () from /cvmfs/cms-ib.cern.ch/nweek-02707/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-14-0000/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#2  0x00007ff0afecf30c in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms-ib.cern.ch/nweek-02707/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-14-0000/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#3  0x00007ff0afed27ab in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/nweek-02707/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-14-0000/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00007ff0324b7f03 in SiPixelPhase1ResidualsExtra::fillMEs(dqm::implementation::IBooker&, dqm::implementation::IGetter&) () from /cvmfs/cms-ib.cern.ch/nweek-02707/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-14-0000/lib/slc7_amd64_gcc900/pluginDQMSiPixelPhase1TrackAuto.so
#6  0x00007ff0324aa714 in non-virtual thunk to DQMEDHarvester::endProcessBlockProduce(edm::ProcessBlock&) () from /cvmfs/cms-ib.cern.ch/nweek-02707/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-14-0000/lib/slc7_amd64_gcc900/pluginDQMSiPixelPhase1TrackAuto.so
#7  0x00007ff0b88121d8 in edm::one::EDProducerBase::doEndProcessBlock(edm::ProcessBlockPrincipal const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/nweek-02707/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-14-0000/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#8  0x00007ff0b87f4ac0 in edm::WorkerT<edm::one::EDProducerBase>::implDoEndProcessBlock(edm::ProcessBlockPrincipal const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/nweek-02707/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-14-0000/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#9  0x00007ff0b86fd487 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3> >(edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3> >(edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/nweek-02707/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-14-0000/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#10 0x00007ff0b86fd68d in bool edm::Worker::runModule<edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3> >(edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3>::Context const*) () from /cvmfs/cms-ib.cern.ch/nweek-02707/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-14-0000/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#11 0x00007ff0b86fd936 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3>::Context const*) () from /cvmfs/cms-ib.cern.ch/nweek-02707/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-14-0000/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#12 0x00007ff0b86fdd50 in void edm::SerialTaskQueueChain::actionToRun<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3> >::execute()::{lambda()#1}&>(edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3> >::execute()::{lambda()#1}&) () from /cvmfs/cms-ib.cern.ch/nweek-02707/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-14-0000/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#13 0x00007ff0b86fdf01 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3> >::execute()::{lambda()#1}&>(tbb::detail::d1::task_group&, edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::ProcessBlockPrincipal, (edm::BranchActionType)3> >::execute()::{lambda()#1}&)::{lambda()#1}>::execute() () from /cvmfs/cms-ib.cern.ch/nweek-02707/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-14-0000/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#14 0x00007ff0b894c299 in tbb::detail::d1::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/nweek-02707/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-14-0000/lib/slc7_amd64_gcc900/libFWCoreConcurrency.so
#15 0x00007ff0b6e6c3ff in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7ff0b293fe00) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_1_0_pre5-slc7_amd64_gcc900/build/CMSSW_12_1_0_pre5-build/BUILD/slc7_amd64_gcc900/external/tbb/v2021.4.0-651a6efca0c94b3c25e36a8faa72480b/tbb-v2021.4.0/src/tbb/task_dispatcher.h:322
#16 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7ff0b293fe00) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_1_0_pre5-slc7_amd64_gcc900/build/CMSSW_12_1_0_pre5-build/BUILD/slc7_amd64_gcc900/external/tbb/v2021.4.0-651a6efca0c94b3c25e36a8faa72480b/tbb-v2021.4.0/src/tbb/task_dispatcher.h:463
#17 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_1_0_pre5-slc7_amd64_gcc900/build/CMSSW_12_1_0_pre5-build/BUILD/slc7_amd64_gcc900/external/tbb/v2021.4.0-651a6efca0c94b3c25e36a8faa72480b/tbb-v2021.4.0/src/tbb/task_dispatcher.cpp:168
#18 0x00007ff0b86c8067 in edm::EventProcessor::endProcessBlock(bool, bool) () from /cvmfs/cms-ib.cern.ch/nweek-02707/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-14-0000/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#19 0x00007ff0b86ce821 in edm::EventProcessor::runToCompletion() () from /cvmfs/cms-ib.cern.ch/nweek-02707/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-14-0000/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#20 0x000000000040bae6 in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const ()
#21 0x00007ff0b6e7fc6d in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_1_0_pre5-slc7_amd64_gcc900/build/CMSSW_12_1_0_pre5-build/BUILD/slc7_amd64_gcc900/external/tbb/v2021.4.0-651a6efca0c94b3c25e36a8faa72480b/tbb-v2021.4.0/src/tbb/arena.cpp:698
#22 0x000000000040ca58 in main::{lambda()#1}::operator()() const ()
#23 0x000000000040b62c in main ()

Current Modules:

Module: SiPixelPhase1ResidualsExtra:SiPixelPhase1ResidualsExtra (crashed)

A fatal system signal has occurred: segmentation violation

EDIT: this was fixed in 341ea11

@francescobrivio
Contributor Author

Issue with 139.002:

----- Begin Fatal Exception 15-Nov-2021 16:15:06 CET-----------------------
An exception of category 'ProductNotFound' occurred while
   [0] Processing  Event run: 346512 lumi: 280 event: 271772782 stream: 0
   [1] Running path 'dqmoffline_8_step'
   [2] Prefetching for module L1TMuonDQMOffline/'l1tMuonDQMOfflineEmu'
   [3] Prefetching for module L1TMuonProducer/'simGmtStage2Digis'
   [4] Prefetching for module L1TMuonOverlapPhase1TrackProducer/'simOmtfDigis'
   [5] Prefetching for module CSCTriggerPrimitivesProducer/'simCscTriggerPrimitiveDigis'
   [6] Prefetching for module GEMPadDigiClusterProducer/'simMuonGEMPadDigiClusters'
   [7] Calling method for module GEMPadDigiProducer/'simMuonGEMPadDigis'
Exception Message:
Principal::getByToken: Found zero products matching all criteria
Looking for type: MuonDigiCollection<GEMDetId,GEMDigi>
Looking for module label: simMuonGEMDigis
Looking for productInstanceName: 

   Additional Info:
      [a] If you wish to continue processing events after a ProductNotFound exception,
add "SkipEvent = cms.untracked.vstring('ProductNotFound')" to the "options" PSet in the configuration.

----- End Fatal Exception -------------------------------------------------
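The numbered context lines of this exception read from the outermost request down to the call that failed: a DQM module on dqmoffline_8_step prefetches the L1 muon emulation chain, which ends with simMuonGEMPadDigis asking for simMuonGEMDigis. The Python sketch below pulls that chain out of the message. The function dependency_chain and its regular expressions are hypothetical and assume exactly the message layout shown above; they are not an official parser for framework exceptions.

```python
import re

# Abridged copy of the fatal exception reported for 139.002.
LOG = """\
An exception of category 'ProductNotFound' occurred while
   [0] Processing  Event run: 346512 lumi: 280 event: 271772782 stream: 0
   [1] Running path 'dqmoffline_8_step'
   [2] Prefetching for module L1TMuonDQMOffline/'l1tMuonDQMOfflineEmu'
   [3] Prefetching for module L1TMuonProducer/'simGmtStage2Digis'
   [4] Prefetching for module L1TMuonOverlapPhase1TrackProducer/'simOmtfDigis'
   [5] Prefetching for module CSCTriggerPrimitivesProducer/'simCscTriggerPrimitiveDigis'
   [6] Prefetching for module GEMPadDigiClusterProducer/'simMuonGEMPadDigiClusters'
   [7] Calling method for module GEMPadDigiProducer/'simMuonGEMPadDigis'
Exception Message:
Principal::getByToken: Found zero products matching all criteria
Looking for type: MuonDigiCollection<GEMDetId,GEMDigi>
Looking for module label: simMuonGEMDigis
"""

# "[N] Prefetching for module Type/'label'" or "[N] Calling method for module Type/'label'".
MODULE_RE = re.compile(r"\[\d+\] (?:Prefetching for|Calling method for) module (\w+)/'(\w+)'")
PATH_RE = re.compile(r"Running path '(\w+)'")
MISSING_RE = re.compile(r"Looking for module label: (\w+)")


def dependency_chain(log):
    """Return (path, [(module_type, label), ...], missing_label) from the message."""
    path = PATH_RE.search(log)
    missing = MISSING_RE.search(log)
    modules = MODULE_RE.findall(log)  # outermost request first, failing module last
    return (path.group(1) if path else None,
            modules,
            missing.group(1) if missing else None)


if __name__ == "__main__":
    path, modules, missing = dependency_chain(LOG)
    print(f"path: {path}")
    for module_type, label in modules:
        print(f"  needs {label} ({module_type})")
    print(f"  -> missing input: {missing}")
```

Going by its name, simMuonGEMDigis is the simulated GEM digi collection, so a data workflow that runs this emulation chain in DQM would plausibly never find it.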

@francescobrivio francescobrivio marked this pull request as draft November 15, 2021 18:17
@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-36133/26648

@cmsbuild
Contributor

A new Pull Request was created by @francescobrivio for master.

It involves the following packages:

  • Configuration/PyReleaseValidation (pdmv, upgrade)
  • Configuration/StandardSequences (operations)

@perrotta, @jordan-martins, @bbilin, @wajidalikhan, @cmsbuild, @AdrianoDee, @srimanob, @kskovpen, @qliphy, @fabiocos, @davidlange6 can you please review it and eventually sign? Thanks.
@fabiocos, @makortel, @felicepantaleo, @GiacomoSguazzoni, @JanFSchulte, @rovere, @VinInn, @Martin-Grunewald, @missirol, @kpedro88, @lecriste, @mtosi, @ebrondol, @mmusich, @dgulhan, @slomeo this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@francescobrivio
Contributor Author

FYI @malbouis @tvami

@tvami
Contributor

tvami commented Nov 15, 2021

Maybe we should also tag @cms-sw/dqm-l2, right?

@Martin-Grunewald
Contributor

Martin-Grunewald commented Nov 16, 2021

It seems this time the GEM related error is no longer in L1REPACK (as fixed) but rather in DQM (dqmoffline_8_step) - perhaps some similar modifications need to be done there?

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-36133/26667

@cmsbuild
Contributor

Pull request #36133 was updated. @perrotta, @jordan-martins, @bbilin, @wajidalikhan, @cmsbuild, @AdrianoDee, @srimanob, @kskovpen, @qliphy, @fabiocos, @davidlange6 can you please check and sign again.

@francescobrivio
Contributor Author

> It seems this time the GEM related error is no longer in L1REPACK (as fixed) but rather in DQM (dqmoffline_8_step) - perhaps some similar modifications need to be done there?

Thanks @Martin-Grunewald, indeed I'm investigating the issue with @jfernan2!

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-36133/26668

@bbilin
Contributor

bbilin commented Nov 18, 2021

+pdmv

@srimanob
Contributor

+Upgrade

Adding pilot beam data to runTheMatrix. All added workflows run fine.

@tvami
Contributor

tvami commented Nov 19, 2021

@perrotta @qliphy please consider this fully signed :)

@perrotta
Contributor

> @perrotta @qliphy please consider this fully signed :)

No, it still has to be reviewed by @cms-sw/l1-l2. I know that @rekovic has already started looking at it; let's wait for his "+1".

@rekovic
Contributor

rekovic commented Nov 19, 2021

+1

@perrotta
Contributor

+1

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will be automatically merged.

Comment on lines +2710 to +2714
steps['ALCARECOPROMPTR3']=merge([{'-s':'RAW2DIGI,L1Reco,RECO,ALCA:SiStripCalZeroBias+SiStripCalMinBias+TkAlMinBias+HcalCalHO+HcalCalIterativePhiSym+HcalCalHBHEMuonFilter+HcalCalIsoTrkFilter,DQM',
'--conditions':'auto:run3_data_prompt',
'--scenario':'pp',
'--era':'Run3',
'--datatier':'RECO,MINIAOD,ALCARECO,DQMIO',
Contributor

it's puzzling to see MINIAOD in the data outputs but no PAT step

Contributor Author

Hi @slava77, thanks for the suggestion. I admit I'm neither a RECO nor a PdmV expert; I was merely trying to add some recent data to the tests, with workflows as similar as possible to the existing ones. If you want to open a PR to improve or add these workflows, please feel free to do so. Thanks a lot!
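The reviewer's concern can be written as a simple consistency rule: if a step lists MINIAOD in --datatier, its -s sequence should contain a PAT step. Below is a hypothetical checker (not part of CMSSW or runTheMatrix.py), applied to the ALCARECOPROMPTR3 values quoted in this diff. The MINIAOD-needs-PAT rule comes from the review comment; the length check between --datatier and --eventcontent is an extra assumption that the two lists pair up entry by entry.

```python
# Excerpt of steps['ALCARECOPROMPTR3'] as quoted in this review.
STEP = {
    '-s': 'RAW2DIGI,L1Reco,RECO,ALCA:SiStripCalZeroBias+SiStripCalMinBias+TkAlMinBias'
          '+HcalCalHO+HcalCalIterativePhiSym+HcalCalHBHEMuonFilter+HcalCalIsoTrkFilter,DQM',
    '--datatier': 'RECO,MINIAOD,ALCARECO,DQMIO',
    '--eventcontent': 'RECO,MINIAOD,ALCARECO,DQM',
}


def step_names(step_spec):
    """'RECO,ALCA:A+B,DQM' -> ['RECO', 'ALCA', 'DQM'] (drop the ':' sub-sequences)."""
    return [item.split(':', 1)[0] for item in step_spec.split(',')]


def check_step(step):
    """Return a list of human-readable problems found in one step dictionary."""
    problems = []
    names = step_names(step['-s'])
    tiers = step.get('--datatier', '').split(',')
    contents = step.get('--eventcontent', '').split(',')
    if 'MINIAOD' in tiers and 'PAT' not in names:
        problems.append("MINIAOD requested in --datatier but no PAT step in -s")
    if len(tiers) != len(contents):
        problems.append("--datatier and --eventcontent have different lengths")
    return problems


if __name__ == "__main__":
    for problem in check_step(STEP):
        print(problem)
```

Appending ",PAT" to the -s value makes the first check pass; whether PAT belongs in this step at all is the open question in the thread.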

'--datatier':'RECO,MINIAOD,ALCARECO,DQMIO',
'--eventcontent':'RECO,MINIAOD,ALCARECO,DQM',
'--triggerResultsProcess': 'RECO',
'--customise':'Configuration/DataProcessing/RecoTLR.customisePrompt'},steps['RECODR3']])
Contributor

customisePostEra_Run3 is used in the T0 configuration, see Configuration/DataProcessing/python/Impl/ppEra_Run3.py

Contributor Author

Please see #36133 (comment)
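ALCARECOPROMPTR3 and TIER0EXPRUN3 are both built as merge([overrides, base]). As we understand the helper used in relval_steps.py, keys in earlier dictionaries win over keys in later ones. That means any option the step sets explicitly, such as --customise (where the reviewer points to the T0 choice of customisePostEra_Run3), overrides what is inherited from steps['RECODR3']. The function below sketches only that precedence rule and is not a copy of the CMSSW implementation. The base dictionary holds only the two RECODR3 keys quoted in the "#Run 3" hunk of this review, and the ALCA list is abridged.

```python
def merge(dictlist):
    """Combine option dictionaries; earlier entries take precedence.

    Sketch of the precedence rule only, not the CMSSW implementation.
    """
    result = {}
    # Start from the last (base) dictionary and let each earlier one overwrite it.
    for options in reversed(dictlist):
        result.update(options)
    return result


# Excerpt of steps['RECODR3'] (only the keys quoted in this review).
RECODR3 = {'--scenario': 'pp', '-s': 'RAW2DIGI,L1Reco,RECO,DQM'}

# Abridged override for ALCARECOPROMPTR3 (ALCA sequence list shortened).
ALCARECOPROMPTR3 = merge([{'-s': 'RAW2DIGI,L1Reco,RECO,ALCA:TkAlMinBias,DQM',
                           '--customise': 'Configuration/DataProcessing/RecoTLR.customisePrompt'},
                          RECODR3])

if __name__ == "__main__":
    for key, value in sorted(ALCARECOPROMPTR3.items()):
        print(f"{key}: {value}")
```

The merged step keeps --scenario from the base but takes -s and --customise from its own dictionary, and the base dictionary itself is left unchanged.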

@@ -2003,6 +2009,15 @@ def lhegensim2018ml(fragment,howMuch):
'--conditions':'auto:run2_data'
},steps['TIER0']])

steps['TIER0EXPRUN3']=merge([{'-s':'RAW2DIGI,L1Reco,RECO,EI,ALCAPRODUCER:SiPixelCalZeroBias+SiStripCalZeroBias+SiStripCalMinBias+SiStripCalMinBiasAAG+TkAlMinBias,DQM:@express,ENDJOB',
'--process':'RECO',
'--datatier':'ALCARECO,DQMIO',
Contributor

what is the reason for a T0 express setup to write only ALCARECO event content? IIRC, the express writes FEVT.

Contributor Author

this was copied from the already existing one:

steps['TIER0EXPRUN2']=merge([{'-s':'RAW2DIGI,L1Reco,RECO,ALCAPRODUCER:@allForExpress+AlCaPCCZeroBiasFromRECO+AlCaPCCRandomFromRECO,DQM:@express,ENDJOB',
'--process':'RECO',
'--datatier':'ALCARECO,DQMIO',
'--eventcontent':'ALCARECO,DQM',
'--customise':'Configuration/DataProcessing/RecoTLR.customiseExpress',
'--era':'Run2_2017',
'--conditions':'auto:run2_data'
},steps['TIER0']])

AFAIK the Express step does write ALCARECO.

Contributor

Is this an ALCA-specific step?
https://github.com/dmwm/T0/blob/master/etc/ProdOfflineConfiguration.py
has ALCARECO only in ExpressAlignment and ALCALUMIPIXELSEXPRESS datasets.

@cms-sw/alca-l2 please check/clarify

Contributor

All these under the standard express are alcarecos or alcaprompts:
https://github.com/dmwm/T0/blob/master/etc/ProdOfflineConfiguration.py#L246-L250

Contributor

> All these under the standard express are alcarecos or alcaprompts: https://github.com/dmwm/T0/blob/master/etc/ProdOfflineConfiguration.py#L246-L250

good, but your reference points to the Express PD, which writes FEVT, while this relval step is configured to write ALCARECO. It looks like a fix is needed.
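A key-by-key comparison of the two express steps quoted in this thread shows that the Run 3 step changed only the -s sequence and kept the Run 2 output tiers, so the FEVT question applies to TIER0EXPRUN2 as well. The comparison uses only the keys visible in the thread: the TIER0EXPRUN3 excerpt stops after --datatier, so its --eventcontent is not compared. The diff_steps helper is hypothetical.

```python
# Excerpts of the two express steps quoted above (inherited steps['TIER0'] keys omitted).
TIER0EXPRUN2 = {
    '-s': 'RAW2DIGI,L1Reco,RECO,ALCAPRODUCER:@allForExpress+AlCaPCCZeroBiasFromRECO'
          '+AlCaPCCRandomFromRECO,DQM:@express,ENDJOB',
    '--process': 'RECO',
    '--datatier': 'ALCARECO,DQMIO',
    '--eventcontent': 'ALCARECO,DQM',
}
TIER0EXPRUN3 = {
    '-s': 'RAW2DIGI,L1Reco,RECO,EI,ALCAPRODUCER:SiPixelCalZeroBias+SiStripCalZeroBias'
          '+SiStripCalMinBias+SiStripCalMinBiasAAG+TkAlMinBias,DQM:@express,ENDJOB',
    '--process': 'RECO',
    '--datatier': 'ALCARECO,DQMIO',
}


def diff_steps(a, b):
    """Return {key: (a_value, b_value)} for keys present in both steps whose values differ."""
    return {key: (a[key], b[key]) for key in sorted(a.keys() & b.keys()) if a[key] != b[key]}


if __name__ == "__main__":
    for key, (old, new) in diff_steps(TIER0EXPRUN2, TIER0EXPRUN3).items():
        print(f"{key}:\n  Run 2: {old}\n  Run 3: {new}")
    # Both steps request the same output tiers; neither asks for FEVT.
    print(TIER0EXPRUN2['--datatier'] == TIER0EXPRUN3['--datatier'])
```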


#Run 3
steps['RECODR3']=merge([{'--scenario':'pp',
'-s':'RAW2DIGI,L1Reco,RECO,DQM',
Contributor

PAT got lost here as well.
It seems this step is not used in any workflow, though, so no harm so far.
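Whether a step counts as "used" depends on how the check is done. RECODR3 may not be listed in any workflow, but in this diff it is still the base that ALCARECOPROMPTR3 is merged on top of, just as TIER0EXPRUN3 is built on steps['TIER0']. Below is a hypothetical sketch that reports both direct and merge-based use. The workflow table is a toy example with invented workflow names and an invented input step (INPUT_2021), not the content of the real relval matrix.

```python
# Toy workflow table (invented names): workflow -> ordered list of step names.
WORKFLOWS = {
    'wf_prompt': ['INPUT_2021', 'ALCARECOPROMPTR3'],
    'wf_express': ['INPUT_2021', 'TIER0EXPRUN3'],
}
# child step -> base step it is merged on top of (both pairs appear in this diff).
BASES = {'ALCARECOPROMPTR3': 'RECODR3', 'TIER0EXPRUN3': 'TIER0'}


def derived_steps(step, bases):
    """Return step plus every step built on it, directly or transitively."""
    found = {step}
    changed = True
    while changed:
        changed = False
        for child, parent in bases.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def workflows_using(step, workflows, bases):
    """Split workflows into those listing step directly and those using a derived step."""
    derived = derived_steps(step, bases)
    direct = sorted(wf for wf, steps in workflows.items() if step in steps)
    via_merge = sorted(wf for wf, steps in workflows.items()
                       if wf not in direct and derived & set(steps))
    return direct, via_merge


if __name__ == "__main__":
    print(workflows_using('RECODR3', WORKFLOWS, BASES))
```

In this toy table RECODR3 has no direct users but reaches wf_prompt through ALCARECOPROMPTR3, so a change to its -s sequence could still matter if the derived step did not override -s itself.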
