random errors in fastsim addons #24051

Closed
davidlange6 opened this issue Jul 25, 2018 · 31 comments

Comments

@davidlange6
Contributor

I've seen this failure a few times in PR tests:

https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-24037/29441/addOnTests/fastsim1/cmsDriver.py_TTbar_13TeV_TuneCUETP8M1_cfi_--conditions_auto:run2_mc_l1stage1_--fast__-n_100_--eventcontent_AODSIM,DQM_--relval_100000,1000_-s_GEN,SIM,.log

Thread 4 (Thread 0x7fc81b7fe700 (LWP 15838)):
#0 0x0000003752adf403 in poll () from /lib64/libc.so.6
#1 0x00007fc883259fe7 in full_read.constprop () from /cvmfs/cms-ib.cern.ch/nweek-02534/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_3_X_2018-07-24-1100/lib/slc6_amd64_gcc700/pluginFWCoreServicesPlugins.so
#2 0x00007fc88325a67c in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms-ib.cern.ch/nweek-02534/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_3_X_2018-07-24-1100/lib/slc6_amd64_gcc700/pluginFWCoreServicesPlugins.so
#3 0x00007fc88325b6e9 in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/nweek-02534/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_3_X_2018-07-24-1100/lib/slc6_amd64_gcc700/pluginFWCoreServicesPlugins.so
#4 <signal handler called>
#5 0x00007fc86d7c468a in TBLayer::groupedCompatibleDetsV(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&, std::vector<DetGroup, std::allocator<DetGroup> >&) const () from /cvmfs/cms-ib.cern.ch/nweek-02534/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_3_X_2018-07-24-1100/lib/slc6_amd64_gcc700/libRecoTrackerTkDetLayers.so
#6 0x00007fc86d735a94 in GeometricSearchDet::compatibleDetsV(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&, std::vector<std::pair<GeomDet const*, TrajectoryStateOnSurface>, std::allocator<std::pair<GeomDet const*, TrajectoryStateOnSurface> > >&) const () from /cvmfs/cms-ib.cern.ch/nweek-02534/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_3_X_2018-07-24-1100/lib/slc6_amd64_gcc700/libTrackingToolsDetLayers.so
#7 0x00007fc86d735a15 in GeometricSearchDet::compatibleDets(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&) const () from /cvmfs/cms-ib.cern.ch/nweek-02534/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_3_X_2018-07-24-1100/lib/slc6_amd64_gcc700/libTrackingToolsDetLayers.so
#8 0x00007fc86a9d957a in fastsim::TrackerSimHitProducer::interact(fastsim::Particle&, fastsim::SimplifiedGeometry const&, std::vector<std::unique_ptr<fastsim::Particle, std::default_delete<fastsim::Particle> >, std::allocator<std::unique_ptr<fastsim::Particle, std::default_delete<fastsim::Particle> > > >&, RandomEngineAndDistribution const&) () from /cvmfs/cms-ib.cern.ch/nweek-02534/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_3_X_2018-07-24-1100/lib/slc6_amd64_gcc700/pluginFastSimulationSimplifiedGeometryPropagatorAuto.so
#9 0x00007fc86a9f0f29 in FastSimProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/nweek-02534/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_3_X_2018-07-24-1100/lib/slc6_amd64_gcc700/pluginFastSimulationSimplifiedGeometryPropagatorAuto.so

@cmsbuild
Contributor

A new Issue was created by @davidlange6 David Lange.

@davidlange6, @Dr15Jones, @smuzaffar, @fabiocos, @kpedro88 can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@kpedro88
Contributor

assign fastsim

@cmsbuild
Contributor

New categories assigned: fastsim

@mdhildreth,@ssekmen,@lveldere,@civanch you have been requested to review this Pull request/Issue and eventually sign? Thanks

@kpedro88
Contributor

Indeed, I saw it once in one of my PRs (#23703 (comment)) and then it went away. I ran valgrind, but didn't find any memory corruption, just a leak (see #23795).

@smuzaffar
Contributor

Looks like this has been fixed. We have not seen such random errors in PR tests.

@makortel
Contributor

makortel commented Aug 26, 2020

Here is another similar crash from #31245 (comment) in fastsim test

#3  0x00002acab5296d89 in sig_dostack_then_abort () from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_11_2_X_2020-08-25-1100/lib/slc7_amd64_gcc820/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00002acad3217535 in TIDRing::computeCrossings(TrajectoryStateOnSurface const&, PropagationDirection) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-25-1100/lib/slc7_amd64_gcc820/libRecoTrackerTkDetLayers.so
#6  0x00002acad3217c5c in TIDRing::groupedCompatibleDetsV(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&, std::vector<DetGroup, std::allocator<DetGroup> >&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-25-1100/lib/slc7_amd64_gcc820/libRecoTrackerTkDetLayers.so
#7  0x00002acad3216ac5 in TIDLayer::groupedCompatibleDetsV(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&, std::vector<DetGroup, std::allocator<DetGroup> >&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-25-1100/lib/slc7_amd64_gcc820/libRecoTrackerTkDetLayers.so
#8  0x00002acacb30fce4 in GeometricSearchDet::compatibleDetsV(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&, std::vector<std::pair<GeomDet const*, TrajectoryStateOnSurface>, std::allocator<std::pair<GeomDet const*, TrajectoryStateOnSurface> > >&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-25-1100/lib/slc7_amd64_gcc820/libTrackingToolsDetLayers.so
#9  0x00002acacb30fc7d in GeometricSearchDet::compatibleDets(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&) const () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-25-1100/lib/slc7_amd64_gcc820/libTrackingToolsDetLayers.so
#10 0x00002acadde137c1 in fastsim::TrackerSimHitProducer::interact(fastsim::Particle&, fastsim::SimplifiedGeometry const&, std::vector<std::unique_ptr<fastsim::Particle, std::default_delete<fastsim::Particle> >, std::allocator<std::unique_ptr<fastsim::Particle, std::default_delete<fastsim::Particle> > > >&, RandomEngineAndDistribution const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-08-25-1100/lib/slc7_amd64_gcc820/pluginFastSimulationSimplifiedGeometryPropagatorAuto.so

@smuzaffar Should we consider reopening the issue?

@smuzaffar smuzaffar reopened this Aug 26, 2020
@makortel
Contributor

Here is a similar crash from cms-sw/cmsdist#6343 (comment) in fastsim test

#3  0x00002b8197936a59 in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-10-27-1100/lib/slc7_amd64_gcc820/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00002b81b585f472 in TECLayer::computeCrossings(TrajectoryStateOnSurface const&, PropagationDirection) const () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-10-27-1100/lib/slc7_amd64_gcc820/libRecoTrackerTkDetLayers.so
#6  0x00002b81b585fbac in TECLayer::groupedCompatibleDetsV(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&, std::vector<DetGroup, std::allocator<DetGroup> >&) const () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-10-27-1100/lib/slc7_amd64_gcc820/libRecoTrackerTkDetLayers.so
#7  0x00002b81ad8d9b33 in GeometricSearchDet::compatibleDetsV(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&, std::vector<std::pair<GeomDet const*, TrajectoryStateOnSurface>, std::allocator<std::pair<GeomDet const*, TrajectoryStateOnSurface> > >&) const () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-10-27-1100/lib/slc7_amd64_gcc820/libTrackingToolsDetLayers.so
#8  0x00002b81ad8d9acd in GeometricSearchDet::compatibleDets(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&) const () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-10-27-1100/lib/slc7_amd64_gcc820/libTrackingToolsDetLayers.so
#9  0x00002b81be57c731 in fastsim::TrackerSimHitProducer::interact(fastsim::Particle&, fastsim::SimplifiedGeometry const&, std::vector<std::unique_ptr<fastsim::Particle, std::default_delete<fastsim::Particle> >, std::allocator<std::unique_ptr<fastsim::Particle, std::default_delete<fastsim::Particle> > > >&, RandomEngineAndDistribution const&) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-10-27-1100/lib/slc7_amd64_gcc820/pluginFastSimulationSimplifiedGeometryPropagatorAuto.so
#10 0x00002b81be559e44 in FastSimProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-10-27-1100/lib/slc7_amd64_gcc820/pluginFastSimulationSimplifiedGeometryPropagatorAuto.so
#11 0x00002b818894d774 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_X_2020-10-27-1100/lib/slc7_amd64_gcc820/libFWCoreFramework.so

@makortel
Contributor

#32152 shows a similar stack trace inside GeometricSearchDet::compatibleDets(), but there it is called from elsewhere.

@makortel
Contributor

makortel commented Feb 2, 2021

Here is a similar crash from #32782 (comment) in fastsim test

Thread 5 (Thread 0x2add69400700 (LWP 18298)):
#3  0x00002adc590337e4 in sig_dostack_then_abort () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/32782/12653/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00002adc79a3eb7e in TECLayer::computeCrossings(TrajectoryStateOnSurface const&, PropagationDirection) const () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/libRecoTrackerTkDetLayers.so
#6  0x00002adc79a3f2ec in TECLayer::groupedCompatibleDetsV(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&, std::vector<DetGroup, std::allocator<DetGroup> >&) const () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/libRecoTrackerTkDetLayers.so
#7  0x00002adc567efcdb in GeometricSearchDet::compatibleDetsV(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&, std::vector<std::pair<GeomDet const*, TrajectoryStateOnSurface>, std::allocator<std::pair<GeomDet const*, TrajectoryStateOnSurface> > >&) const () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/libTrackingToolsDetLayers.so
#8  0x00002adc567f01d1 in GeometricSearchDet::compatibleDets(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&) const () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/libTrackingToolsDetLayers.so
#9  0x00002adc838c5ca1 in fastsim::TrackerSimHitProducer::interact(fastsim::Particle&, fastsim::SimplifiedGeometry const&, std::vector<std::unique_ptr<fastsim::Particle, std::default_delete<fastsim::Particle> >, std::allocator<std::unique_ptr<fastsim::Particle, std::default_delete<fastsim::Particle> > > >&, RandomEngineAndDistribution const&) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/pluginFastSimulationSimplifiedGeometryPropagatorAuto.so
#10 0x00002adc838a4359 in FastSimProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/pluginFastSimulationSimplifiedGeometryPropagatorAuto.so
#11 0x00002adc49b8c924 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#12 0x00002adc49b68b4d in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#13 0x00002adc49ac93c5 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so

Thread 4 (Thread 0x2add68422700 (LWP 18297)):
#2  0x00002adc59031df0 in sig_pause_for_stacktrace () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/32782/12653/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002adc7b4b1835 in EcalCoder::encode(CaloTSamples<float, 10u> const&, EcalDataFrame&, CLHEP::HepRandomEngine*) const () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/libSimCalorimetryEcalSimAlgos.so
#5  0x00002adc7b4b435c in EcalTDigitizer<EBDigitizerTraits>::run(EBDigiCollection&, CLHEP::HepRandomEngine*) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/libSimCalorimetryEcalSimAlgos.so
#6  0x00002adcb46d214e in EcalDigiProducer::finalizeEvent(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/libSimCalorimetryEcalSimProducers.so
#7  0x00002adcb454830b in edm::MixingModule::finalizeEvent(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/pluginSimGeneralMixingModulePlugins.so
#8  0x00002adcb45de40b in edm::BMixingModule::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/libMixingBase.so
#9  0x00002adc49b8c924 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#10 0x00002adc49b68b4d in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#11 0x00002adc49ac93c5 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#12 0x00002adc49ac957d in bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so

Thread 3 (Thread 0x2add67a21700 (LWP 18296)):
#2  0x00002adc59031df0 in sig_pause_for_stacktrace () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/32782/12653/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002adc4adb7b13 in rtree_szind_slab_read_fast (r_slab=<synthetic pointer>, r_szind=<synthetic pointer>, key=47132606273168, rtree_ctx=<optimized out>, rtree=<optimized out>, tsdn=<optimized out>) at include/jemalloc/internal/rtree.h:471
#5  free_fastpath (size_hint=false, size=0, ptr=0x2addea411690) at src/jemalloc.c:2827
#6  free (ptr=0x2addea411690) at src/jemalloc.c:2870
#7  0x00002adc7b4b1705 in EcalCoder::encode(CaloTSamples<float, 10u> const&, EcalDataFrame&, CLHEP::HepRandomEngine*) const () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/libSimCalorimetryEcalSimAlgos.so
#8  0x00002adc7b4b435c in EcalTDigitizer<EBDigitizerTraits>::run(EBDigiCollection&, CLHEP::HepRandomEngine*) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/libSimCalorimetryEcalSimAlgos.so
#9  0x00002adcb46d214e in EcalDigiProducer::finalizeEvent(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/libSimCalorimetryEcalSimProducers.so
#10 0x00002adcb454830b in edm::MixingModule::finalizeEvent(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/pluginSimGeneralMixingModulePlugins.so
#11 0x00002adcb45de40b in edm::BMixingModule::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/libMixingBase.so
#12 0x00002adc49b8c924 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#13 0x00002adc49b68b4d in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#14 0x00002adc49ac93c5 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#15 0x00002adc49ac957d in bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so

Thread 1 (Thread 0x2adc4df9f3c0 (LWP 17249)):
#2  0x00002adc59031df0 in sig_pause_for_stacktrace () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/32782/12653/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002adc4b284f28 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::process_bypass_loop (this=this@entry=0x2adc4ec7a600, context_guard=..., t=0x2add684c0d40, isolation=isolation@entry=0) at ../../include/tbb/task.h:1003
#5  0x00002adc4b2852fb in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0x2adc4ec7a600, parent=..., child=<optimized out>) at ../../include/tbb/task.h:1003
#6  0x00002adc49a3e625 in edm::EventProcessor::processLumis(std::shared_ptr<void> const&) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#7  0x00002adc49a46b25 in edm::EventProcessor::runToCompletion() () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_X_2021-02-02-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#8  0x000000000040fb86 in tbb::interface7::internal::delegated_function<main::{lambda()#1}::operator()() const::{lambda()#1} const, void>::operator()() const ()
#9  0x00002adc4b27fb92 in tbb::interface7::internal::task_arena_base::internal_execute (this=0x7fffd6b69880, d=...) at ../../src/tbb/arena.cpp:1105
#10 0x0000000000410aa3 in main::{lambda()#1}::operator()() const ()
#11 0x000000000040f6dc in main ()


Current Modules:
Module: FastSimProducer:fastSimProducer (crashed)
Module: MixingModule:mix
Module: none
Module: MixingModule:mix
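
In the traces pasted in this thread, the thread that hit the signal shows `sig_dostack_then_abort` just before `<signal handler called>`, while the other threads show `sig_pause_for_stacktrace`. Assuming that pattern holds in general, a small Python helper can pull the crashing thread's frames out of a log. This is only a sketch: `crashing_frames` and `SAMPLE` are hypothetical names, not part of CMSSW or cms-bot, and the sample log is a shortened stand-in rather than real output.

```python
import re

# Matches gdb thread headers such as: Thread 4 (Thread 0x2ad4... (LWP 19863) "cmsRun"):
THREAD_RE = re.compile(r"^Thread (\d+) \(")
# Matches frame lines such as: #5  0x00002ad3... in TIBRing::computeCrossings(...) ...
# The address part is optional because inlined frames omit it.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?(.*)$")


def crashing_frames(log_text):
    """Return (thread_id, frames) for the thread that ran sig_dostack_then_abort.

    Assumption (from the traces in this issue): only the thread that received
    the fatal signal contains a sig_dostack_then_abort frame. Returns
    (None, []) when no such frame is found.
    """
    threads = {}
    current = None
    for line in log_text.splitlines():
        header = THREAD_RE.match(line)
        if header:
            current = int(header.group(1))
            threads[current] = []
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None:
            threads[current].append(frame.group(1).strip())

    for tid, frames in threads.items():
        for i, text in enumerate(frames):
            if text.startswith("sig_dostack_then_abort"):
                # Keep only the frames below the signal handler marker.
                below = [f for f in frames[i + 1:] if f != "<signal handler called>"]
                return tid, below
    return None, []


# Shortened, made-up sample in the same layout as the logs in this thread.
SAMPLE = """\
Thread 5 (Thread 0x1 (LWP 1) "cmsRun"):
#2  0x0000000000000001 in sig_pause_for_stacktrace () from /x.so
#3  <signal handler called>
#4  0x0000000000000002 in EcalCoder::encode(int) const () from /y.so
Thread 4 (Thread 0x2 (LWP 2) "cmsRun"):
#3  0x0000000000000003 in sig_dostack_then_abort () from /x.so
#4  <signal handler called>
#5  0x0000000000000004 in TIBRing::computeCrossings(int) const () from /z.so
"""

if __name__ == "__main__":
    tid, frames = crashing_frames(SAMPLE)
    print("crashing thread:", tid)
    for f in frames:
        # Print only the function name, dropping arguments and library path.
        print("  ", f.split("(", 1)[0])
```

Run over several of the logs linked here, a helper like this could make it quicker to check whether each crash really originates under `GeometricSearchDet::compatibleDets()` or somewhere else.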

@makortel
Contributor

Here is a similar crash from #36100 (comment) in fastsim1 test

Begin processing the 27th record. Run 1, Event 27, LumiSection 1 on stream 2 at 12-Nov-2021 08:44:02.954 CET


A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

Fri Nov 12 08:44:04 CET 2021
Thread 5 (Thread 0x2ad40ba00700 (LWP 19865) "cmsRun"):
#2  0x00002ad3418403f0 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw-patch/CMSSW_12_2_X_2021-11-11-2300/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002ad3423c6a14 in CLHEP::MixMaxRng::convert1double (this=0x2ad3c13c58d0, u=<optimized out>) at ./CLHEP/Random/MixMaxRng.h:154
#5  CLHEP::MixMaxRng::generate (this=0x2ad3c13c58d0, i=6) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_1_0_pre3-slc7_amd64_gcc900/build/CMSSW_12_1_0_pre3-build/BUILD/slc7_amd64_gcc900/external/clhep/2.4.5.1-0f520ff8878ead52d387082ff8c4c011/clhep-2.4.5.1/Random/src/MixMaxRng.cc:293
#6  0x00002ad3423d7956 in CLHEP::RandGaussQ::shoot (anotherEngine=0x2ad3c13c58d0) at ./CLHEP/Random/RandGaussQ.icc:50
#7  CLHEP::RandGaussQ::shoot (stdDev=1, mean=0, anotherEngine=0x2ad3c13c58d0) at ./CLHEP/Random/RandGaussQ.icc:59
#8  CLHEP::RandGaussQ::shootArray (anEngine=0x2ad3c13c58d0, size=<optimized out>, vect=<optimized out>, mean=0, stdDev=1) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_1_0_pre3-slc7_amd64_gcc900/build/CMSSW_12_1_0_pre3-build/BUILD/slc7_amd64_gcc900/external/clhep/2.4.5.1-0f520ff8878ead52d387082ff8c4c011/clhep-2.4.5.1/Random/src/RandGaussQ.cc:49
#9  0x00002ad36717f1ad in void CorrelatedNoisifier<ROOT::Math::SMatrix<double, 10u, 10u, ROOT::Math::MatRepSym<double, 10u> > >::noisify<CaloSamples>(CaloSamples&, CLHEP::HepRandomEngine*, std::vector<double, std::allocator<double> > const*) const () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libSimCalorimetryEcalSimAlgos.so
#10 0x00002ad36717e1b5 in EcalCoder::encode(CaloTSamples<float, 10u> const&, EcalDataFrame&, CLHEP::HepRandomEngine*) const () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libSimCalorimetryEcalSimAlgos.so
#11 0x00002ad3671811dc in EcalTDigitizer<EEDigitizerTraits>::run(EEDigiCollection&, CLHEP::HepRandomEngine*) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libSimCalorimetryEcalSimAlgos.so
#12 0x00002ad399a995a8 in EcalDigiProducer::finalizeEvent(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libSimCalorimetryEcalSimProducers.so
#13 0x00002ad3998efdbb in edm::MixingModule::finalizeEvent(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/pluginSimGeneralMixingModulePlugins.so
#14 0x00002ad39998a26b in edm::BMixingModule::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libMixingBase.so
#15 0x00002ad338a19abc in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#16 0x00002ad3389fa80d in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#17 0x00002ad338954bb5 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#18 0x00002ad338954d6d in bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#19 0x00002ad338955076 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#20 0x00002ad338957406 in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so

Thread 4 (Thread 0x2ad40aaba700 (LWP 19863) "cmsRun"):
#3  0x00002ad3418447ab in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw-patch/CMSSW_12_2_X_2021-11-11-2300/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00002ad36579e8e4 in TIBRing::computeCrossings(TrajectoryStateOnSurface const&, PropagationDirection) const () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libRecoTrackerTkDetLayers.so
#6  0x00002ad36579f7eb in TIBRing::groupedCompatibleDetsV(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&, std::vector<DetGroup, std::allocator<DetGroup> >&) const () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libRecoTrackerTkDetLayers.so
#7  0x00002ad36578b35f in CompatibleDetToGroupAdder::add(GeometricSearchDet const&, TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&, std::vector<DetGroup, std::allocator<DetGroup> >&) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libRecoTrackerTkDetLayers.so
#8  0x00002ad36579bee9 in TBLayer::groupedCompatibleDetsV(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&, std::vector<DetGroup, std::allocator<DetGroup> >&) const () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libRecoTrackerTkDetLayers.so
#9  0x00002ad3422b3cdb in GeometricSearchDet::compatibleDetsV(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&, std::vector<std::pair<GeomDet const*, TrajectoryStateOnSurface>, std::allocator<std::pair<GeomDet const*, TrajectoryStateOnSurface> > >&) const () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libTrackingToolsDetLayers.so
#10 0x00002ad3422b41d1 in GeometricSearchDet::compatibleDets(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&) const () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libTrackingToolsDetLayers.so
#11 0x00002ad3812a2c91 in fastsim::TrackerSimHitProducer::interact(fastsim::Particle&, fastsim::SimplifiedGeometry const&, std::vector<std::unique_ptr<fastsim::Particle, std::default_delete<fastsim::Particle> >, std::allocator<std::unique_ptr<fastsim::Particle, std::default_delete<fastsim::Particle> > > >&, RandomEngineAndDistribution const&) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/pluginFastSimulationSimplifiedGeometryPropagatorAuto.so
#12 0x00002ad38127f4f0 in FastSimProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/pluginFastSimulationSimplifiedGeometryPropagatorAuto.so
#13 0x00002ad338a19abc in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#14 0x00002ad3389fa80d in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#15 0x00002ad338954bb5 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#16 0x00002ad338954d6d in bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#17 0x00002ad338955076 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#18 0x00002ad338957406 in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so

Thread 3 (Thread 0x2ad40a0b9700 (LWP 19860) "cmsRun"):
#2  0x00002ad3418403f0 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw-patch/CMSSW_12_2_X_2021-11-11-2300/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002ad33aaba810 in __cos_avx () from /lib64/libm.so.6
#5  0x00002ad33aa79f7e in sincos () from /lib64/libm.so.6
#6  0x00002ad39fd084d8 in MultiTrackValidator::dqmAnalyze(edm::Event const&, edm::EventSetup const&, MultiTrackValidatorHistograms const&) const () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/pluginValidationRecoTrackPlugins.so
#7  0x00002ad3389ff5f7 in edm::global::EDProducerBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#8  0x00002ad3389fa30d in edm::WorkerT<edm::global::EDProducerBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#9  0x00002ad338954bb5 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#10 0x00002ad338954d6d in bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#11 0x00002ad338955076 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#12 0x00002ad338957406 in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so

Thread 1 (Thread 0x2ad33d1050c0 (LWP 18432) "cmsRun"):
#2  0x00002ad3418403f0 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw-patch/CMSSW_12_2_X_2021-11-11-2300/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002ad3666c88b7 in EcalElectronicsMapping::getTriggerElectronicsId(DetId const&) const () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libGeometryEcalMapping.so
#5  0x00002ad4091bdcda in EcalFenixStrip::process(std::vector<EEDataFrame, std::allocator<EEDataFrame> >&, int, std::vector<int, std::allocator<int> >&) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libSimCalorimetryEcalTrigPrimAlgos.so
#6  0x00002ad4091cd2bf in void EcalTrigPrimFunctionalAlgo::run_part2<EEDigiCollection>(EEDigiCollection const*, std::vector<std::vector<std::pair<int, std::vector<EEDigiCollection::Digi, std::allocator<EEDigiCollection::Digi> > >, std::allocator<std::pair<int, std::vector<EEDigiCollection::Digi, std::allocator<EEDigiCollection::Digi> > > > >, std::allocator<std::vector<std::pair<int, std::vector<EEDigiCollection::Digi, std::allocator<EEDigiCollection::Digi> > >, std::allocator<std::pair<int, std::vector<EEDigiCollection::Digi, std::allocator<EEDigiCollection::Digi> > > > > > >&, edm::SortedCollection<EcalTriggerPrimitiveDigi, edm::StrictWeakOrdering<EcalTriggerPrimitiveDigi> >&, edm::SortedCollection<EcalTriggerPrimitiveDigi, edm::StrictWeakOrdering<EcalTriggerPrimitiveDigi> >&) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libSimCalorimetryEcalTrigPrimAlgos.so
#7  0x00002ad409133254 in EcalTrigPrimProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/pluginSimCalorimetryEcalTrigPrimProducersPlugins.so
#8  0x00002ad338a19abc in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#9  0x00002ad3389fa80d in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#10 0x00002ad338954bb5 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#11 0x00002ad338954d6d in bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#12 0x00002ad338955076 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#13 0x00002ad338957406 in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#14 0x00002ad3386cd2b5 in tbb::detail::d1::function_task<edm::WaitingTaskList::announce()::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libFWCoreConcurrency.so
#15 0x00002ad33a1663ff in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x2ad33dded380) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_1_0_pre5-slc7_amd64_gcc900/build/CMSSW_12_1_0_pre5-build/BUILD/slc7_amd64_gcc900/external/tbb/v2021.4.0-651a6efca0c94b3c25e36a8faa72480b/tbb-v2021.4.0/src/tbb/task_dispatcher.h:322
#16 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x2ad33dded380) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_1_0_pre5-slc7_amd64_gcc900/build/CMSSW_12_1_0_pre5-build/BUILD/slc7_amd64_gcc900/external/tbb/v2021.4.0-651a6efca0c94b3c25e36a8faa72480b/tbb-v2021.4.0/src/tbb/task_dispatcher.h:463
#17 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_1_0_pre5-slc7_amd64_gcc900/build/CMSSW_12_1_0_pre5-build/BUILD/slc7_amd64_gcc900/external/tbb/v2021.4.0-651a6efca0c94b3c25e36a8faa72480b/tbb-v2021.4.0/src/tbb/task_dispatcher.cpp:168
#18 0x00002ad3388c062f in edm::EventProcessor::processLumis(std::shared_ptr<void> const&) () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#19 0x00002ad3388cc445 in edm::EventProcessor::runToCompletion() () from /cvmfs/cms-ib.cern.ch/nweek-02706/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_2_X_2021-11-11-1100/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#20 0x000000000040bae6 in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const ()
#21 0x00002ad33a179c6d in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_1_0_pre5-slc7_amd64_gcc900/build/CMSSW_12_1_0_pre5-build/BUILD/slc7_amd64_gcc900/external/tbb/v2021.4.0-651a6efca0c94b3c25e36a8faa72480b/tbb-v2021.4.0/src/tbb/arena.cpp:698
#22 0x000000000040ca58 in main::{lambda()#1}::operator()() const ()
#23 0x000000000040b62c in main ()

Current Modules:

Module: FastSimProducer:fastSimProducer (crashed)
Module: MultiTrackValidator:trackValidator
Module: MixingModule:mix
Module: EcalTrigPrimProducer:simEcalTriggerPrimitiveDigis

@makortel
Contributor

(@smuzaffar Any idea why the fastsim-pending label was removed?)

@smuzaffar
Contributor

@makortel, I think I know the reason, but to confirm it I need to remove your comment "assign fastsim". Let me remove it and check whether the bot still keeps the label or not.

@cms-sw cms-sw deleted a comment from makortel Jan 20, 2023
@smuzaffar
Contributor

smuzaffar commented Jan 20, 2023

@makortel, the old bot, which did not keep track of L2 tenures, removed the label because @kpedro88's L2 tenure ended on 1st Sep 2021. From that point on the old bot no longer recognized #24051 (comment), and it removed the label when #24051 (comment) was added on 21st Nov 2021.

The new bot properly keeps track of L2 tenures: it treats #24051 (comment) as a valid comment and keeps the label.
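To illustrate the difference between the two behaviours, here is a minimal sketch. It is not the actual bot code: the names `L2_TENURES`, `was_l2` and `label_is_kept` are hypothetical, and the tenure start date and the comment date are made up for illustration. Only the end date (1st Sep 2021) and the re-evaluation date (21st Nov 2021) come from the explanation above.

```python
from datetime import date

# Hypothetical tenure table: (start, end) per (user, category).
# Only the end date is taken from the thread; the start date is an assumption.
L2_TENURES = {
    ("kpedro88", "fastsim"): [(date(2018, 1, 1), date(2021, 9, 1))],
}


def was_l2(user, category, when):
    """True if `user` held the L2 role for `category` on the date `when`."""
    tenures = L2_TENURES.get((user, category), [])
    return any(start <= when < end for start, end in tenures)


def label_is_kept(author, category, comment_date, today, tenure_aware):
    """Decide whether an 'assign <category>' comment still counts.

    tenure_aware=False mimics the old behaviour: the author is checked
    against the L2 list as it is today.
    tenure_aware=True mimics the new behaviour: the author is checked
    against the L2 list at the time the comment was written.
    """
    check_date = comment_date if tenure_aware else today
    return was_l2(author, category, check_date)


# Assumed comment date (hypothetical); re-evaluation on 21st Nov 2021.
OLD = label_is_kept("kpedro88", "fastsim", date(2018, 7, 25), date(2021, 11, 21), False)
NEW = label_is_kept("kpedro88", "fastsim", date(2018, 7, 25), date(2021, 11, 21), True)
```

With these assumed dates, `OLD` is `False` (label dropped) and `NEW` is `True` (label kept), which matches the behaviour described above.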

@makortel
Contributor

Thanks @smuzaffar for the forensic analysis :)

@makortel
Contributor

Just to add, I came across this old issue because of an e-mail from @sarafiorendi saying that similar errors are apparently happening in production.

@sarafiorendi
Contributor

Hi, yes indeed, I'm running into issues when running the LHEGS step of some FastSim samples.
Jobs are failing with a segmentation violation, which sometimes comes with an informative message (e.g. [1], [2]) and sometimes does not. I've copied some of the full logs to
https://fiorendi.web.cern.ch/fiorendi/mc_fastsim/

The crashes also occur when the jobs are run locally, and not always at the same event. They can be "reproduced" with the setup/running information from [3] or [4]. The same samples were generated successfully in the FullSim campaign, so I tend to think it's something FastSim-specific.

[1] Module: FastSimProducer:fastSimProducer (crashed)
Module: TrajectorySeedProducer:detachedQuadStepSeeds
A fatal system signal has occurred: segmentation violation

[2] Module: FastSimProducer:fastSimProducer (crashed)
Module: EleIsoDetIdCollectionProducer:interestingGedEleIsoDetIdEB
Module: PrimaryVertexProducer:unsortedOfflinePrimaryVertices
Module: GsfElectronEcalDrivenProducer:ecalDrivenGsfElectrons
A fatal system signal has occurred: segmentation violation

[3] https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_test/SUS-RunIISpring22UL18FSwmLHEGSPremix-00005
[4] https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_test/SUS-RunIISpring22UL17FSwmLHEGSPremix-00005
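For anyone going through many such logs, a small sketch like the following could tally which module is flagged as crashed. It assumes only the "Module: Type:label (crashed)" line format visible in excerpts [1] and [2]; the function names are hypothetical and not part of any CMSSW tool.

```python
import re
from collections import Counter

# Matches lines such as "Module: FastSimProducer:fastSimProducer (crashed)".
MODULE_LINE = re.compile(r"Module:\s*(\w+):(\w+)(\s*\(crashed\))?")


def crashed_modules(log_text):
    """Return (module type, module label) for each line flagged '(crashed)'."""
    return [
        (m.group(1), m.group(2))
        for m in MODULE_LINE.finditer(log_text)
        if m.group(3)
    ]


def tally_crashes(logs):
    """Count how often each module is flagged as crashed across several logs."""
    return Counter(mod for log in logs for mod in crashed_modules(log))


# Excerpts [1] and [2] as quoted in this comment.
LOG_1 = """Module: FastSimProducer:fastSimProducer (crashed)
Module: TrajectorySeedProducer:detachedQuadStepSeeds
A fatal system signal has occurred: segmentation violation"""

LOG_2 = """Module: FastSimProducer:fastSimProducer (crashed)
Module: EleIsoDetIdCollectionProducer:interestingGedEleIsoDetIdEB
Module: PrimaryVertexProducer:unsortedOfflinePrimaryVertices
Module: GsfElectronEcalDrivenProducer:ecalDrivenGsfElectrons
A fatal system signal has occurred: segmentation violation"""

TALLY = tally_crashes([LOG_1, LOG_2])
```

The other modules listed were merely running on other threads at the time of the signal, so only the line carrying "(crashed)" is counted.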

@makortel
Contributor

Let's see if tagging @cms-sw/fastsim-l2 @cms-sw/trk-dpg-l2 @cms-sw/geometry-l2 helps move this forward

@sbein
Contributor

sbein commented Jan 24, 2023

This issue was discussed at the SIM meeting [1], and a likely culprit is the increased memory consumption of FastSim. We've seen marginally worsening RSS over time, particularly when running with multi-threading [2]. Kevin Pedro recommended running Valgrind on this recipe to look for memory leaks or other problems. It was also suggested at one point to create a dedicated issue for the non-optimal memory topic. I wonder if there's an experienced person (or failing that, a twiki with Valgrind/cmssw or VTune documentation) who could help try to solve this.

[1] https://indico.cern.ch/event/1236460/contributions/5231552/attachments/2579663/4448912/FastSimNewsJan2023.pdf
[2] https://indico.cern.ch/event/1191657/contributions/5016859/attachments/2526966/4347277/Workshop_FastSim_computing_performance_Krammer_v2_12Oct2022.pdf

@makortel
Contributor

I can't really think of how memory exhaustion could lead to a segmentation fault. Typical symptoms of memory exhaustion are a std::bad_alloc exception, or the job getting killed by an external watchdog process. Typical causes of segfaults are incorrect memory reads or memory getting corrupted (i.e. something writes into a wrong place in memory).
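As a rough illustration of that distinction, a log could be sorted by symptom along these lines. This is only a sketch: it assumes the log contains the literal text "std::bad_alloc" or the "segmentation violation" message quoted earlier in this thread, and the function name is hypothetical. A job killed by an external watchdog may leave neither marker and would fall through to "unknown".

```python
def classify_failure(log_text):
    """Guess the failure class from markers in a job log (heuristic only)."""
    if "std::bad_alloc" in log_text:
        # An allocation request failed: points to memory exhaustion.
        return "memory exhaustion"
    if "segmentation violation" in log_text:
        # Invalid read or write: points to a bad pointer or corrupted memory.
        return "invalid memory access"
    return "unknown"


SEGV_LOG = "A fatal system signal has occurred: segmentation violation"
KIND = classify_failure(SEGV_LOG)
```

For the logs quoted in this issue, the marker is always the segmentation violation, which is why memory corruption looks like a more natural suspect than memory exhaustion.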

@makortel
Contributor

makortel commented Apr 12, 2023

Crash in #41282 (comment) in fastsim1 test

Begin processing the 25th record. Run 1, Event 25, LumiSection 1 on stream 2 at 12-Apr-2023 10:27:42.899 CEST

Thread 5 (Thread 0x2b74b2e00700 (LWP 18515) "cmsRun"):
#3  <signal handler called>
#4  0x00002b7489aceb7a in SteppingHelixPropagator::cIndex_(int) const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/libTrackPropagationSteppingHelixPropagator.so
#5  0x00002b7489ad64dd in SteppingHelixPropagator::propagate(SteppingHelixStateInfo const&, Plane const&, SteppingHelixStateInfo&) const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/libTrackPropagationSteppingHelixPropagator.so
#6  0x00002b7489aaddbd in TrackDetectorAssociator::fillMuon(edm::Event const&, TrackDetMatchInfo&, TrackAssociatorParameters const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/libTrackingToolsTrackAssociator.so
#7  0x00002b7489ab2936 in TrackDetectorAssociator::associate(edm::Event const&, edm::EventSetup const&, TrackAssociatorParameters const&, FreeTrajectoryState const*, FreeTrajectoryState const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/libTrackingToolsTrackAssociator.so
#8  0x00002b7489ab31b9 in TrackDetectorAssociator::associate(edm::Event const&, edm::EventSetup const&, reco::Track const&, TrackAssociatorParameters const&, TrackDetectorAssociator::Direction) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/libTrackingToolsTrackAssociator.so
#9  0x00002b74f1fca2ae in MuonIdProducer::fillMuonId(edm::Event&, edm::EventSetup const&, reco::Muon&, TrackDetectorAssociator::Direction) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/pluginRecoMuonMuonIdentificationPlugins.so
#10 0x00002b74f1fccc79 in MuonIdProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/pluginRecoMuonMuonIdentificationPlugins.so
#11 0x00002b7457ee8ccd in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/libFWCoreFramework.so
#12 0x00002b7457ec7092 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/libFWCoreFramework.so

Thread 4 (Thread 0x2b74b1ee6700 (LWP 18514) "cmsRun"):
#2  0x00002b745e909a30 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002b74645c2e50 in TH1::GetDimension() const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-04-11-2300/external/el8_amd64_gcc11/lib/libHist.so
#5  0x00002b745e7a4991 in dqm::impl::MonitorElement::accessRootObject(dqm::impl::AccessMut const&, char const*, int) const () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/41282/31928/CMSSW_13_1_X_2023-04-11-2300/lib/el8_amd64_gcc11/libDQMServicesCore.so
#6  0x00002b745e7a1167 in dqm::impl::MonitorElement::doFill(long) () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/41282/31928/CMSSW_13_1_X_2023-04-11-2300/lib/el8_amd64_gcc11/libDQMServicesCore.so
#7  0x00002b74d0c31aa7 in MTVHistoProducerAlgoForTracker::fill_simAssociated_recoTrack_histos(MTVHistoProducerAlgoForTrackerHistograms const&, int, reco::Track const&) const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/libValidationRecoTrack.so
#8  0x00002b74d0ad9e13 in MultiTrackValidator::dqmAnalyze(edm::Event const&, edm::EventSetup const&, MultiTrackValidatorHistograms const&) const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/pluginValidationRecoTrackPlugins.so
#9  0x00002b7457ecddfb in edm::global::EDProducerBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/libFWCoreFramework.so

Thread 3 (Thread 0x2b74b14e5700 (LWP 18513) "cmsRun"):
#2  0x00002b745e909a30 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002b745902ff26 in imalloc_fastpath (fallback_alloc=0x2b745902feb0 <fallback_impl<false>(std::size_t)>, size=48) at include/jemalloc/internal/jemalloc_internal_inlines_c.h:292
#5  newImpl<false> (size=48) at src/jemalloc_cpp.cpp:109
#6  operator new (size=48) at src/jemalloc_cpp.cpp:114
#7  0x00002b7457fd742e in edm::ErrorObj::emitToken(std::basic_string_view<char, std::char_traits<char> >) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/libFWCoreMessageLogger.so
#8  0x00002b7457fd9da2 in edm::ErrorObj::opltlt(char const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/libFWCoreMessageLogger.so
#9  0x00002b74836b0169 in EcalTrigTowerConstituentsMap::towerOf(DetId const&) const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/libGeometryCaloTopology.so
#10 0x00002b74fc3e81a1 in EcalSelectiveReadout::runSelectiveReadout0(EcalSelectiveReadout::ttFlag_t const (*) [72]) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/libSimCalorimetryEcalSelectiveReadoutAlgos.so
#11 0x00002b74fc3ea1b4 in EcalSelectiveReadoutSuppressor::run(edm::EventSetup const&, edm::SortedCollection<EcalTriggerPrimitiveDigi, edm::StrictWeakOrdering<EcalTriggerPrimitiveDigi> > const&, EBDigiCollection const&, EEDigiCollection const&, EBDigiCollection*, EEDigiCollection*, edm::SortedCollection<EBSrFlag, edm::StrictWeakOrdering<EBSrFlag> >*, edm::SortedCollection<EESrFlag, edm::StrictWeakOrdering<EESrFlag> >*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/libSimCalorimetryEcalSelectiveReadoutAlgos.so
#12 0x00002b74fc3c4966 in EcalSelectiveReadoutProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/libSimCalorimetryEcalSelectiveReadoutProducers.so
#13 0x00002b7457eddfbe in edm::one::EDProducerBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/libFWCoreFramework.so
#14 0x00002b7457ec5662 in edm::WorkerT<edm::one::EDProducerBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/libFWCoreFramework.so

Thread 1 (Thread 0x2b745b3681c0 (LWP 18056) "cmsRun"):
#3  0x00002b745e90ce9b in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00002b7488ff44f5 in TBLayer::groupedCompatibleDetsV(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&, std::vector<DetGroup, std::allocator<DetGroup> >&) const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/libRecoTrackerTkDetLayers.so
#6  0x00002b745fe862fb in GeometricSearchDet::compatibleDetsV(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&, std::vector<std::pair<GeomDet const*, TrajectoryStateOnSurface>, std::allocator<std::pair<GeomDet const*, TrajectoryStateOnSurface> > >&) const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/libTrackingToolsDetLayers.so
#7  0x00002b745fe856d4 in GeometricSearchDet::compatibleDets(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&) const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/libTrackingToolsDetLayers.so
#8  0x00002b74b685e279 in fastsim::TrackerSimHitProducer::interact(fastsim::Particle&, fastsim::SimplifiedGeometry const&, std::vector<std::unique_ptr<fastsim::Particle, std::default_delete<fastsim::Particle> >, std::allocator<std::unique_ptr<fastsim::Particle, std::default_delete<fastsim::Particle> > > >&, RandomEngineAndDistribution const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/pluginFastSimulationSimplifiedGeometryPropagatorAuto.so
#9  0x00002b74b684572e in FastSimProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/pluginFastSimulationSimplifiedGeometryPropagatorAuto.so
#10 0x00002b7457ee8ccd in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/libFWCoreFramework.so
#11 0x00002b7457ec7092 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02780/el8_amd64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-10-2300/lib/el8_amd64_gcc11/libFWCoreFramework.so


Current Modules:
Module: FastSimProducer:fastSimProducer (crashed)
Module: EcalSelectiveReadoutProducer:simEcalDigis
Module: MultiTrackValidator:trackValidatorFromPVAllTP
Module: MuonIdProducer:muons1stStep

@vhegde91

When I run this script, it repeatedly crashes after this line:
Begin processing the 793rd record. Run 1, Event 793, LumiSection 2 on stream 0 at 05-Apr-2023 20:34:26.408 CEST

The sample request fails validation and has been stuck for several months.

@makortel
Contributor

> When you start executing this script, it repeatedly crashed after this
> Begin processing the 793rd record. Run 1, Event 793, LumiSection 2 on stream 0 at 05-Apr-2023 20:34:26.408 CEST

I'm not able to reproduce the crash. My long test actually got stuck(?) in Pythia8 in event 6001 (or rather, I stopped the test after it had spent 9 hours within that event); the last stack trace was

#0  0x00007ffff53980dd in __mul () from /lib64/libm.so.6
#1  0x00007ffff5399561 in __c32 () from /lib64/libm.so.6
#2  0x00007ffff53993d1 in __mptan () from /lib64/libm.so.6
#3  0x00007ffff53bee31 in __tan_avx () from /lib64/libm.so.6
#4  0x00007fffd05795ce in Pythia8::ParticleDataEntry::mSel() ()
   from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_30/external/slc7_amd64_gcc700/lib/libpythia8.so
#5  0x00007fffd0656201 in Pythia8::ResonanceDecays::pickMasses() ()
   from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_30/external/slc7_amd64_gcc700/lib/libpythia8.so
#6  0x00007fffd065a31e in Pythia8::ResonanceDecays::next(Pythia8::Event&, int) ()
   from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_30/external/slc7_amd64_gcc700/lib/libpythia8.so
#7  0x00007fffd05d6a6d in Pythia8::ProcessContainer::decayResonances(Pythia8::Event&) ()
   from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_30/external/slc7_amd64_gcc700/lib/libpythia8.so
#8  0x00007fffd06112e4 in Pythia8::ProcessLevel::nextOne(Pythia8::Event&) ()
   from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_30/external/slc7_amd64_gcc700/lib/libpythia8.so
#9  0x00007fffd061350d in Pythia8::ProcessLevel::next(Pythia8::Event&) ()
   from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_30/external/slc7_amd64_gcc700/lib/libpythia8.so
#10 0x00007fffd06270cc in Pythia8::Pythia::next() ()
   from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_30/external/slc7_amd64_gcc700/lib/libpythia8.so
#11 0x00007fffc9b13a0c in Pythia8Hadronizer::generatePartonsAndHadronize() ()
   from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_30/lib/slc7_amd64_gcc700/pluginGeneratorInterfacePythia8Filters.so
#12 0x00007fffc9b50c07 in edm::GeneratorFilter<Pythia8Hadronizer, gen::ExternalDecayDriver>::filter(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_30/lib/slc7_amd64_gcc700/pluginGeneratorInterfacePythia8Filters.so
#13 0x00007ffff7d347f6 in edm::one::EDFilterBase::doEvent(edm::EventPrincipal const&, edm::EventSetupImpl const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) ()
   from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_30/lib/slc7_amd64_gcc700/libFWCoreFramework.so

@vhegde91 Can you give pointers to logs of the crashes or copy the stack trace of a crash here?

@vhegde91

@makortel, here is the log that I got after running on lxplus727: https://vhegde.web.cern.ch/vhegde/MCsampleTests/lxplus727_T5WG_SUS-RunIISpring21UL16FSGSPremixLLPBugFix-00031_v1.log This is what I copied from the terminal printout (I did not include the first few lines).
However, running source SUS-RunIISpring21UL16FSGSPremixLLPBugFix-00031 | tee logFile.txt captures less information than the terminal printout; that log is here: https://vhegde.web.cern.ch/vhegde/MCsampleTests/lxplus727_T5WG_SUS-RunIISpring21UL16FSGSPremixLLPBugFix-00031.log
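
A possible reason for the shorter log (an assumption, not confirmed in this thread) is that a plain pipe forwards only stdout, while the stack trace is printed on stderr. A minimal sketch, using a hypothetical stand-in function emit in place of the real script:

```shell
# Hypothetical stand-in for the production script:
# one line on stdout, one line on stderr.
emit() {
  echo "normal output"
  echo "stack trace" >&2
}

logdir=$(mktemp -d)

# Plain pipe: only stdout reaches tee, so the stderr line is shown
# on the terminal but never written to the log file.
emit | tee "$logdir/stdout_only.log" > /dev/null

# Redirect stderr into stdout before the pipe so that tee sees both streams.
emit 2>&1 | tee "$logdir/combined.log" > /dev/null
```

With the second form, the same redirection applied to the real script should put the full terminal output, including any stack trace, into the log file.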

@makortel
Contributor

Thanks, the stack trace is indeed related to this issue:

Thread 1 (Thread 0x7f9be62fa480 (LWP 26217)):
#3  0x00007f9be00c5ec8 in sig_dostack_then_abort () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_30/lib/slc7_amd64_gcc700/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00007f9bc5eaacee in TECLayer::computeCrossings(TrajectoryStateOnSurface const&, PropagationDirection) const () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_30/lib/slc7_amd64_gcc700/libRecoTrackerTkDetLayers.so
#6  0x00007f9bc5eab43c in TECLayer::groupedCompatibleDetsV(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&, std::vector<DetGroup, std::allocator<DetGroup> >&) const () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_30/lib/slc7_amd64_gcc700/libRecoTrackerTkDetLayers.so
#7  0x00007f9bc5e26a84 in GeometricSearchDet::compatibleDetsV(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&, std::vector<std::pair<GeomDet const*, TrajectoryStateOnSurface>, std::allocator<std::pair<GeomDet const*, TrajectoryStateOnSurface> > >&) const () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_30/lib/slc7_amd64_gcc700/libTrackingToolsDetLayers.so
#8  0x00007f9bc5e26a05 in GeometricSearchDet::compatibleDets(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&) const () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_30/lib/slc7_amd64_gcc700/libTrackingToolsDetLayers.so
#9  0x00007f9bc49908bc in fastsim::TrackerSimHitProducer::interact(fastsim::Particle&, fastsim::SimplifiedGeometry const&, std::vector<std::unique_ptr<fastsim::Particle, std::default_delete<fastsim::Particle> >, std::allocator<std::unique_ptr<fastsim::Particle, std::default_delete<fastsim::Particle> > > >&, RandomEngineAndDistribution const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_30/lib/slc7_amd64_gcc700/pluginFastSimulationSimplifiedGeometryPropagatorAuto.so
#10 0x00007f9bc49b261f in FastSimProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_30/lib/slc7_amd64_gcc700/pluginFastSimulationSimplifiedGeometryPropagatorAuto.so
#11 0x00007f9beab769c7 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventPrincipal const&, edm::EventSetupImpl const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_30/lib/slc7_amd64_gcc700/libFWCoreFramework.so

Current Modules:
Module: FastSimProducer:fastSimProducer (crashed)
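
For long logs, the crashed module and the first frame after the signal handler can be pulled out with grep. This is only a sketch: the sample log below is a shortened copy of the lines quoted above, and the patterns assume the log keeps the same "Module: ... (crashed)" and "<signal handler called>" wording.

```shell
# Shortened sample log, copied from the trace in this thread.
log=$(mktemp)
cat > "$log" <<'EOF'
Thread 1 (Thread 0x7f9be62fa480 (LWP 26217)):
#4  <signal handler called>
#5  0x00007f9bc5eaacee in TECLayer::computeCrossings(TrajectoryStateOnSurface const&, PropagationDirection) const ()
Current Modules:
Module: FastSimProducer:fastSimProducer (crashed)
EOF

# Module that the framework marked as crashed.
crashed_module=$(grep -E '^Module: .*\(crashed\)$' "$log")

# Frame printed right after the signal handler entry, i.e. where the signal was raised.
first_frame=$(grep -A1 '<signal handler called>' "$log" | tail -n 1)

echo "$crashed_module"
echo "$first_frame"
```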

@vhegde91

vhegde91 commented May 2, 2023

Hi all,
Any update on this? We have several fastsim requests that are on hold because of this issue.

@sbein
Contributor

sbein commented May 12, 2023

Hi all, I would like to investigate this further, but I am not able to reproduce the crash. I'm working on lxplus727 and I can run the script, the last command being

cmsDriver.py Configuration/GenProduction/python/SUS-RunIISpring21UL16FSGSPremixLLPBugFix-00031-fragment.py --python_filename SUS-RunIISpring21UL16FSGSPremixLLPBugFix-00031_1_cfg.py --eventcontent AODSIM --customise Configuration/DataProcessing/Utils.addMonitoring --datatier AODSIM --fileout file:SUS-RunIISpring21UL16FSGSPremixLLPBugFix-00031.root --pileup_input "dbs:/Neutrino_E-10_gun/RunIIFall17FSPrePremix-PUFSUL16CP5_106X_mcRun2_asymptotic_v16-v1/PREMIX" --conditions 106X_mcRun2_asymptotic_v17 --beamspot Realistic25ns13TeV2016Collision --customise_commands "process.source.numberEventsInLuminosityBlock = cms.untracked.uint32(200)"\\nprocess.source.numberEventsInLuminosityBlock="cms.untracked.uint32(400)" --step GEN,SIM,RECOBEFMIX,DIGI,DATAMIX,L1,DIGI2RAW,L1Reco,RECO --procModifiers premix_stage2,fastSimFixLongLivedBug --datamix PreMix --era Run2_2016 --fast --no_exec --mc -n 5000

and it processes all 5k events. Is the issue that requests are stuck in an official test, or was this a private unit test that crashed after 739 events? Any tips on how I could reproduce the crash would be helpful.

@sbein
Contributor

sbein commented Oct 6, 2023

Hi @davidlange6, I think this can be closed, since I believe the problem has been solved.

@makortel
Contributor

makortel commented Oct 9, 2023

I think this can be closed, since I believe the problem has been solved.

Could you then sign the issue?

@civanch
Contributor

civanch commented Oct 10, 2023

+1

@cmsbuild
Contributor

This issue is fully signed and ready to be closed.

@makortel
Contributor

@cmsbuild, please close
