Segmentation violation in PromptReco for FastjetJetProducer:ak4PFJets #41397

malbouis · 2023-04-24T21:21:59Z

There is one job failing Reco for Run 366451, dataset ParkingDoubleElectronLowMass, with a segmentation violation, as described in https://cms-talk.web.cern.ch/t/segmentation-error-in-promptreco-for-run-366451-dataset-parkingdoubleelectronlowmass/23152

The crash appears to originate in the module FastjetJetProducer:ak4PFJets:

%MSG-w TrackProducerBase:  TrackRefitter:hltTrackRefitterForSiStripMonitorTrack  24-Apr-2023 18:58:38 CEST Run: 366451 Event: 418574346
 BeamSpot is not valid
%MSG
%MSG-e TrackRefitter:  TrackRefitter:hltTrackRefitterForSiStripMonitorTrack  24-Apr-2023 18:58:38 CEST Run: 366451 Event: 418574346
 BeamSpot is (0,0,0), it is probably because is not valid in the event
%MSG

A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

...

Current Modules:

Module: FastjetJetProducer:ak4PFJets (crashed)
Module: MultiHitFromChi2EDProducer:pixelLessStepHitTriplets
Module: PFClusterProducer:particleFlowClusterHBHE
Module: RecHitTask:recHitTask
Module: TrackProducer:mixedTripletStepTracks
Module: MuonIdProducer:muons1stStep
Module: TrackProducer:initialStepTracks
Module: CAHitQuadrupletEDProducer:detachedQuadStepHitQuadruplets

A fatal system signal has occurred: segmentation violation

The full log is at /afs/cern.ch/user/c/cmst0/public/PausedJobs/Run2023B/job_248341/job/WMTaskSpace/cmsRun1, as described in the original email.

I was able to reproduce the failure locally.


cmsbuild · 2023-04-24T21:22:19Z

A new Issue was created by @malbouis .

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.


malbouis · 2023-04-24T21:22:33Z

assign reconstruction

makortel · 2023-04-24T21:31:14Z

Full stack trace from the log:

Thread 9 (Thread 0x2b854cc00700 (LWP 656) "cmsRun"):
#3  0x00002b8503ef333b in sig_dostack_then_abort () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  fastjet::LazyTiling9::_tj_set_jetinfo (_jets_index=1409, jet=0x2b888c4f9688, this=0x2b854cbf8cd0) at LazyTiling9.cc:258
#6  fastjet::LazyTiling9::run (this=this@entry=0x2b854cbf8cd0) at LazyTiling9.cc:509
#7  0x00002b854bf863e4 in fastjet::ClusterSequence::_initialise_and_run_no_decant (this=0x2b854cbf9050) at ClusterSequence.cc:412
#8  0x00002b854bf09d9c in fastjet::ClusterSequenceActiveAreaExplicitGhosts::_initialise<fastjet::PseudoJet> (this=0x2b854cbf9050, pseudojets=..., jet_def_in=..., ghost_spec=<optimized out>, ghosts=<optimized out>, ghost_area=<optimized out>, writeout_combinations=@0x2b854cbf8fbf: false) at ./../include/fastjet/ClusterSequenceActiveAreaExplicitGhosts.hh:224
#9  0x00002b854bfb1e72 in fastjet::ClusterSequenceActiveAreaExplicitGhosts::ClusterSequenceActiveAreaExplicitGhosts<fastjet::PseudoJet> (writeout_combinations=@0x2b854cbf8fbf: false, ghost_spec=..., jet_def_in=..., pseudojets=..., this=0x2b854cbf9050) at ./../include/fastjet/ClusterSequenceActiveAreaExplicitGhosts.hh:69
#10 fastjet::ClusterSequenceActiveArea::_run_AA (this=0x2b888aea0800, ghost_spec=...) at ClusterSequenceActiveArea.cc:133
#11 0x00002b854bfb215b in fastjet::ClusterSequenceActiveArea::_initialise_and_run_AA (this=0x2b888aea0800, jet_def_in=..., ghost_spec=..., writeout_combinations=<optimized out>) at ClusterSequenceActiveArea.cc:61
#12 0x00002b85758bde3c in void fastjet::ClusterSequenceArea::initialize_and_run_cswa<fastjet::PseudoJet>(std::vector<fastjet::PseudoJet, std::allocator<fastjet::PseudoJet> > const&, fastjet::JetDefinition const&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginRecoJetsJetProducers_plugins.so
#13 0x00002b85758bea51 in fastjet::ClusterSequenceArea::ClusterSequenceArea<fastjet::PseudoJet>(std::vector<fastjet::PseudoJet, std::allocator<fastjet::PseudoJet> > const&, fastjet::JetDefinition const&, fastjet::AreaDefinition const&) [clone .lto_priv.0] () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginRecoJetsJetProducers_plugins.so
#14 0x00002b85758d82f6 in FastjetJetProducer::runAlgorithm(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginRecoJetsJetProducers_plugins.so
#15 0x00002b8575916a16 in VirtualJetProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginRecoJetsJetProducers_plugins.so
#16 0x00002b85758d32dd in FastjetJetProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginRecoJetsJetProducers_plugins.so
#17 0x00002b84fb5fa95d in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libFWCoreFramework.so
#18 0x00002b84fb5e1072 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libFWCoreFramework.so

Thread 8 (Thread 0x2b854ba00700 (LWP 655) "cmsRun"):
#2  0x00002b8503eefed0 in sig_pause_for_stacktrace () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002b852842377f in HelixForwardPlaneCrossing::position(double) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libTrackingToolsGeomPropagators.so
#5  0x00002b852ea2120e in CompositeTECWedge::groupedCompatibleDetsV(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&, std::vector<DetGroup, std::allocator<DetGroup> >&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libRecoTrackerTkDetLayers.so
#6  0x00002b852ea216cb in CompatibleDetToGroupAdder::add(GeometricSearchDet const&, TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&, std::vector<DetGroup, std::allocator<DetGroup> >&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libRecoTrackerTkDetLayers.so
#7  0x00002b852ea224d2 in CompositeTECPetal::groupedCompatibleDetsV(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&, std::vector<DetGroup, std::allocator<DetGroup> >&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libRecoTrackerTkDetLayers.so
#8  0x00002b852ea216cb in CompatibleDetToGroupAdder::add(GeometricSearchDet const&, TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&, std::vector<DetGroup, std::allocator<DetGroup> >&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libRecoTrackerTkDetLayers.so
#9  0x00002b852ea2b3fa in TECLayer::groupedCompatibleDetsV(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&, std::vector<DetGroup, std::allocator<DetGroup> >&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libRecoTrackerTkDetLayers.so
#10 0x00002b850a0302fb in GeometricSearchDet::compatibleDetsV(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&, std::vector<std::pair<GeomDet const*, TrajectoryStateOnSurface>, std::allocator<std::pair<GeomDet const*, TrajectoryStateOnSurface> > >&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libTrackingToolsDetLayers.so
#11 0x00002b850a02f6d4 in GeometricSearchDet::compatibleDets(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libTrackingToolsDetLayers.so
#12 0x00002b856a834b1f in TrackProducerBase<reco::Track>::setSecondHitPattern(Trajectory*, reco::Track&, Propagator const*, MeasurementTrackerEvent const*, TrackerTopology const*)::{lambda(std::vector<DetLayer const*, std::allocator<DetLayer const*> > const&, TrajectoryStateOnSurface const&)#1}::operator()(std::vector<DetLayer const*, std::allocator<DetLayer const*> > const&, TrajectoryStateOnSurface const&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libRecoTrackerTrackProducer.so
#13 0x00002b856a835060 in TrackProducerBase<reco::Track>::setSecondHitPattern(Trajectory*, reco::Track&, Propagator const*, MeasurementTrackerEvent const*, TrackerTopology const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libRecoTrackerTrackProducer.so
#14 0x00002b856a837e85 in KfTrackProducerBase::putInEvt(edm::Event&, Propagator const*, MeasurementTracker const*, std::unique_ptr<edm::OwnVector<TrackingRecHit, edm::ClonePolicy<TrackingRecHit> >, std::default_delete<edm::OwnVector<TrackingRecHit, edm::ClonePolicy<TrackingRecHit> > > >&, std::unique_ptr<std::vector<reco::Track, std::allocator<reco::Track> >, std::default_delete<std::vector<reco::Track, std::allocator<reco::Track> > > >&, std::unique_ptr<std::vector<reco::TrackExtra, std::allocator<reco::TrackExtra> >, std::default_delete<std::vector<reco::TrackExtra, std::allocator<reco::TrackExtra> > > >&, std::unique_ptr<std::vector<Trajectory, std::allocator<Trajectory> >, std::default_delete<std::vector<Trajectory, std::allocator<Trajectory> > > >&, std::unique_ptr<std::vector<int, std::allocator<int> >, std::default_delete<std::vector<int, std::allocator<int> > > >&, std::vector<AlgoProductTraits<reco::Track>::AlgoProduct, std::allocator<AlgoProductTraits<reco::Track>::AlgoProduct> >&, TransientTrackingRecHitBuilder const*, TrackerTopology const*, int) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libRecoTrackerTrackProducer.so
#15 0x00002b856a7540f4 in TrackProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginRecoTrackerTrackProducerPlugins.so
#16 0x00002b84fb5fa95d in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libFWCoreFramework.so
#17 0x00002b84fb5e1072 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libFWCoreFramework.so

Thread 7 (Thread 0x2b854ac02700 (LWP 654) "cmsRun"):
#2  0x00002b8503eefed0 in sig_pause_for_stacktrace () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002b852f07ac53 in CellularAutomaton::createAndConnectCells(std::vector<HitDoublets const*, std::allocator<HitDoublets const*> > const&, TrackingRegion const&, CACut const&, CACut const&, float) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libRecoPixelVertexingPixelTriplets.so
#5  0x00002b852f0746a8 in CAHitQuadrupletGenerator::hitNtuplets(IntermediateHitDoublets const&, std::vector<OrderedHitSeeds, std::allocator<OrderedHitSeeds> >&, SeedingLayerSetsHits const&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libRecoPixelVertexingPixelTriplets.so
#6  0x00002b858c452c8f in CAHitNtupletEDProducerT<CAHitQuadrupletGenerator>::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginRecoPixelVertexingPixelTripletsPlugins.so
#7  0x00002b84fb5fa95d in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libFWCoreFramework.so
#8  0x00002b84fb5e1072 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libFWCoreFramework.so

Thread 6 (Thread 0x2b854a201700 (LWP 653) "cmsRun"):
#2  0x00002b8503eefed0 in sig_pause_for_stacktrace () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002b85a355538d in Basic2DGenericPFlowClusterizer::growPFClusters(reco::PFCluster const&, std::vector<bool, std::allocator<bool> > const&, unsigned int, unsigned int, double, std::vector<reco::PFCluster, std::allocator<reco::PFCluster> >&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginRecoParticleFlowPFClusterProducerPlugins.so
#5  0x00002b85a3554df7 in Basic2DGenericPFlowClusterizer::growPFClusters(reco::PFCluster const&, std::vector<bool, std::allocator<bool> > const&, unsigned int, unsigned int, double, std::vector<reco::PFCluster, std::allocator<reco::PFCluster> >&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginRecoParticleFlowPFClusterProducerPlugins.so
#6  0x00002b85a3554df7 in Basic2DGenericPFlowClusterizer::growPFClusters(reco::PFCluster const&, std::vector<bool, std::allocator<bool> > const&, unsigned int, unsigned int, double, std::vector<reco::PFCluster, std::allocator<reco::PFCluster> >&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginRecoParticleFlowPFClusterProducerPlugins.so
#7  0x00002b85a3554df7 in Basic2DGenericPFlowClusterizer::growPFClusters(reco::PFCluster const&, std::vector<bool, std::allocator<bool> > const&, unsigned int, unsigned int, double, std::vector<reco::PFCluster, std::allocator<reco::PFCluster> >&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginRecoParticleFlowPFClusterProducerPlugins.so
#8  0x00002b85a3554df7 in Basic2DGenericPFlowClusterizer::growPFClusters(reco::PFCluster const&, std::vector<bool, std::allocator<bool> > const&, unsigned int, unsigned int, double, std::vector<reco::PFCluster, std::allocator<reco::PFCluster> >&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginRecoParticleFlowPFClusterProducerPlugins.so
#9  0x00002b85a3554df7 in Basic2DGenericPFlowClusterizer::growPFClusters(reco::PFCluster const&, std::vector<bool, std::allocator<bool> > const&, unsigned int, unsigned int, double, std::vector<reco::PFCluster, std::allocator<reco::PFCluster> >&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginRecoParticleFlowPFClusterProducerPlugins.so
#10 0x00002b85a3554df7 in Basic2DGenericPFlowClusterizer::growPFClusters(reco::PFCluster const&, std::vector<bool, std::allocator<bool> > const&, unsigned int, unsigned int, double, std::vector<reco::PFCluster, std::allocator<reco::PFCluster> >&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginRecoParticleFlowPFClusterProducerPlugins.so
#11 0x00002b85a3554df7 in Basic2DGenericPFlowClusterizer::growPFClusters(reco::PFCluster const&, std::vector<bool, std::allocator<bool> > const&, unsigned int, unsigned int, double, std::vector<reco::PFCluster, std::allocator<reco::PFCluster> >&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginRecoParticleFlowPFClusterProducerPlugins.so
#12 0x00002b85a3555c1e in Basic2DGenericPFlowClusterizer::buildClusters(std::vector<reco::PFCluster, std::allocator<reco::PFCluster> > const&, std::vector<bool, std::allocator<bool> > const&, std::vector<reco::PFCluster, std::allocator<reco::PFCluster> >&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginRecoParticleFlowPFClusterProducerPlugins.so
#13 0x00002b85a3579689 in PFClusterProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginRecoParticleFlowPFClusterProducerPlugins.so
#14 0x00002b84fb5fa95d in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libFWCoreFramework.so
#15 0x00002b84fb5e1072 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libFWCoreFramework.so

Thread 5 (Thread 0x2b8549600700 (LWP 652) "cmsRun"):
#2  0x00002b8503eefed0 in sig_pause_for_stacktrace () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002b850427f406 in TH2F::AddBinContent(int, double) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/external/el8_amd64_gcc11/lib/libHist.so
#5  0x00002b85042768c0 in TH2::Fill(double, double, double) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/external/el8_amd64_gcc11/lib/libHist.so
#6  0x00002b8504765295 in dqm::impl::MonitorElement::Fill(double, double) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libDQMServicesCore.so
#7  0x00002b856a2822eb in RecHitTask::_process(edm::Event const&, edm::EventSetup const&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginDQMHcalTasksAuto.so
#8  0x00002b856a218f60 in non-virtual thunk to DQMOneEDAnalyzer<edm::LuminosityBlockCache<hcaldqm::Cache> >::accumulate(edm::Event const&, edm::EventSetup const&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginDQMHcalTasksAuto.so
#9  0x00002b84fb5f165e in edm::one::EDProducerBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libFWCoreFramework.so
#10 0x00002b84fb5d94f2 in edm::WorkerT<edm::one::EDProducerBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libFWCoreFramework.so

Thread 4 (Thread 0x2b8548413700 (LWP 651) "cmsRun"):
#2  0x00002b8503eefed0 in sig_pause_for_stacktrace () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002b852af78fc6 in HcalGeometry::getGeometryRawPtr(unsigned int) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libGeometryHcalTowerAlgo.so
#5  0x00002b852b082c28 in CaloSubdetectorGeometry::cellGeomPtr(unsigned int) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libGeometryCaloGeometry.so
#6  0x00002b852af7a3fe in HcalGeometry::getGeometry(DetId const&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libGeometryHcalTowerAlgo.so
#7  0x00002b852f57d073 in CaloDetIdAssociator::getDetIdPoints(DetId const&, std::vector<Point3DBase<float, GlobalTag>, std::allocator<Point3DBase<float, GlobalTag> > >&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginTrackingToolsTrackAssociatorPlugins.so
#8  0x00002b852f580683 in CaloDetIdAssociator::crossedElement(Point3DBase<float, GlobalTag> const&, Point3DBase<float, GlobalTag> const&, DetId const&, double, SteppingHelixStateInfo const*) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginTrackingToolsTrackAssociatorPlugins.so
#9  0x00002b852f5b2e93 in DetIdAssociator::getCrossedDetIds(std::set<DetId, std::less<DetId>, std::allocator<DetId> > const&, std::vector<Point3DBase<float, GlobalTag>, std::allocator<Point3DBase<float, GlobalTag> > > const&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libTrackingToolsTrackAssociator.so
#10 0x00002b852f5c6eda in TrackDetectorAssociator::fillHcal(edm::Event const&, TrackDetMatchInfo&, TrackAssociatorParameters const&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libTrackingToolsTrackAssociator.so
#11 0x00002b852f5c9cb2 in TrackDetectorAssociator::associate(edm::Event const&, edm::EventSetup const&, TrackAssociatorParameters const&, FreeTrajectoryState const*, FreeTrajectoryState const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libTrackingToolsTrackAssociator.so
#12 0x00002b852f5ca379 in TrackDetectorAssociator::associate(edm::Event const&, edm::EventSetup const&, reco::Track const&, TrackAssociatorParameters const&, TrackDetectorAssociator::Direction) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libTrackingToolsTrackAssociator.so
#13 0x00002b858a3a22be in MuonIdProducer::fillMuonId(edm::Event&, edm::EventSetup const&, reco::Muon&, TrackDetectorAssociator::Direction) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginRecoMuonMuonIdentificationPlugins.so
#14 0x00002b858a3a4c89 in MuonIdProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginRecoMuonMuonIdentificationPlugins.so
#15 0x00002b84fb5fa95d in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libFWCoreFramework.so
#16 0x00002b84fb5e1072 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libFWCoreFramework.so

Thread 3 (Thread 0x2b8547a12700 (LWP 650) "cmsRun"):
#2  0x00002b8503eefed0 in sig_pause_for_stacktrace () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002b852f0854b7 in ThirdHitPredictionFromCircle::phi(float, float) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libRecoPixelVertexingPixelTriplets.so
#5  0x00002b858b65973c in MultiHitGeneratorFromChi2::hitSets(TrackingRegion const&, OrderedMultiHits&, HitDoublets const&, RecHitsSortedInPhi const**, std::vector<DetLayer const*, std::allocator<DetLayer const*> > const&, int, std::vector<std::unique_ptr<BaseTrackerRecHit, std::default_delete<BaseTrackerRecHit> >, std::allocator<std::unique_ptr<BaseTrackerRecHit, std::default_delete<BaseTrackerRecHit> > > >&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginRecoTrackerTkSeedGeneratorPlugins.so
#6  0x00002b858b65018a in MultiHitGeneratorFromChi2::hitSets(TrackingRegion const&, OrderedMultiHits&, HitDoublets const&, std::vector<SeedingLayerSetsHits::SeedingLayer, std::allocator<SeedingLayerSetsHits::SeedingLayer> > const&, LayerHitMapCache&, std::vector<std::unique_ptr<BaseTrackerRecHit, std::default_delete<BaseTrackerRecHit> >, std::allocator<std::unique_ptr<BaseTrackerRecHit, std::default_delete<BaseTrackerRecHit> > > >&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginRecoTrackerTkSeedGeneratorPlugins.so
#7  0x00002b858b651003 in MultiHitFromChi2EDProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginRecoTrackerTkSeedGeneratorPlugins.so
#8  0x00002b84fb5fa95d in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libFWCoreFramework.so
#9  0x00002b84fb5e1072 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libFWCoreFramework.so
#10 0x00002b84fb56d6da in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libFWCoreFramework.so
#11 0x00002b84fb56db88 in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libFWCoreFramework.so

Thread 1 (Thread 0x2b84fea871c0 (LWP 532) "cmsRun"):
#2  0x00002b8503eefed0 in sig_pause_for_stacktrace () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002b85284248d2 in HelixArbitraryPlaneCrossing::positionInDouble(double) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libTrackingToolsGeomPropagators.so
#5  0x00002b8528425169 in HelixArbitraryPlaneCrossing::pathLength(Plane const&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libTrackingToolsGeomPropagators.so
#6  0x00002b852f1b8872 in RKPropagatorInS::propagateWithPath(FreeTrajectoryState const&, Plane const&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libTrackPropagationRungeKutta.so
#7  0x00002b852f1b2c65 in Propagator::propagateWithPath(TrajectoryStateOnSurface const&, Plane const&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libTrackPropagationRungeKutta.so
#8  0x00002b852f1a5a72 in PropagatorWithMaterial::propagateWithPath(TrajectoryStateOnSurface const&, Plane const&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libTrackingToolsMaterialEffects.so
#9  0x00002b852842397c in Propagator::propagateWithPath(TrajectoryStateOnSurface const&, Surface const&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libTrackingToolsGeomPropagators.so
#10 0x00002b85404ee8b7 in KFTrajectorySmoother::trajectory(Trajectory const&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libTrackingToolsTrackFitters.so
#11 0x00002b85404a046f in (anonymous namespace)::KFFittingSmoother::smoothingStep(Trajectory&&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginTrackingToolsTrackFittersPlugins.so
#12 0x00002b85404a3c4e in (anonymous namespace)::KFFittingSmoother::fitOne(TrajectorySeed const&, std::vector<std::shared_ptr<TrackingRecHit const>, std::allocator<std::shared_ptr<TrackingRecHit const> > > const&, TrajectoryStateOnSurface const&, TrajectoryFitter::fitType) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginTrackingToolsTrackFittersPlugins.so
#13 0x00002b854049df97 in (anonymous namespace)::FlexibleKFFittingSmoother::fitOne(TrajectorySeed const&, std::vector<std::shared_ptr<TrackingRecHit const>, std::allocator<std::shared_ptr<TrackingRecHit const> > > const&, TrajectoryStateOnSurface const&, TrajectoryFitter::fitType) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginTrackingToolsTrackFittersPlugins.so
#14 0x00002b856a835cd2 in TrackProducerAlgorithm<reco::Track>::buildTrack(TrajectoryFitter const*, Propagator const*, std::vector<AlgoProductTraits<reco::Track>::AlgoProduct, std::allocator<AlgoProductTraits<reco::Track>::AlgoProduct> >&, std::vector<std::shared_ptr<TrackingRecHit const>, std::allocator<std::shared_ptr<TrackingRecHit const> > >&, TrajectoryStateOnSurface&, TrajectorySeed const&, float, reco::BeamSpot const&, edm::RefToBase<TrajectorySeed>, int, signed char) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libRecoTrackerTrackProducer.so
#15 0x00002b856a7559dd in TrackProducerAlgorithm<reco::Track>::runWithCandidate(TrackingGeometry const*, MagneticField const*, std::vector<TrackCandidate, std::allocator<TrackCandidate> > const&, TrajectoryFitter const*, Propagator const*, TransientTrackingRecHitBuilder const*, reco::BeamSpot const&, std::vector<AlgoProductTraits<reco::Track>::AlgoProduct, std::allocator<AlgoProductTraits<reco::Track>::AlgoProduct> >&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginRecoTrackerTrackProducerPlugins.so
#16 0x00002b856a7542c2 in TrackProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginRecoTrackerTrackProducerPlugins.so
#17 0x00002b84fb5fa95d in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libFWCoreFramework.so
#18 0x00002b84fb5e1072 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libFWCoreFramework.so

Current Modules:
Module: FastjetJetProducer:ak4PFJets (crashed)
Module: MultiHitFromChi2EDProducer:pixelLessStepHitTriplets
Module: PFClusterProducer:particleFlowClusterHBHE
Module: RecHitTask:recHitTask
Module: TrackProducer:mixedTripletStepTracks
Module: MuonIdProducer:muons1stStep
Module: TrackProducer:initialStepTracks
Module: CAHitQuadrupletEDProducer:detachedQuadStepHitQuadruplets

makortel · 2023-04-24T21:32:00Z

assign reconstruction

cmsbuild · 2023-04-24T21:32:18Z

New categories assigned: reconstruction

@mandrenguyen,@clacaputo you have been requested to review this Pull request/Issue and eventually sign? Thanks

mmusich · 2023-04-24T21:33:43Z

> There is one job failing Reco for Run 366351

That's not a global run. I think the original message on the Tier-0 CMSTalk thread is about 366451.

malbouis · 2023-04-24T21:35:47Z

> > There is one job failing Reco for Run 366351
>
> That's not a global run. I think the original message on the Tier-0 CMSTalk thread is about 366451.

Thanks Marco! I have updated the description.

malbouis · 2023-04-25T15:21:46Z

Let me add a recipe to reproduce the error, as discussed at the OPR meeting today.

cmsrel CMSSW_13_0_3
cd CMSSW_13_0_3/src/
cmsenv
cp -r /afs/cern.ch/user/c/cmst0/public/PausedJobs/Run2023B/job_248341/job/WMTaskSpace/ .
cd WMTaskSpace/cmsRun1/
cmsRun -e PSet.py

mandrenguyen · 2023-04-25T19:27:04Z

I can't reproduce this error using the .pkl file.
Unfortunately, lxplus kept killing my interactive job, so I copied the input file to my Tier-2.
On an LLR machine, the job runs to completion in CMSSW_13_0_3.
The memory report gives the following:
MemoryReport> Peak virtual size 18036.2 Mbytes
MemoryReport> Peak rss size 9978.31 Mbytes

malbouis · 2023-04-26T08:30:56Z

Thanks, @mandrenguyen !

I could reproduce it on lxplus when I tried it. Could someone else double-check whether the crash can be reproduced on lxplus with the recipe posted above?

germanfgv · 2023-04-26T08:46:59Z

@mandrenguyen just to confirm, were you using scram arch el8_amd64_gcc11?

mmusich · 2023-04-26T08:56:42Z

were you using scram arch el8_amd64_gcc11?

I also tried last night: with the regular arch one gets on lxplus (not lxplus8), slc7_amd64_gcc11, the crash is not there.

malbouis · 2023-04-26T09:23:53Z

were you using scram arch el8_amd64_gcc11?

I also tried last night: with the regular arch one gets on lxplus (not lxplus8), slc7_amd64_gcc11, the crash is not there.

Thanks Marco!
I tried it on lxplus8 and I reproduced the crash, but I had not tried it on regular lxplus.

mmusich · 2023-04-26T10:57:30Z

I tried it on lxplus8 and I reproduced the crash, but I had not tried it on regular lxplus.

For the record, on an lxplus8 node, using the recipe above and a slightly modified PSet:

import FWCore.ParameterSet.Config as cms
import pickle

# Load the Tier-0 job configuration from the pickled process
with open('PSet.pkl', 'rb') as handle:
    process = pickle.load(handle)

# Run single-threaded and jump straight to the crashing event
process.options.numberOfThreads = 1
process.source.skipEvents = cms.untracked.uint32(586)

it will segfault consistently at the first event processed.

mandrenguyen · 2023-04-26T14:31:28Z

The offending line is:

cmssw/RecoJets/JetProducers/plugins/FastjetJetProducer.cc

Line 359 in 0d81be4

    ClusterSequencePtr(new fastjet::ClusterSequenceArea(fjInputs_, *fjJetDefinition_, *fjAreaDefinition_));

The problem appears to come from fjAreaDefinition_.
That's as far as I've understood for the moment. If @cms-sw/jetmet-pog-l2 or @laurenhay have any ideas, feel free to chime in.

Dr15Jones · 2023-04-26T14:38:43Z

Looking at where fjAreaDefinition_ is checked in the constructor, it seems that useConstituentSubtraction_ is also supposed to be true if fjAreaDefinition_ is used. The code causing the problem does not first check that useConstituentSubtraction_ == true.

Dr15Jones · 2023-04-26T14:43:17Z

The value of fjAreaDefinition_ is set here, and only if certain criteria are met:

cmssw/RecoJets/JetProducers/plugins/VirtualJetProducer.cc

Lines 238 to 250 in a346606

    if (doAreaFastjet_ || doRhoFastjet_) {
      if (voronoiRfact_ <= 0) {
        fjActiveArea_ = std::make_shared<fastjet::GhostedAreaSpec>(ghostEtaMax_, activeAreaRepeats_, ghostArea_);
        if (!useExplicitGhosts_) {
          fjAreaDefinition_ = std::make_shared<fastjet::AreaDefinition>(fastjet::active_area, *fjActiveArea_);
        } else {
          fjAreaDefinition_ =
              std::make_shared<fastjet::AreaDefinition>(fastjet::active_area_explicit_ghosts, *fjActiveArea_);
        }
      }
      fjSelector_ = std::make_shared<fastjet::Selector>(fastjet::SelectorAbsRapMax(rhoEtaMax_));
    }

mandrenguyen · 2023-04-26T16:19:35Z

Since it's ak4PFJets that's crashing, I believe useConstituentSubtraction_ should not be set to true, but fjAreaDefinition_ does indeed need to be defined. Based on the snippet above though, I think the conditions are met. doAreaFastjet_ is true and voronoiRfact_ is indeed set to a negative value.
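To make the check above concrete, the branch from VirtualJetProducer.cc can be restated as a small decision function. This is a hedged Python transliteration for illustration only: the function name and its string return values are hypothetical, and the parameter values in the usage line are assumptions (the thread only says that doAreaFastjet_ is true and voronoiRfact_ is negative).

```python
def area_definition_kind(do_area_fastjet, do_rho_fastjet, voronoi_rfact, use_explicit_ghosts):
    """Mirror of the quoted C++ branch that decides whether fjAreaDefinition_ is set.

    Returns None when the quoted lines would leave fjAreaDefinition_ unset.
    """
    if not (do_area_fastjet or do_rho_fastjet):
        return None  # neither jet areas nor rho requested: no area definition
    if voronoi_rfact > 0:
        return None  # Voronoi case: the quoted lines do not set fjAreaDefinition_
    # Ghosted active area, with implicit or explicit ghosts
    return "active_area_explicit_ghosts" if use_explicit_ghosts else "active_area"


# Hypothetical settings in line with the description of ak4PFJets:
# doAreaFastjet true, voronoiRfact negative, explicit ghosts assumed off.
print(area_definition_kind(True, False, -0.9, False))  # active_area
```

Under these assumptions the area definition is created, which is consistent with the conclusion above that fjAreaDefinition_ itself is not the culprit.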

mandrenguyen · 2023-04-26T17:27:31Z

I looped over the clustering inputs (fjInputs_) of the event on which the code is crashing:

    for (auto const& input : fjInputs_) {
      if(!(input.E() > 0)) std::cout<< "e "<<input.e()<<" phi "<<input.phi()<<" rap "<<input.rap()<<" px "<<input.px()<<" py "<<input.py()<<" pz "<<input.pz()<<std::endl;
    }
    fjClusterSeq_ = ClusterSequencePtr(new fastjet::ClusterSequenceArea(fjInputs_, *fjJetDefinition_, *fjAreaDefinition_));

Out of the 3080 jet constituents, one of them has NaN for e() and rap().
I guess that's what's causing FastJet to choke.
Perhaps we should see if that's coming from the input PFCandidate collection.

For what it's worth, px, py and pz are set correctly:
e -nan phi 0.343687 rap -nan px 2.27882 py 0.815566 pz -1.90158
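The printed values are self-consistent: the azimuth depends only on px and py, so it stays finite, while the rapidity needs E and inherits the NaN. A minimal Python sketch using the textbook definitions phi = atan2(py, px) and y = 0.5 ln((E + pz)/(E - pz)) (an assumption; FastJet treats some edge cases differently) reproduces that pattern from the numbers above:

```python
import math

# Kinematics of the anomalous constituent printed above; its energy is NaN.
px, py, pz = 2.27882, 0.815566, -1.90158
e = float("nan")

# Azimuth uses only px and py, so it is unaffected by the bad energy.
phi = math.atan2(py, px)

# Rapidity uses E; arithmetic with NaN yields NaN (math.log(nan) returns nan).
rap = 0.5 * math.log((e + pz) / (e - pz))

print(f"phi {phi:.6f} rap {rap}")
```

This also suggests the corruption is confined to the energy (and the mass derived from it), not the three-momentum.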

mandrenguyen · 2023-04-26T21:10:01Z

Some more observations.
I can find the anomalous PFCandidate in PFLinker.cc.
It's of type 1, so it's a charged hadron.

cand.trackRef() is non-null, and the track has the following momentum, which I'm not immediately finding in generalTracks (but I didn't check super carefully):
px = 6.52489e+08, py = 2.33519e+08, pz = -5.44475e+08
The linked calo energies, hoEnergy(), hcalEnergy() and ecalEnergy(), are all -nan.

I guess my next step would be to see if I can track the NaN back to where charged hadrons are first created, in PFAlgo.cc, but I won't be able to get to it immediately.
If anyone else wants to have a look, feel free of course.

makortel · 2023-04-26T21:22:32Z

Here is an issue from 2022 of a PFCandidate with NaN #39110 (I did not attempt to understand if it would be related though)

Let's tag @cms-sw/pf-l2 anyway.

mandrenguyen · 2023-04-27T03:12:14Z

In case it's useful to examine the output, one can get the job to finish successfully by inserting the following in the loop over PF candidates in PFLinker.cc

`  if(!(cand.energy()>0) ) continue;`
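The one-liner relies on IEEE-754 comparison semantics: every ordered comparison involving NaN is false, so !(energy > 0) rejects NaN as well as zero and negative energies, whereas the seemingly equivalent energy <= 0 would let NaN through. A quick Python check of the same semantics (Python floats follow the same IEEE rules as C++ doubles; both helper names are hypothetical):

```python
def skip_like_patch(energy):
    """Equivalent of `if (!(cand.energy() > 0)) continue;`: True means skip."""
    return not (energy > 0)


def skip_naive(energy):
    """A tempting rewrite that silently keeps NaN candidates."""
    return energy <= 0


for e in (5.0, 0.0, -1.0, float("nan")):
    print(f"{e!r:>6}: patch skips={skip_like_patch(e)}, naive skips={skip_naive(e)}")
```

The two forms agree on every ordinary value and differ only for NaN, which is exactly the case this issue needs to catch.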

malbouis · 2023-04-27T08:44:01Z

We have 3 more occurrences of this error in pp runs, for dataset EphemeralZeroBias:

2 paused jobs in run 366495
1 paused job in run 366497

I post here the links to the tar files, in case someone would like to try to reproduce them (I have not had the chance yet):

https://eoscmsweb.cern.ch/eos/cms/store/logs/prod/recent/PromptReco/PromptReco_Run366495_EphemeralZeroBias17/Reco
https://eoscmsweb.cern.ch/eos/cms/store/logs/prod/recent/PromptReco/PromptReco_Run366495_EphemeralZeroBias13/Reco

https://eoscmsweb.cern.ch/eos/cms/store/logs/prod/recent/PromptReco/PromptReco_Run366497_EphemeralZeroBias18/Reco

kdlong · 2023-04-27T09:27:35Z

Thanks for all the info, will take a look ASAP

mandrenguyen · 2023-04-27T11:08:59Z

Thanks @kdlong
So far, the furthest back I've been able to track the NaN is:

cmssw/RecoParticleFlow/PFProducer/src/PFAlgo.cc

Line 2746 in 9fa6185

chargedHadronsTotalEnergy += chargedHadron.energy();

chargedHadron.energy() is returning -nan for index = 1411
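Because NaN propagates through addition, a single bad charged hadron is enough to turn an accumulated total such as chargedHadronsTotalEnergy into NaN, and any quantity computed from that total is NaN as well. A minimal illustration with made-up energies (only the index 1411 mirrors the thread), which also shows a simple way to locate the first offending entry:

```python
import math

# Hypothetical per-hadron energies; only the NaN at index 1411 mirrors the thread.
energies = [1.0] * 2000
energies[1411] = float("nan")

total = 0.0
for e in energies:
    total += e  # same accumulation pattern as chargedHadronsTotalEnergy += ...

# Find where the NaN first enters, as done by hand in the comment above.
first_bad = next(i for i, e in enumerate(energies) if math.isnan(e))
print(total, first_bad)  # nan 1411
```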

mmusich · 2023-04-27T12:20:07Z

type pf

malbouis · 2023-04-30T09:07:38Z

We have yet another paused job in Tier0 due to this crash.

It is occurring for run 366729 in dataset EphemeralZeroBias10.

The tar ball can be found in https://eoscmsweb.cern.ch/eos/cms/store/logs/prod/recent/PromptReco/PromptReco_Run366729_EphemeralZeroBias10/Reco

Is there any further progress in debugging this issue?

swagata87 · 2023-04-30T18:49:30Z

Here is an issue from 2022 of a PFCandidate with NaN #39110 (I did not attempt to understand if it would be related though)

Yes, there was a similar finding last year, which caused the photon isolation to be NaN when the bad PF candidate ended up in the photon's isolation cone. A preliminary fix was to loop over the PF candidate collection, check for NaN, remove those candidates, and make a pfCandNoNaN collection, which was then passed on to the isolation calculation. This is where it was done: https://github.com/cms-sw/cmssw/pull/39120/files

Maybe something similar can be done for jet/MET if this is easier and quicker to do. But of course the real issue needs to be solved upstream.

Even if it's fixed at the PF level, such extra protections in POG code are probably not a bad idea, as the PF code (and logic) is complex and can go wrong in various unforeseen ways, especially in the startup phase, when alignment/calibrations are not perfect and several special checks/tests are ongoing using special modes (the interplay of those with the PF logic can be hard to predict).
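The protection described above, copying the candidate collection while dropping entries with bad kinematics, can be sketched as follows. This is a hedged Python illustration, not the code from PR #39120: the Candidate record and the drop_non_finite helper are hypothetical, and math.isfinite is used so that infinities are rejected along with NaN.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical stand-in for a PF candidate (kinematics only)."""
    px: float
    py: float
    pz: float
    energy: float


def drop_non_finite(cands):
    """Return a new list without candidates carrying NaN or inf in any field."""
    return [c for c in cands
            if all(math.isfinite(v) for v in (c.px, c.py, c.pz, c.energy))]


cands = [
    Candidate(1.0, 0.5, 2.0, 2.3),
    Candidate(2.27882, 0.815566, -1.90158, float("nan")),  # the bad one from this thread
]
clean = drop_non_finite(cands)
print(len(cands), len(clean))  # 2 1
```

Building a separate cleaned collection, rather than filtering inside each consumer, keeps the downstream isolation or clustering code unchanged.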

malbouis · 2023-04-30T19:02:32Z

Thanks @swagata87 ! This seems like a good solution in order to get rid of these crashes for now.

Indeed, we have 4 more occurrences today.

run 366729:
https://eoscmsweb.cern.ch/eos/cms/store/logs/prod/recent/PromptReco/PromptReco_Run366729_EphemeralZeroBias6/Reco
https://eoscmsweb.cern.ch/eos/cms/store/logs/prod/recent/PromptReco/PromptReco_Run366729_EphemeralZeroBias6/Reco

run 366727:
https://eoscmsweb.cern.ch/eos/cms/store/logs/prod/recent/PromptReco/PromptReco_Run366727_EphemeralZeroBias8/Reco
https://eoscmsweb.cern.ch/eos/cms/store/logs/prod/recent/PromptReco/PromptReco_Run366727_EphemeralZeroBias16/Reco

kdlong · 2023-05-02T09:15:53Z

I was trying to reproduce this yesterday, and I couldn't get the failure. Now I can't access /afs/cern.ch/user/c/cmst0/public/PausedJobs/Run2023B/job_248341/job/WMTaskSpace/. Was it removed? Is there a simple recipe someone can point me to?

mandrenguyen · 2023-05-02T09:20:01Z

Hi @kdlong
You can copy over the relevant files from my area:
/afs/cern.ch/work/m/mnguyen/public/test/CMSSW_13_0_3/src

In PSet.py I skip directly to the crashing event, so you should find it immediately.
Note that the crash only occurs on lxplus8; you won't see it on SL7.

kdlong · 2023-05-03T08:14:15Z

Thanks @mandrenguyen. Unfortunately it seems the file has already been removed from disk. Does anyone have other examples of the failure with a file that's still accessible?

mandrenguyen · 2023-05-03T08:41:48Z

@kdlong Taking one of the other examples from
#41397 (comment)
I copied the input root file for safe keeping, as well as the tarball to:
/eos/cms/store/group/phys_heavyions/mnguyen/PFcrash/

mandrenguyen · 2023-05-03T09:35:12Z

@kdlong You can use the following PSet.py to skip directly to the crashing event:

import FWCore.ParameterSet.Config as cms
import pickle
with open('PSet.pkl', 'rb') as handle:
    process = pickle.load(handle)
    process.SimpleMemoryCheck = cms.Service("SimpleMemoryCheck")
    process.maxEvents.input = -1
    process.source.fileNames=cms.untracked.vstring('file:/eos/cms/store/group/phys_heavyions/mnguyen/PFcrash/a9175998-1945-4443-b085-8960314354a9.root')
    process.source.skipEvents=cms.untracked.uint32(5693)
    process.options.numberOfThreads = 1

You can bypass the crash in FastJet by merging this one-liner PR: #41474

kdlong · 2023-05-05T08:25:29Z

Thanks @mandrenguyen. I finally reproduced the issue and understood that it comes from the mass-aware scaling that I introduced in #39368. In the case of a track with a huge momentum but a huge uncertainty (1e7 in the example given above), the scale factor is very small and the energy-rescaling computation has numeric issues. The fix is simple: remove the large ratios by calculating the energy from the rescaled momentum rather than calculating a scaling factor.
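The kind of precision loss described above can be illustrated with the track momentum quoted earlier in the thread. The sketch below is a hypothetical reconstruction of the general problem, not the actual PFAlgo code from #39368 or #41550: the pion-mass hypothesis, the target momentum, and both energy helpers are assumptions. At |p| of about 8.8e8 GeV, the pion m^2 (about 0.02 GeV^2) is far below the rounding step of p^2, so an energy obtained by scaling the original E by a factor of order 1e-9 has lost the mass entirely. E^2 - p^2 of the rescaled candidate is then pure rounding noise of arbitrary sign, and taking its square root can give NaN. Computing E from the rescaled momentum keeps the mass.

```python
import math

# Track momentum of the anomalous candidate quoted earlier in the thread (GeV).
P_OLD = (6.52489e8, 2.33519e8, -5.44475e8)
M_PION = 0.13957  # assumed charged-pion mass hypothesis (GeV)


def norm2(p):
    return sum(c * c for c in p)


def energy_by_ratio(p_old, mass, s):
    """Hypothetical 'ratio' style: scale the original energy by the momentum factor."""
    e_old = math.sqrt(norm2(p_old) + mass * mass)  # m^2 is lost to rounding here
    return s * e_old


def energy_from_momentum(p_old, mass, s):
    """Style described in the fix: rescale p first, then E = sqrt(p^2 + m^2)."""
    p_new = tuple(s * c for c in p_old)
    return math.sqrt(norm2(p_new) + mass * mass)


def implied_mass2(p_old, s, energy):
    """m^2 = E^2 - |p|^2 of the rescaled candidate."""
    p_new = tuple(s * c for c in p_old)
    return energy * energy - norm2(p_new)


target_p = 3.08  # hypothetical calorimeter-driven momentum (GeV)
s = target_p / math.sqrt(norm2(P_OLD))  # tiny scale factor, about 3.5e-9

for name, f in (("ratio", energy_by_ratio), ("direct", energy_from_momentum)):
    e = f(P_OLD, M_PION, s)
    print(f"{name:>6}: E = {e:.6f} GeV, m^2 = {implied_mass2(P_OLD, s, e):.3e} GeV^2")
```

In double precision the 'ratio' m^2 comes out within rounding noise of zero instead of about 0.0195 GeV^2; with single-precision storage the noise would be larger still.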

makortel · 2023-05-30T17:59:43Z

Just to make sure, is the problem described in this issue fixed now?

laurenhay · 2023-09-18T20:35:12Z

Just to make sure, is the problem described in this issue fixed now?

Yes this issue can be closed.

makortel · 2023-09-18T20:49:51Z

@cmsbuild, please close

mandrenguyen · 2023-09-19T04:52:43Z

+1

cmsbuild · 2023-09-19T04:53:04Z

This issue is fully signed and ready to be closed.

malbouis changed the title ~~Segmentation error in PromptReco~~ Segmentation violation in PromptReco Apr 24, 2023

cmsbuild added the pending-assignment label Apr 24, 2023

cmsbuild added reconstruction-pending pending-signatures and removed pending-assignment labels Apr 24, 2023

cmsbuild added the pf label Apr 27, 2023

malbouis changed the title ~~Segmentation violation in PromptReco~~ Segmentation violation in PromptReco for FastjetJetProducer:ak4PFJets Apr 27, 2023

mandrenguyen mentioned this issue Apr 30, 2023

Remove PF candidates with nan for energy #41473

Merged

perrotta mentioned this issue May 1, 2023

patch release of 13_0_X with #41467 for HLT #41475

Closed

mandrenguyen mentioned this issue May 2, 2023

Additional patch based on 13_0_3 for avoiding both crashes and memory issues in prompt reco #41489

Closed

kdlong mentioned this issue May 5, 2023

Fix numeric issues in PFCand scaling, add some debug output #41550

Merged

This was referenced May 5, 2023

Fix pf candidate scaling #41551

Closed

13_1 backport: Fix numeric issues in PFCand scaling, add some debug output #41608

Merged

cmsbuild closed this as completed Sep 18, 2023

cmsbuild added reconstruction-approved fully-signed and removed reconstruction-pending pending-signatures labels Sep 19, 2023
