Segmentation violation on Prompt Reco in pp collision run for lowPtGsfEleGsfTracks #41442

Closed
malbouis opened this issue Apr 27, 2023 · 18 comments


@malbouis

There is a paused job in the Tier0 prompt reco processing due to a segmentation violation on lowPtGsfEleGsfTracks.

It fails while processing pp collision run 366497, on dataset EphemeralZeroBias.

The tar file can be found at https://eoscmsweb.cern.ch/eos/cms/store/logs/prod/recent/PromptReco/PromptReco_Run366497_EphemeralZeroBias14/Reco

The crash is below:

A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

Wed Apr 26 08:17:37 CEST 2023
Thread 17 (Thread 0x2ad01be00700 (LWP 659) "cmsRun"):

...

Current Modules:

Module: GsfTrackProducer:lowPtGsfEleGsfTracks (crashed)
Module: TrackProducer:initialStepTracksPreSplitting
Module: CkfTrackCandidateMaker:lowPtTripletStepTrackCandidates
Module: CkfTrackCandidateMaker:tobTecStepTrackCandidates
Module: DuplicateTrackMerger:duplicateTrackCandidates
Module: TrackCollectionMerger:earlyGeneralTracks
Module: CkfTrackCandidateMaker:convTrackCandidates
Module: MuonIdProducer:muons1stStep

A fatal system signal has occurred: segmentation violation
@cmsbuild

A new Issue was created by @malbouis .

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@malbouis

assign reconstruction

@mmusich

mmusich commented Apr 27, 2023

assign reconstruction

@cmsbuild

New categories assigned: reconstruction

@mandrenguyen,@clacaputo you have been requested to review this Pull request/Issue and eventually sign? Thanks

@mmusich

mmusich commented Apr 27, 2023

type egamma

@mmusich

mmusich commented Apr 27, 2023

Compiling with debug symbols I get the following stack trace:

#3  0x00007fe36c85633b in sig_dostack_then_abort () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00007fe32fab1912 in TrajectoryStateOnSurface::singleState (this=<optimized out>) at /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/src/TrackingTools/TrajectoryState/interface/TrajectoryStateOnSurface.h:86
#6  GetComponents::GetComponents (tsos=..., this=0x7ffc34f0c490) at /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/src/TrackingTools/GsfTools/interface/GetComponents.h:6
#7  GsfMultiStateUpdator::update (this=0x7fe2a60e2b10, tsos=..., aRecHit=...) at /tmp/musich/CMSSW_13_0_3/src/TrackingTools/GsfTracking/src/GsfMultiStateUpdator.cc:15
#8  0x00007fe32fac229d in GsfTrajectorySmoother::trajectory (this=0x7fe2a621bb80, aTraj=...) at /cvmfs/cms.cern.ch/el8_amd64_gcc11/external/gcc/11.2.1-f9b9dfdd886f71cd63f5538223d8f161/include/c++/11.2.1/bits/stl_iterator.h:1010
#9  0x00007fe33021f46f in (anonymous namespace)::KFFittingSmoother::smoothingStep(Trajectory&&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginTrackingToolsTrackFittersPlugins.so
#10 0x00007fe330222459 in (anonymous namespace)::KFFittingSmoother::fitOne(TrajectorySeed const&, std::vector<std::shared_ptr<TrackingRecHit const>, std::allocator<std::shared_ptr<TrackingRecHit const> > > const&, TrajectoryStateOnSurface const&, TrajectoryFitter::fitType) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginTrackingToolsTrackFittersPlugins.so
#11 0x00007fe314d26859 in TrackProducerAlgorithm<reco::GsfTrack>::buildTrack(TrajectoryFitter const*, Propagator const*, std::vector<AlgoProductTraits<reco::GsfTrack>::AlgoProduct, std::allocator<AlgoProductTraits<reco::GsfTrack>::AlgoProduct> >&, std::vector<std::shared_ptr<TrackingRecHit const>, std::allocator<std::shared_ptr<TrackingRecHit const> > >&, TrajectoryStateOnSurface&, TrajectorySeed const&, float, reco::BeamSpot const&, edm::RefToBase<TrajectorySeed>, int, signed char) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/libRecoTrackerTrackProducer.so
#12 0x00007fe30fa13819 in GsfTrackProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_3/lib/el8_amd64_gcc11/pluginRecoTrackerTrackProducerPlugins.so

So it seems the crash originates here:

https://cmssdt.cern.ch/dxr/CMSSW/source/TrackingTools/GsfTools/interface/GetComponents.h#6

@mmusich

mmusich commented Apr 27, 2023

also for the record:

diff --git a/TrackingTools/GsfTracking/src/GsfMultiStateUpdator.cc b/TrackingTools/GsfTracking/src/GsfMultiStateUpdator.cc
index f3d6d173c10..2429ac9a6f0 100644
--- a/TrackingTools/GsfTracking/src/GsfMultiStateUpdator.cc
+++ b/TrackingTools/GsfTracking/src/GsfMultiStateUpdator.cc
@@ -12,6 +12,10 @@
 
 TrajectoryStateOnSurface GsfMultiStateUpdator::update(const TrajectoryStateOnSurface& tsos,
                                                       const TrackingRecHit& aRecHit) const {
+  if (!tsos.isValid()) {
+    edm::LogError("GsfMultiStateUpdator") << "Trying to update trajectory state with invalid TSOS! ";
+    return TrajectoryStateOnSurface();
+  }
   GetComponents comps(tsos);
   auto const& predictedComponents = comps();
   if (predictedComponents.empty()) {

This solves the crash (it is up to reco to decide whether that is acceptable).

@swagata87 FYI.
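
For readers unfamiliar with the pattern, here is a minimal sketch of the guard in the diff above, written against a mock class rather than the real CMSSW types. `MockTsos` and `guardedUpdate` are hypothetical names invented for this illustration, and the "update" arithmetic is a placeholder. The only assumption carried over from the thread is that an invalid state must not be dereferenced and that returning a default-constructed (invalid) state is an acceptable way to skip it.

```cpp
#include <cassert>
#include <iostream>
#include <memory>

// Hypothetical stand-in for a trajectory state: a handle around shared data
// that may be null. This is NOT the CMSSW TrajectoryStateOnSurface class.
class MockTsos {
  struct Data {
    double weight;
  };

public:
  MockTsos() = default;  // a default-constructed handle is invalid
  explicit MockTsos(double weight) : data_(std::make_shared<Data>(Data{weight})) {}

  bool isValid() const { return static_cast<bool>(data_); }

  // Precondition: isValid(). Without a check by the caller, this access would
  // dereference a null pointer, which is the kind of failure seen in the trace.
  double weight() const {
    assert(isValid());
    return data_->weight;
  }

private:
  std::shared_ptr<const Data> data_;
};

// Hypothetical update step with the same shape as the guard in the diff:
// bail out early with an invalid state instead of touching the input.
MockTsos guardedUpdate(const MockTsos& tsos, double hitWeight) {
  if (!tsos.isValid()) {
    std::cerr << "Trying to update trajectory state with invalid TSOS!\n";
    return MockTsos();
  }
  // Placeholder arithmetic; the real updator combines Gaussian components.
  return MockTsos(tsos.weight() * hitWeight);
}
```

With this shape, callers still have to check `isValid()` on the returned state, so the guard moves the failure from a crash to a skipped track rather than removing the need for validity checks downstream.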

@slava77

slava77 commented Apr 27, 2023

I thought that all invalid tsos cases were already fixed or gracefully skipped in GSF. Apparently not.

@VinInn

VinInn commented Apr 27, 2023

TSOS is a good candidate to move to std::optional.
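
As a rough illustration of what an `std::optional`-based interface could look like, here is a sketch on a toy one-dimensional state. `SimpleState`, `updateState` and `fitAll` are hypothetical names, and the update is a plain scalar Kalman step chosen only to keep the example short. None of this reflects the actual CMSSW classes or any agreed design.

```cpp
#include <optional>
#include <vector>

// Hypothetical, simplified state; not the CMSSW TrajectoryStateOnSurface.
struct SimpleState {
  double position;
  double variance;
};

// Hypothetical update step: failure is reported as std::nullopt, so there is
// no "invalid object" that a caller could dereference by accident.
std::optional<SimpleState> updateState(const std::optional<SimpleState>& predicted,
                                       double measurement,
                                       double measurementVariance) {
  if (!predicted || predicted->variance < 0.0 || measurementVariance <= 0.0) {
    return std::nullopt;
  }
  // Scalar Kalman gain; the denominator is strictly positive after the checks.
  const double gain = predicted->variance / (predicted->variance + measurementVariance);
  return SimpleState{predicted->position + gain * (measurement - predicted->position),
                     (1.0 - gain) * predicted->variance};
}

// Chain updates over a list of hits and stop at the first failed update.
std::optional<SimpleState> fitAll(SimpleState seed,
                                  const std::vector<double>& hits,
                                  double hitVariance) {
  std::optional<SimpleState> state = seed;
  for (double hit : hits) {
    state = updateState(state, hit, hitVariance);
    if (!state) {
      break;
    }
  }
  return state;
}
```

The possible benefit is that the type system makes the "no state" case explicit at every call site, instead of relying on each caller remembering to test a validity flag.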

@swagata87

Thanks for the fix, Marco!

I have checked on 1000 MC events from /RelValZpToEE_m6000_14TeV/CMSSW_13_0_0-130X_mcRun3_2022_realistic_v2-v3/GEN-SIM-DIGI-RAW, comparing CMSSW_13_0_3 against CMSSW_13_0_3 plus your fix. The config I ran was obtained with:

cmsDriver.py --filein /store/relval/CMSSW_13_0_0/RelValZpToEE_m6000_14TeV/GEN-SIM-DIGI-RAW/130X_mcRun3_2022_realistic_v2-v3/00000/032f4d6c-690d-4f33-bfaf-f84e93669b33.root --fileout file:AOD_mc_new.root --mc --geometry DB:Extended --era Run3 --eventcontent AODSIM --runUnscheduled --customise Configuration/DataProcessing/Utils.addMonitoring --datatier AODSIM --conditions 130X_mcRun3_2022_realistic_v2 --step RAW2DIGI,RECO --python_filename aod_cfg.py --no_exec -n -1

I do not see any change in the distributions of electron pT, eta, dEta(supercluster_seed, inner_track), etc. So the fix seems reasonable to me.

@mandrenguyen

Thanks @swagata87 and @mmusich
Will one of you please go ahead and make a PR to master and a backport to 13_0_X?

@swagata87

> TSOS is a good candidate to move to std::optional.

Thank you, Vincenzo.

If it's okay, we can go with Marco's fix for now, so that this immediate issue is quickly wrapped up.
But later on, we will try to move to std::optional.

@mmusich

mmusich commented Apr 28, 2023

here are PRs:

@dan131riley

There's a possibly related crash that shows up in the arm64 relvals:

#5  0x00004000630694c8 in GsfTrajectoryFitter::fitOne(TrajectorySeed const&, std::vector<std::shared_ptr<TrackingRecHit const>, std::allocator<std::shared_ptr<TrackingRecHit const> > > const&, TrajectoryStateOnSurface const&, TrajectoryFitter::fitType) const () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02782/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-27-2300/lib/el8_aarch64_gcc11/libTrackingToolsGsfTracking.so
#6  0x0000400062e5d204 in (anonymous namespace)::KFFittingSmoother::fitOne(TrajectorySeed const&, std::vector<std::shared_ptr<TrackingRecHit const>, std::allocator<std::shared_ptr<TrackingRecHit const> > > const&, TrajectoryStateOnSurface const&, TrajectoryFitter::fitType) const () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02782/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-27-2300/lib/el8_aarch64_gcc11/pluginTrackingToolsTrackFittersPlugins.so
#7  0x00004000a4128440 in TrackProducerAlgorithm<reco::GsfTrack>::buildTrack(TrajectoryFitter const*, Propagator const*, std::vector<AlgoProductTraits<reco::GsfTrack>::AlgoProduct, std::allocator<AlgoProductTraits<reco::GsfTrack>::AlgoProduct> >&, std::vector<std::shared_ptr<TrackingRecHit const>, std::allocator<std::shared_ptr<TrackingRecHit const> > >&, TrajectoryStateOnSurface&, TrajectorySeed const&, float, reco::BeamSpot const&, edm::RefToBase<TrajectorySeed>, int, signed char) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02782/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-27-2300/lib/el8_aarch64_gcc11/libRecoTrackerTrackProducer.so
#8  0x00004000a4005778 in GsfTrackProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02782/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-04-27-2300/lib/el8_aarch64_gcc11/pluginRecoTrackerTrackProducerPlugins.so

@malbouis

The fix for this issue has been integrated into the new release, CMSSW_13_0_5, which has just been built. @cms-sw/reconstruction-l2 if it's ok with you, I will close this issue.

@clacaputo

@cmsbuild

cmsbuild commented May 2, 2023

This issue is fully signed and ready to be closed.

@makortel

makortel commented May 2, 2023

@cmsbuild, please close
