-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Segmentation violation for StandAloneMuonProducer in CMSSW_14_0_7 #45035
Comments
cms-bot internal usage |
A new Issue was created by @francescobrivio. @Dr15Jones, @antoniovilela, @rappoccio, @sextonkennedy, @smuzaffar, @makortel can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign reconstruction |
New categories assigned: reconstruction @jfernan2,@mandrenguyen you have been requested to review this Pull request/Issue and eventually sign? Thanks |
FYI @cms-sw/muon-pog-l2 and @cms-sw/muon-dpg-l2 |
type muon |
For the record with this: diff --git a/TrackingTools/DetLayers/src/ForwardDetLayer.cc b/TrackingTools/DetLayers/src/ForwardDetLayer.cc
index d6f0afdf22c..b5d2eaa27ac 100644
--- a/TrackingTools/DetLayers/src/ForwardDetLayer.cc
+++ b/TrackingTools/DetLayers/src/ForwardDetLayer.cc
@@ -88,7 +88,13 @@ pair<bool, TrajectoryStateOnSurface> ForwardDetLayer::compatible(const Trajector
edm::LogError("DetLayers") << "ERROR: BarrelDetLayer::compatible() is used before the layer surface is initialized";
// throw an exception? which one?
- TrajectoryStateOnSurface myState = prop.propagate(ts, specificSurface());
+ TrajectoryStateOnSurface myState;
+ if UNLIKELY (!ts.isValid()) {
+ edm::LogError("DetLayers") << "ERROR: trying to propagate invalid TSOS" << std::endl;
+ } else {
+ myState = prop.propagate(ts, specificSurface());
+ }
+
if UNLIKELY (!myState.isValid())
return make_pair(false, myState);
one gets past the error and in the log running the script at #45035 (comment) one can find:
it seems plausible that then the problem originates here: cmssw/TrackingTools/KalmanUpdators/src/KFUpdator.cc Lines 170 to 173 in 32dc277
where upon being unable to invert a matrix, a default constructed TSOS (with invalid state) is returned. |
Let me tag the Muon Reco contacts too: @24LopezR @rbhattacharya04 |
it's not in the crash stack. |
yes
is: diff --git a/RecoMuon/StandAloneTrackFinder/src/StandAloneMuonFilter.cc b/RecoMuon/StandAloneTrackFinder/src/StandAloneMuonFilter.cc
index 5253626ab46..0150b405859 100644
--- a/RecoMuon/StandAloneTrackFinder/src/StandAloneMuonFilter.cc
+++ b/RecoMuon/StandAloneTrackFinder/src/StandAloneMuonFilter.cc
@@ -299,6 +299,11 @@ std::vector<TrajectoryMeasurement> StandAloneMuonFilter::findBestMeasurements(co
std::vector<TrajectoryMeasurement> result;
std::vector<TrajectoryMeasurement> measurements;
+ if (!tsos.isValid()) {
+ edm::LogError(metname) << "ERROR: trying to propagate invalid TSOS" << std::endl;
+ return result;
+ }
+
if (theOverlappingChambersFlag && layer->hasGroups()) {
std::vector<TrajectoryMeasurementGroup> measurementGroups =
theMeasurementExtractor->groupedMeasurements(layer, tsos, *propagator(), *estimator()); upstream enough ? |
|
on a second thought, I'm not really against the The hope from going upstream would be to see if there it could become a case more suitable for a |
type tracking |
for this case I tested (successfully) also this variant: diff --git a/RecoMuon/StandAloneTrackFinder/src/StandAloneMuonFilter.cc b/RecoMuon/StandAloneTrackFinder/src/StandAloneMuonFilter.cc
index 5253626ab46..08deb0200a1 100644
--- a/RecoMuon/StandAloneTrackFinder/src/StandAloneMuonFilter.cc
+++ b/RecoMuon/StandAloneTrackFinder/src/StandAloneMuonFilter.cc
@@ -233,7 +233,13 @@ void StandAloneMuonFilter::refit(const TrajectoryStateOnSurface& initialTSOS,
LogTrace(metname) << "search Trajectory Measurement from: " << lastTSOS.globalPosition();
// pick the best measurement from each group
- std::vector<TrajectoryMeasurement> bestMeasurements = findBestMeasurements(*layer, lastTSOS);
+ std::vector<TrajectoryMeasurement> bestMeasurements{};
+
+ if (lastTSOS.isValid()) {
+ bestMeasurements = findBestMeasurements(*layer, lastTSOS);
+ } else {
+ edm::LogInfo(metname) << "Invalid last TSOS, will not find best measurements ";
+ }
// RB: Different ways can be choosen if no bestMeasurement is available:
// 1- check on lastTSOS-initialTSOS eta difference
@@ -248,10 +254,16 @@ void StandAloneMuonFilter::refit(const TrajectoryStateOnSurface& initialTSOS,
// wrt the initial one (i.e. seed), then try to find the measurements
// according to the lastButOne FTS. (1B)
double lastdEta = fabs(lastTSOS.freeTrajectoryState()->momentum().eta() - eta0);
+
if (bestMeasurements.empty() && lastdEta > 0.1) {
LogTrace(metname) << "No measurement and big eta variation wrt seed" << endl
<< "trying with lastButOneUpdatedTSOS";
- bestMeasurements = findBestMeasurements(*layer, theLastButOneUpdatedTSOS);
+
+ if (theLastButOneUpdatedTSOS.isValid()) {
+ bestMeasurements = findBestMeasurements(*layer, theLastButOneUpdatedTSOS);
+ } else {
+ edm::LogInfo(metname) << "Invalid last but one updated TSOS, will not find best measurements ";
+ }
}
//if no measurement found and the current FTS has an eta very different
@@ -259,7 +271,12 @@ void StandAloneMuonFilter::refit(const TrajectoryStateOnSurface& initialTSOS,
//according to the initial FTS. (1A)
if (bestMeasurements.empty() && lastdEta > 0.1) {
LogTrace(metname) << "No measurement and big eta variation wrt seed" << endl << "tryng with seed TSOS";
- bestMeasurements = findBestMeasurements(*layer, initialTSOS);
+
+ if (initialTSOS.isValid()) {
+ bestMeasurements = findBestMeasurements(*layer, initialTSOS);
+ } else {
+ edm::LogInfo(metname) << "Invalid initial TSOS, will not find best measurements ";
+ }
}
// FIXME: uncomment this line!! (in the specifics of this crash, it was actually
if either way is fine, I'd let the domain experts to decide / implement a protection. |
We currently have other two failures all reporting
|
@cms-sw/reconstruction-l2 @24LopezR @rbhattacharya04 |
I have tested the provided fix on the crashing jobs, and while it solves the first issue we observe, it doesn't help with the crashes observed in the other 2 jobs:
Recipes to reproduce the second two jobs
Run381067
And Matt's fix can be fetched with:
|
if you have reproduced offline, can you update the recipe with the number of events to skip to get faster at the crash? |
sorry i forgot to add it in the recipe! I've edited the message above. |
the two other crashes happen at a different location:
in this case diff --git a/RecoMuon/StandAloneTrackFinder/src/StandAloneMuonFilter.cc b/RecoMuon/StandAloneTrackFinder/src/StandAloneMuonFilter.cc
index 5253626ab46..30f1003b9eb 100644
--- a/RecoMuon/StandAloneTrackFinder/src/StandAloneMuonFilter.cc
+++ b/RecoMuon/StandAloneTrackFinder/src/StandAloneMuonFilter.cc
@@ -233,7 +233,15 @@ void StandAloneMuonFilter::refit(const TrajectoryStateOnSurface& initialTSOS,
LogTrace(metname) << "search Trajectory Measurement from: " << lastTSOS.globalPosition();
// pick the best measurement from each group
- std::vector<TrajectoryMeasurement> bestMeasurements = findBestMeasurements(*layer, lastTSOS);
+ std::vector<TrajectoryMeasurement> bestMeasurements{};
+
+ double lastdEta{0.f};
+ if (lastTSOS.isValid()) {
+ lastdEta = fabs(lastTSOS.freeTrajectoryState()->momentum().eta() - eta0);
+ bestMeasurements = findBestMeasurements(*layer, lastTSOS);
+ } else {
+ edm::LogInfo(metname) << "Invalid last TSOS, will not find best measurements ";
+ } |
Hi all,
which calls KFUpdator::update and returns an invalid TSOS. A possible workaround could be to skip the recHit whenever this occurs. diff --git a/RecoMuon/TrackingTools/src/MuonTrajectoryUpdator.cc b/RecoMuon/TrackingTools/src/MuonTrajectoryUpdator.cc
index f4afc5ba640..30c3ac95fd3 100644
--- a/RecoMuon/TrackingTools/src/MuonTrajectoryUpdator.cc
+++ b/RecoMuon/TrackingTools/src/MuonTrajectoryUpdator.cc
@@ -156,6 +156,12 @@ pair<bool, TrajectoryStateOnSurface> MuonTrajectoryUpdator::update(const Traject
lastUpdatedTSOS = measurementUpdator()->update(propagatedTSOS, *((*recHit).get()));
+ if (!lastUpdatedTSOS.isValid()) {
+ edm::LogInfo(metname) << "Invalid last TSOS, will skip RecHit ";
+ lastUpdatedTSOS = propagatedTSOS; // Revert update
+ continue;
+ }
+
LogTrace(metname) << " Fit Position : " << lastUpdatedTSOS.globalPosition() I just tested and this modification avoids the crash in both jobs, but implies a slight modification in muon trajectory updator, which may be sensitive. As this condition (returning invalid TSOS) should never happen, I think we can go with it for the moment. But I would like to hear a second opinion on it from CMSSW experts, thanks. |
Thanks @24LopezR |
so this fixes all the 3 instances, (including the one in run-380963 ), right? |
Well, I tested my modification on top of PR #45049. So in principle it can't be safely reverted. |
I did test it without and it didn't crash. |
We have a have a similar segmentation violation in Run 381115 . Here is the tarball with the log: Matteo (ORM) |
Considering that the fix is merged and already included in 14_0_8 then maybe the issue can be closed? Giovanni (ORM) |
+1 |
This issue is fully signed and ready to be closed. |
Closing as completed |
Dear all,
As reported in
https://cms-talk.web.cern.ch/t/segmentation-violation-in-promptreco-for-parkingdoublemuonlowmass1-run-380963/41520
we have a segmentation violation for the ParkingDoubleMuonLowMass dataset in Run 380963, with the following stack trace:
A minimal recipe to reproduce the error is:
The text was updated successfully, but these errors were encountered: