-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CMSSW_12_4_11 did not fix GSF tracking issue in prompt reco #39987
Comments
assign @cms-sw/egamma-pog-l2 |
A new Issue was created by @rappoccio . @Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign reconstruction, egamma-pog |
New categories assigned: reconstruction,egamma-pog @mandrenguyen,@lfinco,@clacaputo,@swagata87 you have been requested to review this Pull request/Issue and eventually sign? Thanks |
as a clarification, there is an issue, but it's of different nature than the other one (before an assertion was triggered, while here there is a segmentation fault) |
There doesn't seem to be much room for a place to crash: BasicMultiTrajectoryState::BasicMultiTrajectoryState(const
I would suggest to edit GaussianStateConversions::tsosFromMultiGaussianState @swagata87 could you check? |
or is there a good reason to make |
something like: diff --git a/TrackingTools/GsfTracking/src/TsosGaussianStateConversions.cc b/TrackingTools/GsfTracking/src/TsosGaussianStateConversions.cc
index 1692614f233..44fd569c8b4 100644
--- a/TrackingTools/GsfTracking/src/TsosGaussianStateConversions.cc
+++ b/TrackingTools/GsfTracking/src/TsosGaussianStateConversions.cc
@@ -50,6 +50,6 @@ namespace GaussianStateConversions {
side);
}
}
- return TrajectoryStateOnSurface((BasicTrajectoryState*)new BasicMultiTrajectoryState(components));
+ return components.empty() ? TrajectoryStateOnSurface() : TrajectoryStateOnSurface((BasicTrajectoryState*)new BasicMultiTrajectoryState(components));
}
} // namespace GaussianStateConversions ? |
Thank you Slava and Marco. I could not reproduce the actual crash yet. So I tried to simulate the crash... Basically, I tried to check what happens if no state pass the following loop in cmssw/TrackingTools/GsfTracking/src/GsfMultiStateUpdator.cc Lines 34 to 40 in fae4959
by doing the following: --- a/TrackingTools/GsfTracking/src/GsfMultiStateUpdator.cc
+++ b/TrackingTools/GsfTracking/src/GsfMultiStateUpdator.cc
@@ -36,7 +36,7 @@ TrajectoryStateOnSurface GsfMultiStateUpdator::update(const TrajectoryStateOnSur
(det = 0., updatedTSOS.curvilinearError().matrix().Sub<AlgebraicSymMatrix22>(0, 0).Det(det) && det > 0) &&
(det = 0., updatedTSOS.curvilinearError().matrix().Sub<AlgebraicSymMatrix33>(0, 0).Det(det) && det > 0) &&
(det = 0., updatedTSOS.curvilinearError().matrix().Sub<AlgebraicSymMatrix44>(0, 0).Det(det) && det > 0) &&
- (det = 0., updatedTSOS.curvilinearError().matrix().Det2(det) && det > 0)) {
+ (det = 0., updatedTSOS.curvilinearError().matrix().Det2(det) && det > 0 && det<0)) {
result.addState(TrajectoryStateOnSurface(weights[i],
The above condition, that Then I rolled back cmssw/TrackingTools/GsfTracking/src/TsosGaussianStateConversions.cc Lines 42 to 45 in 57a8f0a
by requiring the same det>0 && det<0 :
--- a/TrackingTools/GsfTracking/src/TsosGaussianStateConversions.cc
+++ b/TrackingTools/GsfTracking/src/TsosGaussianStateConversions.cc
@@ -41,7 +41,7 @@ namespace GaussianStateConversions {
components.reserve(singleStates.size());
for (auto const& ic : singleStates) {
//require states to be positive-definite
- if (double det = 0; (*ic).covariance().Det2(det) && det > 0) {
+ if (double det = 0; (*ic).covariance().Det2(det) && det > 0 && det<0) {
components.emplace_back((*ic).weight(), This time it crashed at the first event. Then I added the patch suggested by Slava and Marco to
and ran again. But this time also, it crashed at the first event. |
where was the crash this time? |
this time it crashed like this:
|
From technical POV, it looks like I have found a way to bypass the (simulated) crash. If we use your patch in TsosGaussianStateConversions, plus the following change in --- a/TrackingTools/GsfTracking/src/GsfTrajectoryFitter.cc
+++ b/TrackingTools/GsfTracking/src/GsfTrajectoryFitter.cc
@@ -94,7 +94,7 @@ Trajectory GsfTrajectoryFitter::fitOne(const TrajectorySeed& aSeed,
//
// temporary protection copied from KFTrajectoryFitter.
//
- if ((**ihit).isValid() == false && (**ihit).det() == nullptr) {
+ if ((**ihit).isValid() == false || (**ihit).det() == nullptr) {
LogDebug("GsfTrackFitters") << " Error: invalid hit with no GeomDet attached .... skipping";
continue;
}
@@ -121,6 +121,9 @@ Trajectory GsfTrajectoryFitter::fitOne(const TrajectorySeed& aSeed,
//update
assert((!(*ihit)->canImproveWithTrack()) | (nullptr != theHitCloner));
assert((!(*ihit)->canImproveWithTrack()) | (nullptr != dynamic_cast<BaseTrackerRecHit const*>((*ihit).get())));
+ if (!predTsos.isValid()) {
+ return Trajectory();
+ }
auto preciseHit = theHitCloner->makeShared(*ihit, predTsos);
dump(*preciseHit, hitcounter, "GsfTrackFitters");
currTsos = updator()->update(predTsos, *preciseHit);
then it runs. Currently tested on 20 events only. We still need to check/understand,
|
I see the diff --git a/TrackingTools/GsfTracking/src/TsosGaussianStateConversions.cc b/TrackingTools/GsfTracking/src/TsosGaussianStateConversions.cc
index 1692614f233..bf77e61ae83 100644
--- a/TrackingTools/GsfTracking/src/TsosGaussianStateConversions.cc
+++ b/TrackingTools/GsfTracking/src/TsosGaussianStateConversions.cc
@@ -41,7 +41,7 @@ namespace GaussianStateConversions {
components.reserve(singleStates.size());
for (auto const& ic : singleStates) {
//require states to be positive-definite
- if (double det = 0; (*ic).covariance().Det2(det) && det > 0) {
+ if (double det = 0; (*ic).covariance().Det2(det) && det > 0 && det < 0) {
components.emplace_back((*ic).weight(),
LocalTrajectoryParameters((*ic).mean(), pzSign, charged),
LocalTrajectoryError((*ic).covariance()), as far as I can tell, it's not the same crash as the one reported by Tier-0: How can we be sure we're curing the right problem? (by the way I still couldn't reproduce offline the crash seen in the replay). |
Hi Marco, I agree and share your concern that there is no guarantee that we are fixing the actual issue that T0 encountered. Real crash in replay (copied from T0 cmsTalk)
Simulated crash
|
so IIUC the "whole" proposal ( I tested that it avoids the crash, at least in the "simulated" setup) would be: diff --git a/TrackingTools/GsfTracking/src/GsfTrajectoryFitter.cc b/TrackingTools/GsfTracking/src/GsfTrajectoryFitter.cc
index 7fa6e49f4e3..3b154349315 100644
--- a/TrackingTools/GsfTracking/src/GsfTrajectoryFitter.cc
+++ b/TrackingTools/GsfTracking/src/GsfTrajectoryFitter.cc
@@ -121,6 +121,9 @@ Trajectory GsfTrajectoryFitter::fitOne(const TrajectorySeed& aSeed,
//update
assert((!(*ihit)->canImproveWithTrack()) | (nullptr != theHitCloner));
assert((!(*ihit)->canImproveWithTrack()) | (nullptr != dynamic_cast<BaseTrackerRecHit const*>((*ihit).get())));
+ if (!predTsos.isValid()) {
+ return Trajectory();
+ }
auto preciseHit = theHitCloner->makeShared(*ihit, predTsos);
dump(*preciseHit, hitcounter, "GsfTrackFitters");
currTsos = updator()->update(predTsos, *preciseHit);
diff --git a/TrackingTools/GsfTracking/src/TsosGaussianStateConversions.cc b/TrackingTools/GsfTracking/src/TsosGaussianStateConversions.cc
index 1692614f233..44fd569c8b4 100644
--- a/TrackingTools/GsfTracking/src/TsosGaussianStateConversions.cc
+++ b/TrackingTools/GsfTracking/src/TsosGaussianStateConversions.cc
@@ -50,6 +50,6 @@ namespace GaussianStateConversions {
side);
}
}
- return TrajectoryStateOnSurface((BasicTrajectoryState*)new BasicMultiTrajectoryState(components));
+ return components.empty() ? TrajectoryStateOnSurface() : TrajectoryStateOnSurface((BasicTrajectoryState*)new BasicMultiTrajectoryState(components));
}
} // namespace GaussianStateConversions I guess we can open a PR to check for regressions. |
Thank you Marco. If you take care of preparing the PR then in the meanwhile I can do all the sanity checks I was planning to do, to make sure there is no adverse effect anywhere in e/gamma reconstruction due to these changes. |
I think we should try to reproduce on |
|
Hello, it's ORM here, |
from the log, it doesn't look related. |
based on the stack trace:
it seems it's coming from https://github.com/cms-externals/fastjet-contrib/blob/283910e44f2c3c81133fc68c8f4942b9c53da6e3/Nsubjettiness/Njettiness.cc#L179-L204 I would suggest opening a different issue (and pinging a different set of experts) |
#40017 is now merged in master: should we prepare a backport to the data taking releases 12_4_X and 12_5_X, and test it if it finally fixes this issue or not? |
okay, I will do the backports soon |
urgent |
There was another instance of the same issue (
But I'd like the experts to test it as well.
edit the
run:
|
Is the stack trace the same as in plain 12_4_11? |
good to have a reproducible crash.. at least one can debug this the following extra protection solves this crash for me. Can anyone else confirm please? --- a/TrackingTools/GsfTracking/src/GsfTrajectorySmoother.cc
+++ b/TrackingTools/GsfTracking/src/GsfTrajectorySmoother.cc
@@ -129,6 +129,10 @@ Trajectory GsfTrajectorySmoother::trajectory(const Trajectory& aTraj) const {
if (theMerger)
predTsos = theMerger->merge(predTsos);
+ if (!predTsos.isValid()) {
+ return Trajectory();
+ }
+
if ((*itm).recHit()->isValid()) {
//update
currTsos = updator()->update(predTsos, *(*itm).recHit());
|
note that the above extra protection in |
stack traces are slightly different (even tho i'm not an expert at all):
12_4_11 + #40063
|
They look substantially different to me, the last one points to |
looking at cmssw/TrackingTools/GsfTracking/src/GsfTrajectorySmoother.cc Lines 204 to 207 in df3006f
we can add this before
|
so although I missed the ORP yesterday, I have heard that a new 12_4_X is imminent. if there are objections/other requests/different plan to proceed; we can discuss later of course. |
in retrospect, I'm curious why we did not see all these crashes before. I'm not sure from the past if we had all the production crash reports percolated to cmssw issues. My reco memory for the past 10 years was that only T0 crashes were followed systematically and the production was less so. |
I have the same curiosity. In my understanding low pT electron reco is new (only in UL 2018? So perhaps only well calibrated data). Also from anecdotal evidence it seems that the amount of gsf warnings is proportional to the "age" of conditions used (newer conditions seem to trigger less errors). I didn't study it in details yet though. |
For the record, there was another instance of this crash. This is the 4th instance in prompt reco, afaik. It was in run From private communication with German Giraldo, I got the tarball, which is here: I added the following lines to
I could reproduce the crash in 12_4_10 (the current release T0 is in).
In 12_4_11_patch1, the crash goes away. |
It looks like we can close this now, thanks everyone! |
Details are spelled out here for the crash. This was hoped to be solved with 12_4_11 but the issue is still present.
The text was updated successfully, but these errors were encountered: