Crash in PATLeptonTimeLifeInfoProducer.cc in CMSSW_14_0_6 #44862

Closed
mandrenguyen opened this issue Apr 29, 2024 · 22 comments

@mandrenguyen

In a replay of CMSSW_14_0_6 a crash was reported here:
https://cms-talk.web.cern.ch/t/replay-request-for-cmssw-14-0-6/39939/4

This crash occurs in
https://cmssdt.cern.ch/lxr/source/PhysicsTools/PatAlgos/plugins/PATLeptonTimeLifeInfoProducer.cc
and it happens at the following line:

GlobalPoint pca = closestState.globalPosition();

One can reproduce the problem directly by executing PSet.py from the CMS talk post above, and adding the following line:
process.source.eventsToProcess = cms.untracked.VEventRange("369998:31680062")

As this issue is blocking the deployment of a release with important bug-fixes for the HLT, the issue is urgent, and any help would be highly appreciated.

@cmsbuild

cmsbuild commented Apr 29, 2024

cms-bot internal usage

@cmsbuild

A new Issue was created by @mandrenguyen.

@antoniovilela, @sextonkennedy, @makortel, @rappoccio, @Dr15Jones, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@mandrenguyen

assign reconstruction

@cmsbuild

New categories assigned: reconstruction

@jfernan2,@mandrenguyen you have been requested to review this Pull request/Issue and eventually sign? Thanks

@mandrenguyen

@cms-sw/egamma-pog-l2 @cms-sw/tracking-pog-l2
Maybe one of you would have some insight.

@francescobrivio

simple recipe to reproduce:

cmsrel CMSSW_14_0_6
cd CMSSW_14_0_6/src
cmsenv
cp /afs/cern.ch/user/c/cmst0/public/PausedJobs/Replay14_0_6/job_8586/job6/WMTaskSpace/cmsRun1/PSet.p* .

update the PSet with:

cat <<EOF >> PSet.py
    process.source.eventsToProcess = cms.untracked.VEventRange("369998:31680062")
    process.options.numberOfThreads=cms.untracked.uint32(1)
    process.options.numberOfStreams=cms.untracked.uint32(1)
EOF

run:

cmsRun PSet.py 

@mmusich

mmusich commented Apr 29, 2024

@mbluj please take a look

@mmusich

mmusich commented Apr 29, 2024

urgent

@mmusich

mmusich commented Apr 29, 2024

I guess the easiest is to check if the closestState is valid before accessing any of its members.

@mandrenguyen

> I guess the easiest is to check if the closestState is valid before accessing any of its members.

I confirm that adding the following before the offending line allows the event to process successfully:
if(!closestState.isValid()) return;

I will let others comment on whether that's an acceptable solution.
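
For readers skimming the thread, here is a minimal, self-contained sketch of the guard pattern being discussed. `Point`, `MockState` and `fillPca` are hypothetical stand-ins invented for this illustration; they are not the CMSSW `TrajectoryStateOnSurface` API, and the actual change belongs in the producer itself. The only assumption is that a failed extrapolation hands back an empty state whose accessors must not be called.

```cpp
#include <cassert>
#include <iostream>
#include <memory>

// Hypothetical stand-ins, NOT the CMSSW classes: a handle that may be empty,
// mimicking the state returned by an extrapolation that did not succeed.
struct Point {
  double x, y, z;
};

class MockState {
public:
  MockState() = default;  // empty handle, i.e. an invalid state
  explicit MockState(const Point& p) : data_(std::make_shared<Point>(p)) {}

  bool isValid() const { return static_cast<bool>(data_); }

  // Accessing the payload of an invalid state is a programming error;
  // the assert models the crash (active unless NDEBUG is defined).
  const Point& globalPosition() const {
    assert(data_ && "globalPosition() called on an invalid state");
    return *data_;
  }

private:
  std::shared_ptr<Point> data_;
};

// Copies the position into `pca` and returns true only for a valid state;
// otherwise reports the problem, leaves `pca` untouched and returns false.
bool fillPca(const MockState& closestState, Point& pca) {
  if (!closestState.isValid()) {
    std::cerr << "closestState not valid!\n";
    return false;
  }
  pca = closestState.globalPosition();
  return true;
}
```

Calling `fillPca(MockState{}, pca)` returns `false` and leaves `pca` unchanged, while a state built from a `Point` is copied out. Whether an early return, a default value, or skipping the lepton is the right physics choice is a separate question from avoiding the crash.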

@francescobrivio

> > I guess the easiest is to check if the closestState is valid before accessing any of its members.
>
> I confirm that adding the following before the offending line allows the event to process successfully: if(!closestState.isValid()) return;
>
> I will let others comment on whether that's an acceptable solution.

In case experts agree this is the best solution, here is a commit (to master) that can be quickly cherry-picked:
francescobrivio@c150b8b

With this commit, the event emits the following error message:

%MSG-e PATLeptonTimeLifeInfoProducer:   PATElectronTimeLifeInfoProducer:electronTimeLifeInfos  29-Apr-2024 11:56:53 CEST Run: 369998 Event: 31680062
closestState not valid!
%MSG

@mmusich

mmusich commented Apr 29, 2024

Out of curiosity what's

transTrack.impactPointState()

in the event that leads to the crash?

@francescobrivio

I'm getting:

  impactPointState: 
global parameters
x =      0.116433    -0.181856     0.937492
p =       3.90438      0.18625      10.0547
global error
  0.000478242  8.74249e-07  8.21233e-05 -0.000239124 -1.87666e-05
  8.74249e-07  1.17157e-07  2.75897e-07 -1.21146e-06 -1.43013e-06
  8.21233e-05  2.75897e-07  1.75494e-05  -5.8975e-05 -5.83673e-06
 -0.000239124 -1.21146e-06  -5.8975e-05  0.000221227  2.56773e-05
 -1.87666e-05 -1.43013e-06 -5.83673e-06  2.56773e-05  2.10979e-05
local parameters (q/p,v',w',v,w)
   -0.0926974            0      2.57232            0            0
local error
  0.000478242  8.20722e-05  6.65901e-06 -0.000239124 -5.17932e-05
  8.20722e-05  1.75178e-05  2.07178e-06 -5.89051e-05 -1.59499e-05
  6.65901e-06  2.07178e-06  6.79698e-06 -9.22749e-06 -3.00633e-05
 -0.000239124 -5.89051e-05 -9.22749e-06  0.000221227  7.08658e-05
 -5.17932e-05 -1.59499e-05 -3.00633e-05  7.08658e-05  0.000160699
Defined at beforeSurface
Magnetic field in inverse GeV:  (-6.68797e-10,1.04459e-09,0.0114257) 

and from RecoVertex::convertPos(pv.position()) I get:

 convertPos:  (0.117078,-0.182911,0.95493) 

@vlimant

vlimant commented Apr 29, 2024

assign xpog

@cmsbuild

New categories assigned: xpog

@vlimant,@hqucms you have been requested to review this Pull request/Issue and eventually sign? Thanks

@slava77

slava77 commented Apr 29, 2024

> global parameters
> x = 0.116433 -0.181856 0.937492
> p = 3.90438 0.18625 10.0547
> convertPos: (0.117078,-0.182911,0.95493)

It's not clear why propagation from this state to this target would fail.

@mmusich

mmusich commented Apr 29, 2024

mmmh,
I am getting different numbers using the recipe at #44862 (comment)

diff --git a/PhysicsTools/PatAlgos/plugins/PATLeptonTimeLifeInfoProducer.cc b/PhysicsTools/PatAlgos/plugins/PATLeptonTimeLifeInfoProducer.cc
index 2e41063e3f2..f68730271ea 100644
--- a/PhysicsTools/PatAlgos/plugins/PATLeptonTimeLifeInfoProducer.cc
+++ b/PhysicsTools/PatAlgos/plugins/PATLeptonTimeLifeInfoProducer.cc
@@ -167,6 +167,8 @@ void PATLeptonTimeLifeInfoProducer<T>::produceAndFillIPInfo(const T& lepton,
     // Extrapolate track to the point closest to PV
     reco::TransientTrack transTrack = transTrackBuilder.build(track);
     AnalyticalImpactPointExtrapolator extrapolator(transTrack.field());
+
+    std::cout << __PRETTY_FUNCTION__ << " " << transTrack.impactPointState() << std::endl;
     TrajectoryStateOnSurface closestState =
         extrapolator.extrapolate(transTrack.impactPointState(), RecoVertex::convertPos(pv.position()));
     GlobalPoint pca = closestState.globalPosition();
Output:

void PATLeptonTimeLifeInfoProducer<T>::produceAndFillIPInfo(const T&, const TransientTrackBuilder&, const reco::Vertex&, TrackTimeLifeInfo&) [with T = pat::Electron] global parameters
x =       1.91865      34.2722      166.593
p =     -0.016365  0.000857716    0.0626035
global error
     0.812786   0.00612206   -0.0855089   -0.0504526    0.0847113
   0.00612206   0.00027459 -0.000491722  0.000820261   0.00174473
   -0.0855089 -0.000491722    0.0124496   0.00858109   -0.0126903
   -0.0504526  0.000820261   0.00858109    0.0129749  -0.00277318
    0.0847113   0.00174473   -0.0126903  -0.00277318    0.0202798
local parameters (q/p,v',w',v,w)
     -15.4529            0     -3.82022            0           -0
local error
     0.812786    0.0291878   -0.0981561    0.0504526    -0.334519
    0.0291878   0.00453957    0.0104463   0.00673732   0.00313124
   -0.0981561    0.0104463    0.0685205    0.0127032     0.109981
    0.0504526   0.00673732    0.0127032    0.0129749   -0.0109511
    -0.334519   0.00313124     0.109981   -0.0109511     0.316245
Defined at beforeSurface
Magnetic field in inverse GeV:  (1.92692e-06,3.442e-05,0.0112619) 

@slava77

slava77 commented Apr 29, 2024

> x =       1.91865      34.2722      166.593
> p =     -0.016365  0.000857716    0.0626035

This one makes more sense as a state for which propagation to the PCA could fail.
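
To put numbers on that remark, here is a small worked calculation using only the values printed above. It assumes the usual conventions (positions in cm, momenta in GeV, unit charge) and assumes that the printed "inverse GeV" field already contains the unit conversion, so that the helix radius is simply pT / Bz. `Vec3`, `perp`, `mag` and `helixRadiusCm` are illustrative names, not CMSSW functions, and the result does not by itself establish why the extrapolation fails.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
  double x, y, z;
};

// Values copied from the printout in this thread.
const Vec3 kPosition{1.91865, 34.2722, 166.593};          // assumed cm
const Vec3 kMomentum{-0.016365, 0.000857716, 0.0626035};  // assumed GeV
const double kBz = 0.0112619;  // z component of the printed field

// Transverse magnitude sqrt(x^2 + y^2).
double perp(const Vec3& v) { return std::hypot(v.x, v.y); }

// Full magnitude sqrt(x^2 + y^2 + z^2).
double mag(const Vec3& v) {
  return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

// Helix radius in cm for a unit-charge track, ASSUMING the field value is
// already expressed so that R = pT / Bz (no extra 0.3 * B[T] factor needed).
double helixRadiusCm(double ptGeV, double bzInverseGeV) {
  assert(bzInverseGeV > 0.0);
  return ptGeV / bzInverseGeV;
}
```

Evaluating `perp(kMomentum)`, `perp(kPosition)` and `helixRadiusCm(perp(kMomentum), kBz)` gives roughly 0.016 GeV, 34 cm and 1.5 cm: a very soft track whose state sits far from the beam line, with a bending radius much smaller than its distance to the primary vertex. As a consistency check, `1.0 / mag(kMomentum)` reproduces the magnitude of the printed local q/p, about 15.45.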

@francescobrivio

Yeah, sorry, I do indeed get the same numbers as Marco! Not sure what exactly I was printing...
I'll update my PR with the printouts as @slava77 suggested.

@mbluj

mbluj commented May 6, 2024

Hello, I was away all of last week and am only reading this now. Thank you for fixing the issue.

@mandrenguyen

+1
We can consider this solved by #44864

@francescobrivio

> +1 We can consider this solved by #44864

Thanks Matt! For completeness this was solved by #44864 + #44875! (and the combined backport is #44869)
