PromptReco failure PromptReco_Run381379_ParkingSingleMuon4 #45162
Comments
A new Issue was created by @Dr15Jones. @antoniovilela, @sextonkennedy, @smuzaffar, @makortel, @rappoccio, @Dr15Jones can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
The job can be run by setting up a CMSSW_14_0_7 area, downloading the tarball (which is at Then after untarring go to directory job/WMTaskSpace/cmsRun1 and then do
|
There appear to be lots of extraneous exceptions being thrown (and caught) in this job. The first one encountered is
Which is caught here cmssw/RecoTracker/FinalTrackSelectors/plugins/SingleLongTrackProducer.cc Lines 158 to 173 in dbbd44f
which is problematic as the tracks are the |
assign tracking |
The next group of exceptions come from
the exception originates here cmssw/TrackingTools/TrajectoryState/src/PerigeeConversions.cc Lines 15 to 16 in dbbd44f
and is caught here
|
assign reconstruction |
New categories assigned: reconstruction @jfernan2,@mandrenguyen you have been requested to review this Pull request/Issue and eventually sign? Thanks |
By skipping the first events, I was able to get to the traceback for the exception which ultimately ended the job
|
assign root |
@pcanal how can we understand better what happened during the read? |
type root |
type tracking |
That just looks like poorly written code, where try/catch is used instead of checking for trackExtra to be present. cmssw/RecoTracker/FinalTrackSelectors/plugins/SingleLongTrackProducer.cc Lines 133 to 136 in dbbd44f
a proper copy is made conditionally, while the rest in |
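The suggestion above (check that the extra is present instead of catching the exception thrown when it is dereferenced) can be sketched in plain C++. This is only an illustration under stated assumptions: `ToyExtra`, `ToyTrack`, `extraOrThrow` and the two counting functions are hypothetical stand-ins, not the CMSSW `reco::Track` or `edm::Ref` classes, and `std::optional` merely plays the role of a reference that may be unavailable.

```cpp
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for illustration only; not CMSSW types.
struct ToyExtra {
  int nHits;
};

struct ToyTrack {
  double pt;
  std::optional<ToyExtra> extra;  // empty when the extra was not stored

  // Mimics a dereference that throws when the product is missing.
  const ToyExtra& extraOrThrow() const {
    if (!extra) throw std::runtime_error("extra not available");
    return *extra;
  }
};

// Pattern being replaced: an exception used for ordinary control flow.
int countHitsWithTryCatch(const std::vector<ToyTrack>& tracks) {
  int total = 0;
  for (const auto& t : tracks) {
    try {
      total += t.extraOrThrow().nHits;
    } catch (const std::runtime_error&) {
      // Missing extra: skip this track.
    }
  }
  return total;
}

// Suggested pattern: test validity first, so nothing is thrown.
int countHitsWithCheck(const std::vector<ToyTrack>& tracks) {
  int total = 0;
  for (const auto& t : tracks) {
    if (!t.extra.has_value()) continue;
    total += t.extra->nHits;
  }
  return total;
}
```

Both functions give the same result for the same input; the second never throws, so a debugger set to break on exceptions is not interrupted by tracks that merely lack an extra.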
@borzari Please clarify if you are available to check this. |
Hi @slava77 I applied what you suggested in this commit, used the opportunity to remove some duplicated code, and tested it with RelValZMM and RelValTTbar events by comparing the version with try/catch results with the version with the validity check results. Everything worked as intended and no changes to the output were observed, as expected. Just to clarify two points:
|
I misread the TrackExtraBase; So, I would add this Even though in the current setup a track without an extra is enough, there can still be cases where |
Out of curiosity why is that? Can't the |
Hi @mmusich
The hit checks are to make sure that this track won't have missing layers with measurement, which is not 100% effective as I already showed during the presentations about this topic, but also doesn't impact a lot on the final result because it doesn't happen so often. I wouldn't think changing that part of the code for Here I added the suggestions from @slava77. Again, I tested with RelValZMM and RelValTTbar events, and things are working as expected. If you don't have other suggestions, I can open a PR with it and we can continue the discussion there |
Exactly, can't you do that before filling the vector? Default constructed tracks can't be used for refit. |
Alright, so instead of only getting the track with the smallest |
Right, this is what I had in mind. |
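A minimal sketch of the idea agreed on here, applying the validity check while selecting the track rather than after the vector has been filled. Everything in it is assumed for illustration: `Candidate`, its `distance` field (a placeholder for whatever quantity the selection minimizes) and `bestValidCandidate` do not come from the CMSSW code.

```cpp
#include <optional>
#include <vector>

// Hypothetical candidate record; not a CMSSW type.
struct Candidate {
  double distance;  // placeholder for the quantity being minimized
  bool hasExtra;    // stands in for "the track extra is available"
};

// Returns the candidate with the smallest distance among those with a
// valid extra, or std::nullopt if none qualifies. Invalid candidates are
// never selected, so no default-constructed object reaches later steps.
std::optional<Candidate> bestValidCandidate(const std::vector<Candidate>& candidates) {
  std::optional<Candidate> best;
  for (const auto& c : candidates) {
    if (!c.hasExtra) continue;  // validity check applied during selection
    if (!best || c.distance < best->distance) best = c;
  }
  return best;
}
```

With this ordering, the closest candidate overall is skipped when it has no extra, and the closest valid one is returned instead.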
It didn't work. If I move the validity check from the rechits/hitpattern check to where I select tracks (I did
|
Isn't |
Should be. I implemented it like Slava suggested here Could it be that, although I am adding only tracks with valid |
The check I used was
|
Alright @Dr15Jones, but does it happen every time I am using a Well, in any case, I would suggest opening a PR with these changes. At least to remove the |
maybe I am missing something, but with CMSTrackingPOG@5318549 on top of borzari@95ecc4b I can run this test:
(even using the whole input file) without crashes. |
@mmusich most probably I was missing something. The main difference I see (besides the better organization of the code in the way you wrote it) is that I included |
@mmusich I started from your branch and tested what I mentioned above:
May I start a PR to include your changes and the |
here it is: #45213. I used the CMSTrackingPOG VO so you should be able to push more commits if necessary. |
Great! I don't think there are any other modifications that are needed. Just FYI, I also checked the output DQM histograms of that branch using RelValZMM events and they are the same as before the changes, as expected |
@Dr15Jones Is my understanding correct? |
@slava77 I'm on vacation until Thursday. The try/catch fixes were only there to make it easier to get to the underlying problem in the debugger. It does look like the underlying problem is in ROOT. |
Coming back to the problem itself, in https://cms-talk.web.cern.ch/t/paused-job-for-promptreco-run381379-parkingsinglemuon4/42082/7 the likely cause was mentioned to be a corrupted file. I suppose there were no further similar failures? Under the file corruption hypothesis, maybe we could just close the issue? |
+1 |
This issue is fully signed and ready to be closed. |
@cmsbuild, please close |
From https://cms-talk.web.cern.ch/t/paused-job-for-promptreco-run381379-parkingsinglemuon4/42082
The tarball can be found here:
/afs/cern.ch/user/c/cmst0/public/PausedJobs/Run2024E/FileReadError/job/WMTaskSpace/cmsRun1
From the logs it seems to crash at event 1742503164. The error is reproducible locally.