Segfault in data processing #34835
A new Issue was created by @kskovpen . @Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
The thread with the crash reports the following back trace
|
assign reconstruction |
@cms-sw/trk-dpg-l2 This is CMSSW_10_6_4_patch1 |
I was trying to find some info about this LS in OMS and for some reason the details there start from LS 889 |
After applying sorting on LS in the web interface, I can see it :) |
https://cmsoms.cern.ch/cms/runs/lumisection?cms_run=322106&cms_run_sequence=GLOBAL-RUN Where are the JSON files these days? My old bookmark to https://cms-service-dqm.web.cern.ch/cms-service-dqm/CAF/certification/ no longer works. |
We've checked that this run and LS are in the official json files. |
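For anyone who wants to repeat that check by hand, here is a minimal sketch. It assumes the usual certification JSON layout, where each run number (as a string key) maps to a list of `[first_LS, last_LS]` ranges. The helper name `is_certified`, the example ranges, and the LS value used below are illustrative only and are not taken from the real certification file.

```python
import json


def is_certified(cert, run, lumi):
    """Return True if (run, lumi) falls inside any certified LS range.

    `cert` is a dict loaded from a certification JSON file, assumed to map
    run-number strings to lists of inclusive [first_LS, last_LS] pairs.
    """
    ranges = cert.get(str(run), [])
    return any(first <= lumi <= last for first, last in ranges)


# Hypothetical excerpt: these ranges are made up for illustration and are
# NOT the real certified ranges for run 322106.
example = json.loads('{"322106": [[1, 50], [60, 900]]}')

# The LS number here is only an example value.
print(is_certified(example, 322106, 889))
```

With a real file, replace `example` with `json.load(open(path))`, where `path` points to a local copy of the certification JSON.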
Thanks. |
@Dr15Jones @makortel |
Apparently, there have been multiple attempts at recovering this failing job, all resulting in the same issue mentioned above. We are now tracking down the last remaining issues in the tails of the UL processing. |
Has anyone been able to reproduce this error locally? When I try to run on what I think is the offending file |
Can you please provide instructions to reproduce this locally? |
So we were able to successfully reproduce the crash, which was coming specifically from event 196973498. Here is a branch with this simple fix, and I have tested that it does indeed fix the crash. Let me know how everyone would like to proceed. |
The standard procedure is to apply the update in master and then consider a backport. @kskovpen if we make an update in the software, is the production machinery capable of rerunning this LS in a new release? If recovery in the same target dataset is not possible, then we should at least apply a fix in 10_6_X for possible new campaigns (although I doubt that we'd have any). |
OK, I went ahead and made a PR for master while we decide on the plan for a backport. The PR is #34846 |
Thanks @slava77 for your input. Let me see if @haozturk or @justinasr had such past experience, i.e. would it be possible to rerun the workflow in the updated cmssw release? |
The usual and only procedure from our side (i.e. within the capabilities of the PdmV machinery) would be to reset and resubmit the whole request with the new release. If we make a new request to re-run only that run/lumisection, it will end up producing a separate output dataset (like an extension), which is probably not desirable. |
No, for crashes we don't report event (or lumi or run) numbers. |
+reconstruction
|
This issue is fully signed and ready to be closed. |
Hello,
We are observing a segmentation violation in one of the data processing workflows. Full log is available here:
https://cms-unified.web.cern.ch/cms-unified/joblogs/haozturk_r-1-Run2018D_EGamma_12Nov2019_UL2018_210804_153732_5282/139/DataProcessing/176bb428-0e57-467d-9216-cbda860fc7c8-0-3-logArchive/job/WMTaskSpace/cmsRun1/cmsRun1-stdout.log.trunc.txt
The issue is also reproducible locally and might be coming from TrackingToolsTrackAssociator. If someone could have a look, please let us know!
PdmV @bbilin @jmartinb, also for @haozturk