Two relvals failing in Geometry #32181
Comments
assign geometry |
New categories assigned: geometry @Dr15Jones,@cvuosalo,@mdhildreth,@makortel,@ianna,@civanch you have been requested to review this Pull request/Issue and eventually sign? Thanks |
A new Issue was created by @mrodozov Mircho Rodozov. @Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
The error (from 11624.911) is
|
Perhaps the error in ASAN might be helpful https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_amd64_gcc820/CMSSW_11_2_ASAN_X_2020-11-18-2300/pyRelValMatrixLogs/run/11624.911_TTbar_13+2021_DD4hep+TTbar_13TeV_TuneCUETP8M1_GenSim+Digi+Reco+HARVEST+ALCA/step1_TTbar_13+2021_DD4hep+TTbar_13TeV_TuneCUETP8M1_GenSim+Digi+Reco+HARVEST+ALCA.log#/
|
Meanwhile I've found something weird
|
In CMSSW_11_2_X_2020-11-18-2300, I ran:
They ran all steps successfully and did not show the errors mentioned in this issue. |
Note that in the IB, 11624.911 shows this error:
while 11642.911 shows this one:
Could they be spurious? Could we really have two different real errors that somehow do not show up in the PR tests or direct runTheMatrix tests? |
One clear difference between PR and IB tests is that PR tests are single-threaded and IB tests are multi-threaded. |
Right, the PR tests are indeed OK: #31220 (comment) |
The command used in the IB test is |
@cvuosalo @bsunanda is this line OK? It consumes
|
I ran the workflows with "-t 4" to allow multi-threading. One ran to completion successfully, and one crashed in step3, with a different message than before. |
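For anyone trying to reproduce the single- versus multi-threaded comparison, here is a minimal sketch of how the two invocations might be assembled. It assumes the standard `runTheMatrix.py` options `-l` (workflow list) and `-t` (threads per job); the helper name `build_relval_command` is hypothetical, and this is not the exact command used in the IB tests.

```python
def build_relval_command(workflows, threads=1):
    """Return an argv list for running the given relval workflows.

    Hypothetical helper: it only assembles the command line and does not
    run anything. Assumes runTheMatrix.py accepts -l and -t.
    """
    if threads < 1:
        raise ValueError("threads must be >= 1")
    argv = ["runTheMatrix.py", "-l", ",".join(workflows)]
    if threads > 1:
        # Only request multi-threading explicitly; omit -t for a
        # single-threaded run, as in the PR tests.
        argv += ["-t", str(threads)]
    return argv


# The two workflows discussed in this issue.
workflows = ["11624.911", "11642.911"]
single = build_relval_command(workflows)
multi = build_relval_command(workflows, threads=4)

print(" ".join(single))
print(" ".join(multi))
```

Because the multi-threaded failures are not systematically reproducible, the multi-threaded command may need to be run several times before a crash shows up.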
@cvuosalo I confirm: in single-threaded mode the workflow runs smoothly, while in multi-threaded mode I got a failure in step2 due to the evaluation of a constant in dd4hep. It looks like the issue depends on the memory access of a single job, which would explain why it does not seem to be systematically reproducible. |
Can we run these two workflows in the IB tests in single-threaded mode for now? |
The PR tests run single-threaded, so we are checking DD4HEP in each PR and this does not create any problem. |
#32249 suggests a culprit for the failures with multiple threads |
The DD4Hep workflows also seem to generate a large number of differences (possibly randomly) in PR tests, see e.g. #32270 (comment). Should we consider removing them from the PR tests until they become more stable? |
Please do that. DD4Hep workflows are still being debugged. They should not be in the comparison lists.
|
The ASAN failure in #32181 (comment) reproduces when run with a single thread. Maybe that would be a good starting point for further debugging? (See also #32181 (comment) and #32181 (comment).) |
In CMSSW_11_3_X_2020-12-08-1100, I have confirmed that workflows 11624.911 and 11642.911 run successfully to completion with 1000 events in both single- and multi-threaded mode. |
I think the ASAN problem comes from the dd4hep plugin manager trying to use the type of the |
Dear all,
in last night's IB there are two dd4hep failures:
https://cmssdt.cern.ch/SDT/html/cmssdt-ib/#/relVal/CMSSW_11_2/2020-11-18-2300?selectedArchs=slc7_amd64_gcc820&selectedFlavors=X&selectedStatus=failed
The failing relvals were added in:
#32096