-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Release dependent modules only after the worker has finished for scheduled modules #39245
Release dependent modules only after the worker has finished for scheduled modules #39245
Conversation
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39245/31880
|
A new Pull Request was created by @makortel (Matti Kortelainen) for master. It involves the following packages:
@cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
enable threading |
@cmsbuild, please test |
@Dr15Jones Would you agree? |
I'm not sure that works for the mixing module which doesn't (I think) do any prefetching. |
Do you mean the cmssw/Mixing/Base/src/PileUp.cc Lines 106 to 109 in 3cac6aa
BMixingModule cmssw/Mixing/Base/interface/BMixingModule.h Lines 108 to 109 in 3cac6aa
? On a first look cmssw/Mixing/Base/src/SecondaryEventProvider.cc Lines 51 to 67 in 3cac6aa
But maybe there is some hidden detail that deserves deeper look. Unfortunately this won't be caught by any tests as the functionality has been unused for some years (and |
-1 Failed Tests: UnitTests RelVals RelVals-THREADING Unit TestsI found errors in the following unit tests: ---> test TestFWCoreFrameworkTrigBit had ERRORS RelValsThe relvals timed out after 4 hours. RelVals-THREADINGThe relvals timed out after 4 hours. |
Well, something gets wrong
|
In addition, specifically HARVESTING step seems to get stuck (because of starvation of tasks in TBB) |
@makortel maybe the code that transitions from one module on a path to the next module on a path is adding to the module proxy's WaitingTaskList and since it is not a 'consumes' relation your change will fail? That it, it requires the release of the WaitingTaskList on the |
edbadba
to
f88c583
Compare
f88c583
to
e8a6e6a
Compare
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39245/32217
|
Pull request #39245 was updated. @cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please check and sign again. |
@cmsbuild, please test |
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-cb5f3c/27739/summary.html Comparison SummarySummary:
|
+1 |
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2) |
type bug-fix |
+1
|
PR description:
In #39064 (comment) we found that the
PuttableProductResolver::putProduct()
releasing the task(s) for dependent modules can lead to rare problems if the producing module produces two (or more) products that refer to each other. Specifically, a consuming module may get run when only one of the products has been put in place, that can lead to deference of a Ref to fail on "missing product".All consuming modules call the
prefetchAsync_()
, which forPuttableProductResolver
already creates a task to release the dependent modules (m_waitingTasks
), and inserts that into the Worker'sWaitingTaskList
(that is released after the Worker finishes). I think we could just rely on that, and therefore this PR removes the override of. This assumption turned out to be wrong, because there is also another use case forputProduct()
fromPuttableProductResolver
PuttableProductResolver
:PuttableSourceBase
sources (andTestProcessor
, where it is also used like for sources).As a simple way forward to fulfill the requirements of both use cases, this PR makes
PuttableProductResolver
to directly use theWorker
'sWaitingTaskList
in prefetching, if aWorker
has been associated to thePuttableProductResolver
. If noWorker
has been associated, thePuttableProductResolver
assumes it is from a Source (or similar), and the product is available for any consuming module without any further waiting in the prefetching. See also #39245 (comment) for some further discussion.Resolves #39064 .
PR validation:
The reproducer I mentioned in #39064 (comment) succeeds with this change. Framework unit tests pass.