-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make Alpaka ESProducers asynchronous for non-host backends #47034
Conversation
…ductDirectly The value depends only on the ALPAKA_ACCELERATOR_NAMESPACE::Platform, and is independent of the data product type.
…ctDirectly The value depends only on the types of ALPAKA_ACCELERATOR_NAMESPACE::{Platform,Queue}, and is independent of the data product type.
cms-bot internal usage |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47034/43157 |
A new Pull Request was created by @makortel for master. It involves the following packages:
@cmsbuild, @fwyzard, @makortel can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
enable gpu |
@cmsbuild, please test |
static_assert(std::is_same_v<std::remove_cv_t<std::remove_reference_t<decltype(deviceRecord.queue())>>, | ||
alpaka::Queue<Device, alpaka::Blocking>>, | ||
"Non-blocking queue when trying to use ES data product directly. This might indicate a " | ||
"need to extend the Alpaka ESProducer base class."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could also just do
static_assert(std::is_same_v<std::remove_cv_t<std::remove_reference_t<decltype(deviceRecord.queue())>>, | |
alpaka::Queue<Device, alpaka::Blocking>>, | |
"Non-blocking queue when trying to use ES data product directly. This might indicate a " | |
"need to extend the Alpaka ESProducer base class."); | |
alpaka::wait(deviceRecord.queue()); |
instead, but I'd expect a CPU backend with a non-blocking Queue to have other consequences as well, so I thought a static_assert
might be useful.
// Given that the ESProducers are expected to be mostly | ||
// for host-to-device data copies, that are serialized | ||
// anyway (at least on current NVIDIA), this should be | ||
// reasonable behavior for now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively the Queues could be stored in an std::vector
like before, that is then moved from the "acquire" lambda to be "produce" lambda.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
Queue
is returned to theQueueCache
. The sameQueue
may be used for other work before the work enqueued here finishes.
Isn't this already a common behaviour with queues in the QueueCache
?
Or, in the other cases, queues are returned only once all work has been completed ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
Queue
is returned to theQueueCache
. The sameQueue
may be used for other work before the work enqueued here finishes.Isn't this already a common behaviour with queues in the
QueueCache
? Or, in the other cases, queues are returned only once all work has been completed ?
In the Event system case the queues are returned only after all work has been completed. This happens because the shared_ptr<Queue>
are held in the Event data products, and they return the queue only after synchronization. (although I'd say this behavior is more of a result of other design decisions than being a deliberate choice by itself)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes. It's the EventCache
that has additional provisions to avoid reusing events that are not yet "ready".
IMHO these current changes are fine, because Event Setup transitions are supposed to be "rare" with respect to the Event work.
If we later notice any performance impact here, we can always add back the vector
, or extend the QueueCache
to only return queues that are "empty".
+1 Size: This PR adds an extra 40KB to repository Comparison SummarySummary:
GPU Comparison SummarySummary:
|
496947a
to
cff5e8b
Compare
(tiny simplification) |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47034/43162
|
(though you could also squash the last commit) |
This pull request is fully signed and it will be integrated in one of the next master IBs after it passes the integration tests. This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @antoniovilela, @rappoccio, @mandrenguyen (and backports should be raised in the release meeting by the corresponding L2) |
2cb5b05
to
24181b3
Compare
Done. |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47034/43184
|
@cmsbuild, please test |
+1 Size: This PR adds an extra 20KB to repository Comparison SummarySummary:
GPU Comparison SummarySummary:
|
+1 |
PR description:
This PR makes Alpaka ESProducers asynchronous, i.e. replaces the existing blocking
alpaka::wait()
call with a non-blocking synchronization (unless the newly-addedalpaka.synchronize
parameter #47028 is set toTrue
).While doing that, I got annoyed by the present
useESProductDirectly<T>
helper, because its value does not really depend onT
. So I simplified its definition (first commit), and for consistency also the Event-system-sideuseProductDirectly
(second commit) to be independent ofT
. In the Event-system case I also replaced the#ifdef
with the use of type traits.Resolves cms-sw/framework-team#1126
PR validation:
Unit tests pass on nodes with and without NVIDIA GPU.