Fix PFClusterSoAProducer to read a device collection #46887
Make the PFRecHitSoAProducer produce an additional host-only collection with the number of PF rechits. Make the PFClusterSoAProducer consume the device collection of PF rechits, and the host collection with the number of PF rechits.
This is an alternative fix to #46830.
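The pattern described above can be illustrated with a minimal, standalone C++ sketch. It is not the CMSSW or alpaka code from this PR: `DeviceRecHits`, `RecHitCount`, `produceRecHits` and `clusterEnergy` are hypothetical names, and "device memory" is simulated with an ordinary `std::vector`. The point it tries to show is only that the consumer learns the number of filled elements from a small host-side product, and never needs a host copy of the full collection.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for a collection that lives in device memory.
// In this sketch the storage is a plain std::vector; on a real GPU backend
// the host could not dereference the data pointer directly.
class DeviceRecHits {
public:
  explicit DeviceRecHits(std::vector<float> energies) : energies_(std::move(energies)) {}

  // Allocated capacity, known on the host without reading device memory.
  std::size_t capacity() const { return energies_.size(); }

  // Pointer meant to be handed to a "kernel" only.
  const float* deviceData() const { return energies_.data(); }

private:
  std::vector<float> energies_;
};

// Hypothetical host-only product: the number of filled rechits.
struct RecHitCount {
  std::uint32_t value;
};

// The two products emitted together by the (hypothetical) rechit producer.
struct RecHitProducts {
  DeviceRecHits recHits;
  RecHitCount size;
};

// Producer: fills a fixed-capacity buffer and records how many entries were used.
RecHitProducts produceRecHits(const std::vector<float>& raw, float threshold, std::size_t capacity) {
  std::vector<float> buffer(capacity, 0.f);
  std::uint32_t filled = 0;
  for (float energy : raw) {
    if (energy > threshold && filled < capacity) {
      buffer[filled++] = energy;
    }
  }
  return {DeviceRecHits(std::move(buffer)), RecHitCount{filled}};
}

// Consumer: sizes its work from the host-side count and passes the device
// pointer to the "kernel" (here a plain std::accumulate).
float clusterEnergy(const DeviceRecHits& recHits, const RecHitCount& size) {
  assert(size.value <= recHits.capacity());
  if (size.value == 0) {
    return 0.f;  // nothing to launch
  }
  const float* data = recHits.deviceData();
  return std::accumulate(data, data + size.value, 0.f);
}
```

In this sketch the count travels as its own small product, so the decision of how much work to launch (or whether to launch any) is made on the host without touching the device buffer.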
type bugfix
cms-bot internal usage
enable gpu
please test
A new Pull Request was created by @fwyzard for master. It involves the following packages:
@jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks. cms-bot commands are listed here
+1
Size: This PR adds an extra 20KB to repository
Comparison Summary:
GPU Comparison Summary:
void produce(device::Event& event, const device::EventSetup& setup) override {
  event.emplace(pfRecHitsToken_, std::move(*pfRecHits_));
  event.emplace(sizeToken_, *size_);
  pfRecHits_.reset();

  if (synchronise_)
    alpaka::wait(event.queue());
The comment of the synchronise configuration parameter says "Add synchronisation point after execution (for benchmarking asynchronous execution)", so maybe this code should be moved to the end of acquire()? (as no asynchronous activities are launched from produce())
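The suggestion above can be sketched with a small mock, assuming a two-phase module where acquire() enqueues asynchronous work and produce() only hands over the results. `MockQueue` and `MockProducer` are hypothetical stand-ins, not the alpaka or CMSSW interfaces; the queue simply defers tasks until wait() is called.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an asynchronous device queue:
// tasks are deferred until wait() is called.
class MockQueue {
public:
  void enqueue(std::function<void()> task) { pending_.push_back(std::move(task)); }

  // Run all queued tasks, emulating a blocking synchronisation point.
  void wait() {
    for (auto& task : pending_) {
      task();
    }
    pending_.clear();
  }

  std::size_t pending() const { return pending_.size(); }

private:
  std::vector<std::function<void()>> pending_;
};

// Hypothetical two-phase producer: acquire() launches the asynchronous work,
// produce() only hands over the finished result.
class MockProducer {
public:
  explicit MockProducer(bool synchronise) : synchronise_(synchronise) {}

  void acquire(MockQueue& queue) {
    queue.enqueue([this] {
      result_ = 42;
      ready_ = true;
    });
    // The optional synchronisation point sits where the asynchronous work is
    // launched, so it measures that work.
    if (synchronise_) {
      queue.wait();
    }
  }

  // No asynchronous work is launched here, so a wait would have nothing to wait for.
  int produce() const {
    assert(ready_);  // assumed: the framework calls produce() only after the queue has drained
    return result_;
  }

  bool resultReady() const { return ready_; }

private:
  bool synchronise_;
  bool ready_ = false;
  int result_ = 0;
};
```

With `synchronise` enabled, the queue is already drained when acquire() returns. With it disabled, the work stays pending until something else waits on the queue.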
Mhm, ok.
From a technical (and performance) perspective I think this PR is better than #46830 (and is probably good enough for the immediate needs). As a long-term general pattern, I'm concerned about storing the number of filled elements separately from the
Next improvements I'd like to try to implement and measure are
The latter will require a bit of work on the SoA infrastructure, though, so it will take some time to happen.
Here is the impact of this PR on the HCAL+PF reconstruction, measured on a machine with 2× AMD Bergamo CPUs and 4× NVIDIA L4 GPUs.
baseline
with this PR
i.e. a > 10% speed up.
@fwyzard for the first improvement, do you plan to include it already in this PR or later in another PR? Thanks
I cannot work on it this week, so it could go in a separate PR.
+1
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @rappoccio, @mandrenguyen, @antoniovilela (and backports should be raised in the release meeting by the corresponding L2)
+1
PR description:
Fix PFClusterSoAProducer to read a device collection instead of a host collection, when running on a GPU backend.
Make the PFRecHitSoAProducer produce an additional host-only collection with the number of PF rechits. The PFClusterSoAProducer can then consume the device collection of PF rechits, and the host collection with the number of PF rechits.
PR validation:
Tested that the 2024 HLT menu runs.
If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:
May be backported to 14.2.x or earlier if there is interest.