-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix PFClusterSoAProducer to read a device collection #46830
Fix PFClusterSoAProducer to read a device collection #46830
Conversation
enable gpu |
please test |
cms-bot internal usage |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46830/42850 |
A new Pull Request was created by @fwyzard for master. It involves the following packages:
@jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
cb06dce
to
c0252e9
Compare
please test |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46830/42851 |
Pull request #46830 was updated. @jfernan2, @mandrenguyen can you please check and sign again. |
please hold |
OK, the proposed fix cannot work, because we need to know the number of rechits on the host: if (pfRecHits->metadata().size() != 0)
nRH = pfRecHits->size(); |
I guess this is a common enough pattern that we should find a general solution 🤔 |
-1 Failed Tests: RelVals-GPU RelVals-GPU
Comparison SummarySummary:
|
please test |
please unhold |
assign heterogeneous |
@makortel what do you think ? |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46830/42852 |
Pull request #46830 was updated. @fwyzard, @jfernan2, @makortel, @mandrenguyen can you please check and sign again. |
+1 Size: This PR adds an extra 32KB to repository Comparison SummarySummary:
GPU Comparison SummarySummary:
|
type bug-fix |
Looks reasonable for a (better) workaround.
Agreed, how about a new GitHub issue on this topic? (IIRC in the pixel code the approach was to allocate memory and launch kernels based on the capacity of the containers rather than the "number of elements", and the "number of elements" was used only on the device code to terminate the loops early) |
+1 |
See #46887 for an alternative (and hopefully better) approach. |
Here is the impact of this PR on the HCAL+PF reconstruction, measured on a machine with 2× AMD Bergamo CPUs and 4× NVIDIA L4 GPUs. baseline
with this PR
i.e. a 10% speed up. |
Closing in favour of #46887. |
PR description:
Fix
PFClusterSoAProducer
to read a device collection instead of a host collection, when running on a GPU backend.Note:this is a quick workaround to let the device code use the device collection, while being able to access the actual number of pf rechits on the host side. It should replaced with a better and more general implementation, and the use of the host collection should be removed.
PR validation:
Full 2024 HLT menu works with these changes, both on CPU and on GPU.
If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:
May be backported to 14.2.x or earlier if there is interest.