Follow up to the Alpaka Implementation of PFClusterProducer #43501
Comments
A new Issue was created by @fwyzard Andrea Bocci. @Dr15Jones, @makortel, @sextonkennedy, @rappoccio, @antoniovilela, @smuzaffar can you please review it and eventually sign/assign? Thanks.
type pf
assign heterogeneous
I opened #43574 to address reading the thresholds from GT. Are there any changes in particular you would want bundled in that PR?
I think you have already addressed the second and third points (using single precision literals and avoiding square roots); if not, you could do that. For the first point we still need to provide the helper function.
I believe avoiding the square roots was implemented in the original PR, and I just updated |
The goal of this issue is to collect feedback and action items on the Alpaka implementation of `PFClusterProducer`, to be followed up after the integration of #43130.

elements_in_block_with_stride
We should implement a helper function `elements_in_block_with_stride(acc, extent)` that makes all the threads in a group loop over and cover `extent` elements, with a stride equal to the group size. Then, update the loop in `TopoClusterContraction` to use `elements_in_block_with_stride` in place of the current explicit loop.
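To make the intended semantics concrete, here is a rough host-side sketch in plain C++ of a block-stride range. It does not use alpaka: the class name `ElementsInBlockWithStride` and the explicit `thread_idx` and `block_size` arguments are hypothetical stand-ins for what the real helper would read from `acc`. The only point is to show that, with a stride equal to the group size, the threads of one group together visit every index in `[0, extent)` exactly once.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the proposed helper (plain C++, no alpaka).
// thread_idx and block_size replace what would be queried from `acc`.
// Assumes block_size > 0.
class ElementsInBlockWithStride {
public:
  ElementsInBlockWithStride(uint32_t thread_idx, uint32_t block_size, uint32_t extent)
      : first_(thread_idx < extent ? thread_idx : extent), stride_(block_size), extent_(extent) {}

  class iterator {
  public:
    iterator(uint32_t index, uint32_t stride, uint32_t extent)
        : index_(index), stride_(stride), extent_(extent) {}

    uint32_t operator*() const { return index_; }

    // Advance by one stride, clamping to extent so that the iterator
    // compares equal to end() and the addition cannot overflow.
    iterator& operator++() {
      if (extent_ - index_ <= stride_) {
        index_ = extent_;
      } else {
        index_ += stride_;
      }
      return *this;
    }

    bool operator!=(iterator const& other) const { return index_ != other.index_; }

  private:
    uint32_t index_;
    uint32_t stride_;
    uint32_t extent_;
  };

  iterator begin() const { return iterator(first_, stride_, extent_); }
  iterator end() const { return iterator(extent_, stride_, extent_); }

private:
  uint32_t first_;
  uint32_t stride_;
  uint32_t extent_;
};

// Emulate all threads of one group sequentially and count how many
// times each element is visited.
std::vector<int> count_visits(uint32_t block_size, uint32_t extent) {
  std::vector<int> visits(extent, 0);
  for (uint32_t thread = 0; thread < block_size; ++thread) {
    for (uint32_t i : ElementsInBlockWithStride(thread, block_size, extent)) {
      ++visits[i];
    }
  }
  return visits;
}
```

In a kernel the outer loop over `thread` would not exist, since each thread would run only its own range-based `for` loop; it is here only to emulate a group on the host.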
single precision literals
The use of double precision literals (e.g. `0.5` in `expf(-0.5 * value)`) forces the compiler to convert the operands from single to double precision, compute the result in double precision, and convert it back to single precision.

Given the cost of double precision operations on "small" GPUs like the NVIDIA T4 or Intel Flex, these conversions and temporary double precision operations should be avoided by explicitly marking the floating point literals as single precision, e.g. `0.5f` instead of `0.5`.
avoid square roots where possible
Given a condition that compares a square root against `cut`, we should avoid the square root and use the square of `cut`.
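A minimal sketch of the transformation, assuming the condition compares a 2D distance against a non-negative `cut`. The function names and the `dx`, `dy` variables are hypothetical and not taken from the actual `PFClusterProducer` code.

```cpp
#include <cmath>

// Hypothetical "before": takes a square root just to compare with cut.
inline bool within_cut_sqrt(float dx, float dy, float cut) {
  return std::sqrt(dx * dx + dy * dy) < cut;
}

// Hypothetical "after": compares the squared distance with cut squared.
// Equivalent to the version above only when cut >= 0.
inline bool within_cut_squared(float dx, float dy, float cut) {
  return dx * dx + dy * dy < cut * cut;
}
```

If `cut` is constant across a loop, `cut * cut` can also be computed once outside of it.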