Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Follow up to the Alpaka Implementation of PFClusterProducer #43501

Open
fwyzard opened this issue Dec 5, 2023 · 7 comments
Open

Follow up to the Alpaka Implementation of PFClusterProducer #43501

fwyzard opened this issue Dec 5, 2023 · 7 comments

Comments

@fwyzard
Copy link
Contributor

fwyzard commented Dec 5, 2023

The goal of this issue is to collect feedback and action items on the Alpaka Implementation of PFClusterProducer, to be followed up after the integration of #43130.

elements_in_block_with_stride

We should implement a helper function elements_in_block_with_stride(acc, extent) to make all the threads in a group loop and cover extent elements, with a stride equal to the group size.

Then, update the loop in TopoClusterContraction to use elements_in_block_with_stride instead of

      for (int rhIdx = alpaka::getIdx<alpaka::Block, alpaka::Threads>(acc)[0u]; rhIdx < nRH;
           rhIdx += alpaka::getWorkDiv<alpaka::Block, alpaka::Threads>(acc)[0u]) {

single precision literals

The use of double precision literals (e.g. 0.5 in expf(-0.5 * value)) force the compiler to convert the operands from single to double precision, compute the result in double precision, and convert them to back single precision.
Given the cost of double precision operations on the "small" GPUs like NVIDIA T4 or Intel Flex, these conversions and temporary operations in double precision should be avoided, by explicitly marking the floating point literals as single precision:

expf(-0.5f * value);

avoid square roots where possible

Given

float cut = ...;
float value2 = ...;
float value = sqrtf(value2);
if (value > cut) { ...}

we should avoid the square root and use the square of cut:

float cut = ...;
float value2 = ...;
float cut2 = cut * cut;
if (value2 > cut2) { ...}
@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 5, 2023

A new Issue was created by @fwyzard Andrea Bocci.

@Dr15Jones, @makortel, @sextonkennedy, @rappoccio, @antoniovilela, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 5, 2023

type pf

@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 5, 2023

assign heterogeneous

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 5, 2023

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

@jsamudio
Copy link
Contributor

I opened #43574 to address reading the thresholds from GT. Is there any changes in particular you would want bundled in that particular PR?

@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 15, 2023

I think you have already address the second and third point (using single precision literals and avoid square roots) - if not, you could do that.

For the first point we still need to provide the helper function.

@jsamudio
Copy link
Contributor

I believe avoiding the square roots was implemented in the original PR, and I just updated expf functions in the latest commit for #43574. So probably at this point everything but the first point is addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants