Revert sum/product aggregation to always produce int64_t type #14907
Conversation
I still don't understand the reason for the performance issue here. Why can't we fix that issue instead of reverting the code like this?
Thank you @ttnghia for your message. I also would prefer to solve the root cause of the issue rather than revert the change. It appears that the performance degradation is happening in a
I'm fine with this going into 24.02, but we should investigate further later. I suspect that it could be something messing with the device sum operator working on different operand types.
@ttnghia The
Wait, from the CUDA documentation I see that unsigned int/long are supported. So there may be something wrong with the groupby code that doesn't call the native CUDA
Thank you @karthikeyann and @ttnghia for this investigation. We would love your help making the groupby code work correctly with
@karthikeyann's analysis is correct, from what I can tell. I think he was referring to the implementation in `cudf/cpp/include/cudf/detail/utilities/device_atomics.cuh`, lines 160 to 162 at commit 5cc021a.
That file only implements
Maybe
From what I understand:
The overloads for `float` and `double` just pass through. We should do the same here for
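The pass-through vs. CAS-fallback pattern under discussion can be sketched in portable C++ (using `std::atomic` in place of the CUDA intrinsics; the function name and dispatch condition are illustrative, not cudf's actual implementation):

```cpp
#include <atomic>
#include <cstdint>
#include <type_traits>

// Hypothetical sketch: types with native atomic support take the fast
// pass-through path; everything else is emulated with a compare-and-swap
// (CAS) loop, the slow path suspected of causing the regression.
template <typename T>
T atomic_add(std::atomic<T>& target, T value)
{
  if constexpr (std::is_integral_v<T>) {
    // "Pass-through": delegate directly to the native atomic add.
    return target.fetch_add(value);
  } else {
    // CAS fallback: retry until the exchange succeeds. Under contention,
    // every failed attempt forces another round trip, which is far slower
    // than a single native atomic add.
    T expected = target.load();
    while (!target.compare_exchange_weak(expected, expected + value)) {
      // `expected` is refreshed by compare_exchange_weak on failure.
    }
    return expected;  // value observed before the add, matching fetch_add
  }
}
```

Adding native overloads for the missing unsigned integral types keeps those types on the pass-through path instead of falling into the CAS loop.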
I'm tracing the call chain myself. I think it goes from aggregation::SUM, which calls
I'd be okay with adding an explicit namespace where we intend this to be called. I agree the name conflict is not ideal.
👍 explicit namespace or snake-case naming is probably the best temporary workaround before the cccl fix.
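The explicit-namespace workaround can be sketched host-side as follows (the global `atomicAdd` here only mimics the CUDA intrinsic, and the wrapper body is illustrative rather than cudf's actual definition):

```cpp
#include <cstdint>

// Mimics the CUDA intrinsic, which lives in the global namespace on device.
inline std::uint64_t atomicAdd(std::uint64_t* address, std::uint64_t val)
{
  std::uint64_t old = *address;
  *address += val;
  return old;
}

namespace cudf {
namespace detail {
// Snake-case name inside an explicit namespace: it cannot collide with the
// global intrinsic, and call sites opt in with full qualification.
inline std::uint64_t atomic_add(std::uint64_t* address, std::uint64_t val)
{
  return ::atomicAdd(address, val);  // delegate to the native operation
}
}  // namespace detail
}  // namespace cudf
```

Callers write `cudf::detail::atomic_add(&x, 1)`, so unqualified-name lookup can never silently pick the wrong overload of `atomicAdd`.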
OK, thank you everyone for this discussion. If I understand the consensus solution correctly we should:
…operators to detail namespace. (#14962)

This PR does a thorough refactoring of `device_atomics.cuh`.

- I moved all atomic-related functions to `cudf::detail::` (making this an API-breaking change, but most likely a low-impact break).
- I added all missing operators for natively supported types to `atomicAdd`, `atomicMin`, `atomicMax`, etc. as discussed in #10149 and #14907.
  - This should prevent fallback to the `atomicCAS` path for types that are natively supported for those atomic operators, which we suspect as the root cause of the performance regression in #14886.
- I kept `atomicAdd` rather than `cudf::detail::atomic_add` in locations where a native CUDA overload exists, and the same for min/max/CAS operations. Aggregations are the only place where we use the special overloads. We were previously calling the native CUDA function rather than our special overloads in many cases, so I retained the previous behavior. This avoids including the additional headers that implement an unnecessary level of wrapping for natively supported overloads.
- I enabled native 2-byte CAS operations (on `unsigned short int`) that eliminate the do-while loop and extra alignment-checking logic.
  - The CUDA docs don't state this, but some forum posts claim this is only supported by compute capability 7.0+. We now have 7.0 as a lower bound for RAPIDS, so I'm not concerned by this as long as builds/tests pass.
- I improved/cleaned the documentation and moved around some code so that the operators were in a logical order.
- I assessed the existing tests and it looks like all the types are being covered. I'm not sure if there is a good way to enforce that certain types (like `uint64_t`) are passing through native `atomicAdd` calls.

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- David Wendt (https://github.com/davidwendt)
- Suraj Aralihalli (https://github.com/SurajAralihalli)

URL: #14962
Description
This pull request reverts the modifications made to the sum/product aggregation target type, ensuring it always produces `int64_t`. The changes implemented by PR #14679, which led to degraded performance when the aggregation column had an unsigned type, are reverted. Additional details can be found in issue #14886.
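The reverted target-type rule amounts to accumulating every integral source type into `int64_t`. A minimal sketch of such a trait (hypothetical names; not cudf's actual `target_type` machinery):

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical trait: SUM/PRODUCT over any integral column accumulates into
// int64_t, so an unsigned input like uint32_t no longer selects an unsigned
// accumulator (the behavior this PR reverts to). Floating-point inputs keep
// their own type.
template <typename Source>
struct sum_target_type {
  using type =
    std::conditional_t<std::is_integral_v<Source>, std::int64_t, Source>;
};

template <typename Source>
using sum_target_type_t = typename sum_target_type<Source>::type;

static_assert(std::is_same_v<sum_target_type_t<std::uint32_t>, std::int64_t>,
              "unsigned integrals map to a signed 64-bit accumulator");
static_assert(std::is_same_v<sum_target_type_t<std::int8_t>, std::int64_t>,
              "narrow signed integrals widen to int64_t");
static_assert(std::is_same_v<sum_target_type_t<double>, double>,
              "floating-point types pass through unchanged");
```

Routing all integral inputs through one signed 64-bit accumulator keeps the dispatched atomic-add operand types uniform, avoiding the mixed signed/unsigned paths implicated in the regression.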
Checklist