[FEA]: Add Thrust build option to disable dynamic offset type dispatch #1958
Comments
We chose to throw in the cuDF patch but I don't think we rely on this behavior. This patch only applies to cuDF, not all of RAPIDS, and I think we have safety guarantees from cuDF's |
CMake-interface-wise I would opt for a single tri-state option, e.g.
UB is really the worst option, because it leaves bugs on the user side undetected. Since Thrust uses mostly random access and contiguous iterators and we already compute the distance in many cases when calling the CUB API, the cost is negligible and I strongly advocate for detecting overflow and erroring out. Whether an exception is the right tool is a different question, since we communicate CUDA errors using error codes. However, (I think) allocation failures are already reported by throwing |
Thrust throws exceptions today already, so this isn't a problem. My preference is to throw as well, but I wanted to leave it open for someone to disagree. |
As far as I'm concerned, this build option is a stop-gap solution and the ergonomics aren't very important to me. The long term solution will be for people to specify the desired offset type via the execution policy, but that will be more involved for us to implement. |
The current dispatch mechanism trades compile time and binary size for performance and flexibility. Allow users to tune that depending on their needs. Fixes NVIDIA#1958
Is this a duplicate?
Area
Thrust
Is your feature request related to a problem? Please describe.
Today, Thrust algorithms introspect the size of the input sequence to perform a dynamic dispatch between two independent instantiations of the same kernel. The only difference between the kernels is the "offset type", i.e., the type that is used for index calculations into the input sequence.
This is done to balance correctness and performance.
For correctness, Thrust needs to be able to handle input sequences larger than what can be represented by `int`, i.e., `numeric_limits<int>::max()` aka `INT_MAX`, equal to `2^31 - 1`. For a sequence of 4-byte integers, this would be ~8.5GB. That's big, but by no means unreasonable with the size of GPU memory these days.
This means Thrust's kernels (CUB) need to be able to handle indexing into sequences larger than `INT_MAX`. This means using a 64-bit integer type like `int64_t` or `uint64_t` instead of a 32-bit integer type like `int32_t` or `uint32_t`. This small change can have a big performance impact on some algorithms/kernels. Therefore, we couldn't just switch everything using `int` to use `int64_t` without potentially causing significant performance regressions for existing users.
For more detail, see: #47
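To make the dispatch described above concrete, here is a minimal host-only sketch. It is not Thrust's or CUB's actual code: `sum_impl`, `fits_in_32bit_offset`, and `sum_dispatch` are hypothetical names, and a plain loop stands in for a kernel. It only illustrates that both offset-type instantiations get compiled and one is selected at run time from the input size.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a kernel; the only template parameter is the offset type.
template <typename OffsetT>
std::int64_t sum_impl(const int* data, OffsetT n) {
  assert(n >= 0);
  std::int64_t total = 0;
  for (OffsetT i = 0; i < n; ++i) {
    total += data[i];
  }
  return total;
}

// True when a sequence of n elements can be indexed with a 32-bit signed offset.
inline bool fits_in_32bit_offset(std::uint64_t n) {
  return n <= static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
}

// Runtime dispatch: both instantiations are compiled, one is chosen per call.
inline std::int64_t sum_dispatch(const int* data, std::uint64_t n) {
  if (fits_in_32bit_offset(n)) {
    return sum_impl<std::int32_t>(data, static_cast<std::int32_t>(n));
  }
  return sum_impl<std::int64_t>(data, static_cast<std::int64_t>(n));
}
```

Every algorithm that dispatches this way carries two copies of its kernel in the binary, which is where the compile time and binary size cost comes from.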
Thrust's dynamic dispatch to the two kernel instantiations can negatively impact downstream users' build times and binary sizes. This has led projects like RAPIDS cuDF to patch CCCL source code as part of their build process to disable this dispatch.
Describe the solution you'd like
Thrust should expose a build option like `THRUST_FORCE_64BIT_OFFSET_TYPE` and `THRUST_FORCE_32BIT_OFFSET_TYPE` that would be functionally equivalent to the patch that RAPIDS applies today. It will eliminate the dual instantiation and dynamic dispatch by only instantiating the kernel with the specified offset type.
One question I wasn't sure about: what should happen if you set `THRUST_FORCE_32BIT_OFFSET_TYPE` and you pass an input sequence where `distance(begin,end) >= (2<<31 - 1)`?
Describe alternatives you've considered
No response
Additional context
This will provide no added benefits over what RAPIDS is doing today other than the fact that RAPIDS no longer has to maintain a CCCL patch, which simplifies CCCL's testing of RAPIDS in our CI.
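For the open question about oversized inputs under a forced 32-bit offset type, one possible behavior is to check the distance and throw. The sketch below is only an illustration under that assumption: `checked_offset` and `forced_offset_t` are hypothetical names that do not exist in Thrust, and the choice of `std::overflow_error` is arbitrary.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>

// The two macros are the build options proposed in this issue; selecting both is rejected.
#if defined(THRUST_FORCE_32BIT_OFFSET_TYPE) && defined(THRUST_FORCE_64BIT_OFFSET_TYPE)
#  error "Only one forced offset type may be selected"
#endif

// Hypothetical alias: when an option is set, only this offset type would be instantiated.
#if defined(THRUST_FORCE_32BIT_OFFSET_TYPE)
using forced_offset_t = std::int32_t;
#elif defined(THRUST_FORCE_64BIT_OFFSET_TYPE)
using forced_offset_t = std::int64_t;
#endif

// Hypothetical helper: converts a distance to OffsetT, throwing instead of overflowing.
template <typename OffsetT>
OffsetT checked_offset(std::int64_t distance) {
  static_assert(std::is_signed<OffsetT>::value, "sketch only handles signed offset types");
  assert(distance >= 0);
  if (distance > static_cast<std::int64_t>(std::numeric_limits<OffsetT>::max())) {
    throw std::overflow_error("input sequence is too large for the forced offset type");
  }
  return static_cast<OffsetT>(distance);
}
```

With this shape, `checked_offset<std::int32_t>(distance(begin, end))` would run once per algorithm call before the launch, so the cost is a single comparison when the distance is already known.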