-
Notifications
You must be signed in to change notification settings - Fork 170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make the thrust dispatch mechanisms configurable #2310
Conversation
The current dispatch mechanisms trades compile time and binary size for performance and flexibility. Allow users to tune that depending on their needs Fixes NVIDIA#1958
Co-authored-by: Jake Hemstad <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also add a test that verifies that an algorithm invocation will throw as expected. Something simple like this:
#define THRUST_FORCE_32BIT_OFFSET_TYPE
try{
thrust::reduce( thrust::counting_iterator<size_t>(0), thrust::counting_iterator<size_t>( std::numeric_limits<int32_t>::max() + 1));
} catch (std::exception const& e) {
// expect this to throw
}
}
#define THRUST_FORCE_64BIT_OFFSET_TYPE
try{
thrust::reduce( thrust::counting_iterator<size_t>(0), thrust::counting_iterator<size_t>( std::numeric_limits<int32_t>::max() + 1));
} catch (std::exception const& e) {
// expect this to *NOT* throw
}
}
06930d7
to
6c2efe1
Compare
3e6b83b
to
b28c913
Compare
+ std::to_string(thrust::detail::integer_traits<index_type>::const_max) \ | ||
+ "). " #index_type " was used because the macro THRUST_FORCE_32BIT_OFFSET_TYPE was defined. " \ | ||
"To handle larger input sizes, either remove this macro to dynamically dispatch " \ | ||
"between 32-bit and 64-bit index types, or define THRUST_FORCE_64BIT_OFFSET_TYPE.") \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There’s not much the user can do if they encounter this at runtime unless they rebuild the application. I wonder whether this error should be written to target developers or end users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think that we should target this to a specific audience. In the end it does not matter whether it is an end user or developer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miscco Would you like me to test this in libcudf?
That would be awesome, I can build everything but dont know how to measure the binary size / runtim stuff |
🟨 CI finished in 6h 46m: Pass: 99%/250 | Total: 1d 20h | Avg: 10m 44s | Max: 1h 33m | Hits: 79%/24375
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
CUB | |
+/- | Thrust |
CUDA Experimental | |
pycuda |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
+/- | Thrust |
CUDA Experimental | |
+/- | pycuda |
🏃 Runner counts (total jobs: 250)
# | Runner |
---|---|
178 | linux-amd64-cpu16 |
41 | linux-amd64-gpu-v100-latest-1 |
16 | linux-arm64-cpu16 |
15 | windows-amd64-cpu16 |
🟩 CI finished in 15h 32m: Pass: 100%/250 | Total: 1d 20h | Avg: 10m 47s | Max: 1h 33m | Hits: 79%/24375
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
CUB | |
+/- | Thrust |
CUDA Experimental | |
pycuda |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
+/- | Thrust |
CUDA Experimental | |
+/- | pycuda |
🏃 Runner counts (total jobs: 250)
# | Runner |
---|---|
178 | linux-amd64-cpu16 |
41 | linux-amd64-gpu-v100-latest-1 |
16 | linux-arm64-cpu16 |
15 | windows-amd64-cpu16 |
@@ -59,6 +59,10 @@ option(THRUST_ENABLE_TESTING "Build Thrust testing suite." "ON") | |||
option(THRUST_ENABLE_EXAMPLES "Build Thrust examples." "ON") | |||
option(THRUST_ENABLE_BENCHMARKS "Build Thrust runtime benchmarks." "${CCCL_ENABLE_BENCHMARKS}") | |||
|
|||
# Allow the user to optionally select offset type dispatch to fixed 32 or 64 bit types | |||
set(THRUST_DISPATCH_TYPE "Dynamic" CACHE STRING "Select Thrust offset type dispatch." FORCE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The FORCE
here means that a user can't override the value
@@ -59,6 +59,10 @@ option(THRUST_ENABLE_TESTING "Build Thrust testing suite." "ON") | |||
option(THRUST_ENABLE_EXAMPLES "Build Thrust examples." "ON") | |||
option(THRUST_ENABLE_BENCHMARKS "Build Thrust runtime benchmarks." "${CCCL_ENABLE_BENCHMARKS}") | |||
|
|||
# Allow the user to optionally select offset type dispatch to fixed 32 or 64 bit types | |||
set(THRUST_DISPATCH_TYPE "Dynamic" CACHE STRING "Select Thrust offset type dispatch." FORCE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This option is not visible to users consuming our CMake packages (e.g. CPM, find_package(CCCL)
, add_subdirectory(...)
, etc). They're only visible in the developer build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need this option to be usable when consuming CCCL from libcudf. That was the purpose of introducing this, in #1958.
set(header_definitions | ||
"THRUST_WRAPPED_NAMESPACE=wrapped_thrust" | ||
"CUB_WRAPPED_NAMESPACE=wrapped_cub" | ||
"THRUST_FORCE_32_BIT_OFFSET_TYPE") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These header tests are linked to the interface targets built in ThrustBuildCompilerTargets.cmake
, which already may have these options set, potentially to conflicting values.
The current dispatch mechanisms trades compile time and binary size for performance and flexibility.
Allow users to tune that depending on their needs
Fixes #1958