Add thrust_create_target DISPATCH option. #2844

Merged
merged 9 commits into from
Nov 19, 2024
6 changes: 5 additions & 1 deletion ci/matrix.yaml
Expand Up @@ -28,6 +28,8 @@ workflows:
- {jobs: ['build'], std: 'all', ctk: '12.5', cxx: 'nvhpc'}
- {jobs: ['build'], std: 'all', cxx: ['gcc', 'clang'], cpu: 'arm64'}
- {jobs: ['build'], std: 'all', cxx: ['gcc'], sm: '90a'}
# Test Thrust 32-bit-only dispatch here, since it's most likely to break. 64-bit-only is tested in nightly.
- {jobs: ['test_gpu'], project: 'thrust', cmake_options: '-DTHRUST_DISPATCH_TYPE=Force32bit'}
# default_projects: clang-cuda
- {jobs: ['build'], std: 'all', cudacxx: 'clang', cxx: 'clang'}
- {jobs: ['build'], project: 'libcudacxx', std: 'all', cudacxx: 'clang', cxx: 'clang', sm: '90'}
Expand Down Expand Up @@ -58,13 +60,15 @@ workflows:
- {jobs: ['infra'], project: 'cccl', ctk: 'curr', cxx: ['gcc', 'clang']}

nightly:
# Increased test coverage compared to nightlies:
# Increased test coverage compared to pull_request:
- {jobs: ['test'], std: 'all', cxx: ['gcc13', 'clang18', 'msvc2022']}
- {jobs: ['test'], project: 'cudax', ctk: ['12.0', 'curr'], std: 'all', cxx: ['gcc12']}
- {jobs: ['test'], project: 'cudax', ctk: ['12.0' ], std: 'all', cxx: ['clang14']}
- {jobs: ['test'], project: 'cudax', ctk: [ 'curr'], std: 'all', cxx: ['clang18']}
# Edge-case jobs
- {jobs: ['limited'], project: 'cub', std: 17}
- {jobs: ['test_gpu'], project: 'thrust', cmake_options: '-DTHRUST_DISPATCH_TYPE=Force32bit'}
- {jobs: ['test_gpu'], project: 'thrust', cmake_options: '-DTHRUST_DISPATCH_TYPE=Force64bit'}

# # These are waiting on the NVKS nodes:
# - {jobs: ['test'], ctk: '11.1', gpu: 'v100', sm: 'gpu', cxx: 'gcc6', std: [11]}
45 changes: 34 additions & 11 deletions lib/cmake/thrust/README.md
Expand Up @@ -38,6 +38,22 @@ If using Thrust from the CCCL sources, this would be
$ cmake . -DThrust_DIR=<CCCL git repo root>/thrust/thrust/cmake/
```

#### Large Array (64-bit offset) Handling: `DISPATCH`

The `DISPATCH` option allows users to select the tradeoff of compile-time / binary-size
vs. performance vs. scalability when given large inputs that require 64-bit offset types.
This currently only applies when DEVICE=CUDA.

- `Dynamic` compiles each kernel twice, once for 32-bit offsets and again for 64-bit offsets,
and chooses between them at runtime based on the input size.
This significantly increases compile-time and binary-size, but provides optimal performance
for small input sizes while also supporting 64-bit indexed workloads.
- `Force32bit` forces Thrust to use a 32-bit offset type. This improves compile time and
binary size but limits the input size.
- `Force64bit` forces Thrust to use a 64-bit offset type. This improves compile time and
binary size and allows large input sizes. However, it may degrade runtime performance
for 32-bit indexed workloads.
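
The three modes can be pictured with a minimal C++ sketch. This is an illustration only, not Thrust's actual dispatch code: `run_with_offset` and `dispatch_offset_bits` are hypothetical names, and the runtime threshold (the largest signed 32-bit value) is an assumption. Only the two `THRUST_FORCE_*_OFFSET_TYPE` macros come from this change.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for an algorithm templated on its offset type.
// It only reports the width of the offset type it was instantiated with.
template <typename OffsetT>
int run_with_offset(std::uint64_t /*num_items*/)
{
  return static_cast<int>(sizeof(OffsetT) * 8);
}

// Returns the offset width, in bits, selected for `num_items`.
inline int dispatch_offset_bits(std::uint64_t num_items)
{
#if defined(THRUST_FORCE_32_BIT_OFFSET_TYPE)
  // Force32bit: only the 32-bit instantiation is compiled.
  return run_with_offset<std::int32_t>(num_items);
#elif defined(THRUST_FORCE_64_BIT_OFFSET_TYPE)
  // Force64bit: only the 64-bit instantiation is compiled.
  return run_with_offset<std::int64_t>(num_items);
#else
  // Dynamic: both instantiations are compiled; pick one at runtime.
  // Assumed threshold: the largest value a signed 32-bit offset can hold.
  constexpr auto max_32 =
    static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
  return num_items <= max_32 ? run_with_offset<std::int32_t>(num_items)
                             : run_with_offset<std::int64_t>(num_items);
#endif
}
```

With neither macro defined, the sketch compiles both instantiations, which is where the extra compile time and binary size of `Dynamic` would come from.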

#### TBB / OpenMP

To explicitly specify host/device systems, `HOST` and `DEVICE` arguments can be
Expand All @@ -56,33 +72,40 @@ host system, but will find and use TBB or OpenMP for the device system.

To allow a Thrust target to be configurable easily via `cmake-gui` or
`ccmake`, pass the `FROM_OPTIONS` flag to `thrust_create_target`. This will add
`THRUST_HOST_SYSTEM` and `THRUST_DEVICE_SYSTEM` options to the CMake cache that
allow selection from the systems supported by this version of Thrust.
`THRUST_HOST_SYSTEM`, `THRUST_DEVICE_SYSTEM`, and `THRUST_DISPATCH_TYPE` options
to the CMake cache that allow selection from the systems and dispatch types
supported by this version of Thrust.

```cmake
thrust_create_target(Thrust FROM_OPTIONS
[HOST_OPTION <option name>]
[DEVICE_OPTION <option name>]
[DISPATCH_OPTION <option name>]
[HOST_OPTION_DOC <doc string>]
[DEVICE_OPTION_DOC <doc string>]
[DISPATCH_OPTION_DOC <doc string>]
[HOST <default host system name>]
[DEVICE <default device system name>]
[DISPATCH <default dispatch type>]
[ADVANCED]
)
```

The optional arguments have sensible defaults, but may be configured per
`thrust_create_target` call:

| Argument | Default | Description |
|---------------------|-------------------------|---------------------------------|
| `HOST_OPTION` | `THRUST_HOST_SYSTEM` | Name of cache option for host |
| `DEVICE_OPTION` | `THRUST_DEVICE_SYSTEM` | Name of cache option for device |
| `HOST_OPTION_DOC` | Thrust's host system. | Docstring for host option |
| `DEVICE_OPTION_DOC` | Thrust's device system. | Docstring for device option |
| `HOST` | `CPP` | Default host system |
| `DEVICE` | `CUDA` | Default device system |
| `ADVANCED` | *N/A* | Mark cache options advanced |
| Argument | Default | Description |
|-----------------------|-------------------------|-----------------------------------|
| `HOST_OPTION` | `THRUST_HOST_SYSTEM` | Name of cache option for host |
| `DEVICE_OPTION` | `THRUST_DEVICE_SYSTEM` | Name of cache option for device |
| `DISPATCH_OPTION` | `THRUST_DISPATCH_TYPE` | Name of cache option for dispatch |
| `HOST_OPTION_DOC` | Thrust's host system. | Docstring for host option |
| `DEVICE_OPTION_DOC` | Thrust's device system. | Docstring for device option |
| `DISPATCH_OPTION_DOC` | Thrust's dispatch type. | Docstring for dispatch option |
| `HOST` | `CPP` | Default host system |
| `DEVICE` | `CUDA` | Default device system |
| `DISPATCH` | `Dynamic` | Default dispatch type |
| `ADVANCED` | *N/A* | Mark cache options advanced |

### Specifying Thrust Version Requirements

50 changes: 42 additions & 8 deletions lib/cmake/thrust/thrust-config.cmake
Expand Up @@ -3,8 +3,6 @@
#
# Provided by NVIDIA under the same license as the associated Thrust library.
#
# Reply-To: Allison Vacanti <[email protected]>
#
# *****************************************************************************
# ** The following is a short reference to using Thrust from CMake. **
# ** For more details, see the README.md in the same directory as this file. **
Expand All @@ -30,10 +28,13 @@
# thrust_create_target(TargetName FROM_OPTIONS
# [HOST_OPTION <option_name>] # Optionally rename the host system option
# [DEVICE_OPTION <option_name>] # Optionally rename the device system option
# [DISPATCH_OPTION <option_name>] # Optionally rename the dispatch system option
# [HOST_OPTION_DOC <doc_string>] # Optionally change the cache label
# [DEVICE_OPTION_DOC <doc_string>] # Optionally change the cache label
# [DISPATCH_OPTION_DOC <doc_str>] # Optionally change the cache label
# [HOST <default system>] # Optionally change the default backend
# [DEVICE <default system>] # Optionally change the default backend
# [DISPATCH <default dispatch>] # Optionally change the default dispatch
# [ADVANCED] # Optionally mark options as advanced
# [GLOBAL] # Optionally mark the target as GLOBAL
# )
Expand All @@ -59,6 +60,11 @@
# IGNORE_CUB_VERSION # Skip configure-time and compile-time CUB version checks
# )
#
# # DISPATCH options (See README):
# thrust_create_target(TargetName DISPATCH Dynamic)
# thrust_create_target(TargetName DISPATCH Force32bit)
# thrust_create_target(TargetName DISPATCH Force64bit)
#
# # Test if a particular system has been loaded. ${var_name} is set to TRUE or
# # FALSE to indicate if "system" is found.
# thrust_is_system_found(<system> <var_name>)
Expand Down Expand Up @@ -100,6 +106,11 @@ set(THRUST_DEVICE_SYSTEM_OPTIONS
CACHE INTERNAL "Valid Thrust device systems"
FORCE
)
set(THRUST_DISPATCH_TYPE_OPTIONS
Dynamic Force32bit Force64bit
CACHE INTERNAL "Valid Thrust dispatch types"
FORCE
)

# Workaround cmake issue #20670 https://gitlab.kitware.com/cmake/cmake/-/issues/20670
# Legacy all-caps THRUST variables:
Expand Down Expand Up @@ -137,6 +148,9 @@ function(thrust_create_target target_name)
HOST
HOST_OPTION
HOST_OPTION_DOC
DISPATCH
DISPATCH_OPTION
DISPATCH_OPTION_DOC
)
cmake_parse_arguments(TCT "${options}" "${keys}" "" ${ARGN})
if (TCT_UNPARSED_ARGUMENTS)
Expand All @@ -158,10 +172,13 @@ function(thrust_create_target target_name)

_thrust_set_if_undefined(TCT_HOST CPP)
_thrust_set_if_undefined(TCT_DEVICE CUDA)
_thrust_set_if_undefined(TCT_DISPATCH Dynamic)
_thrust_set_if_undefined(TCT_HOST_OPTION THRUST_HOST_SYSTEM)
_thrust_set_if_undefined(TCT_DEVICE_OPTION THRUST_DEVICE_SYSTEM)
_thrust_set_if_undefined(TCT_HOST_OPTION_DOC "Thrust host system.")
_thrust_set_if_undefined(TCT_DEVICE_OPTION_DOC "Thrust device system.")
_thrust_set_if_undefined(TCT_DISPATCH_OPTION THRUST_DISPATCH_TYPE)
_thrust_set_if_undefined(TCT_HOST_OPTION_DOC "Thrust host system: ${THRUST_HOST_SYSTEM_OPTIONS}")
_thrust_set_if_undefined(TCT_DEVICE_OPTION_DOC "Thrust device system: ${THRUST_DEVICE_SYSTEM_OPTIONS}")
_thrust_set_if_undefined(TCT_DISPATCH_OPTION_DOC "Thrust dispatch type: ${THRUST_DISPATCH_TYPE_OPTIONS}")

if (NOT TCT_HOST IN_LIST THRUST_HOST_SYSTEM_OPTIONS)
message(FATAL_ERROR
Expand All @@ -175,18 +192,26 @@ function(thrust_create_target target_name)
)
endif()

if (NOT TCT_DISPATCH IN_LIST THRUST_DISPATCH_TYPE_OPTIONS)
message(FATAL_ERROR
"Requested DISPATCH=${TCT_DISPATCH}; must be one of ${THRUST_DISPATCH_TYPE_OPTIONS}"
)
endif()

if (TCT_FROM_OPTIONS)
_thrust_create_cache_options(
${TCT_HOST} ${TCT_DEVICE}
${TCT_HOST_OPTION} ${TCT_DEVICE_OPTION}
${TCT_HOST_OPTION_DOC} ${TCT_DEVICE_OPTION_DOC}
${TCT_HOST} ${TCT_DEVICE} ${TCT_DISPATCH}
${TCT_HOST_OPTION} ${TCT_DEVICE_OPTION} ${TCT_DISPATCH_OPTION}
${TCT_HOST_OPTION_DOC} ${TCT_DEVICE_OPTION_DOC} ${TCT_DISPATCH_OPTION_DOC}
${TCT_ADVANCED}
)
set(TCT_HOST ${${TCT_HOST_OPTION}})
set(TCT_DEVICE ${${TCT_DEVICE_OPTION}})
set(TCT_DISPATCH ${${TCT_DISPATCH_OPTION}})
thrust_debug("Current option settings:" internal)
thrust_debug(" - ${TCT_HOST_OPTION}=${TCT_HOST}" internal)
thrust_debug(" - ${TCT_DEVICE_OPTION}=${TCT_DEVICE}" internal)
thrust_debug(" - ${TCT_DISPATCH_OPTION}=${TCT_DISPATCH}" internal)
endif()

_thrust_find_backend(${TCT_HOST} REQUIRED)
Expand All @@ -206,6 +231,12 @@ function(thrust_create_target target_name)
Thrust::${TCT_DEVICE}::Device
)

if (${TCT_DISPATCH} STREQUAL "Force32bit")
target_compile_definitions(${target_name} INTERFACE "THRUST_FORCE_32_BIT_OFFSET_TYPE")
elseif(${TCT_DISPATCH} STREQUAL "Force64bit")
target_compile_definitions(${target_name} INTERFACE "THRUST_FORCE_64_BIT_OFFSET_TYPE")
endif()

# This would be nice to enforce, but breaks when using old cmake + new
# compiler, since cmake doesn't know what features the new compiler version
# supports.
Expand Down Expand Up @@ -416,14 +447,17 @@ function(_thrust_declare_interface_alias alias_name ugly_name)
endfunction()

# Create cache options for selecting the user/device systems with ccmake/cmake-gui.
function(_thrust_create_cache_options host device host_option device_option host_doc device_doc advanced)
function(_thrust_create_cache_options host device dispatch host_option device_option dispatch_option host_doc device_doc dispatch_doc advanced)
thrust_debug("Creating system cache options: (advanced=${advanced})" internal)
thrust_debug(" - Host Option=${host_option} Default=${host} Doc='${host_doc}'" internal)
thrust_debug(" - Device Option=${device_option} Default=${device} Doc='${device_doc}'" internal)
thrust_debug(" - Dispatch Option=${dispatch_option} Default=${dispatch} Doc='${dispatch_doc}'" internal)
set(${host_option} ${host} CACHE STRING "${host_doc}")
set_property(CACHE ${host_option} PROPERTY STRINGS ${THRUST_HOST_SYSTEM_OPTIONS})
set(${device_option} ${device} CACHE STRING "${device_doc}")
set_property(CACHE ${device_option} PROPERTY STRINGS ${THRUST_DEVICE_SYSTEM_OPTIONS})
set(${dispatch_option} ${dispatch} CACHE STRING "${dispatch_doc}")
set_property(CACHE ${dispatch_option} PROPERTY STRINGS ${THRUST_DISPATCH_TYPE_OPTIONS})
if (advanced)
mark_as_advanced(${host_option} ${device_option} ${dispatch_option})
endif()
2 changes: 1 addition & 1 deletion thrust/CMakeLists.txt
Expand Up @@ -30,7 +30,7 @@ option(THRUST_ENABLE_TESTING "Build Thrust testing suite." "ON")
option(THRUST_ENABLE_EXAMPLES "Build Thrust examples." "ON")

# Allow the user to optionally select offset type dispatch to fixed 32 or 64 bit types
set(THRUST_DISPATCH_TYPE "Dynamic" CACHE STRING "Select Thrust offset type dispatch." FORCE)
set(THRUST_DISPATCH_TYPE "Dynamic" CACHE STRING "Select Thrust offset type dispatch.")
set_property(CACHE THRUST_DISPATCH_TYPE PROPERTY STRINGS "Dynamic" "Force32bit" "Force64bit")

# Check if we're actually building anything before continuing. If not, no need
6 changes: 0 additions & 6 deletions thrust/cmake/ThrustBuildCompilerTargets.cmake
Expand Up @@ -26,12 +26,6 @@ function(thrust_build_compiler_targets)
append_option_if_available("/wd4146" cxx_compile_options)
endif()

if (THRUST_DISPATCH_TYPE STREQUAL "Force32bit")
list(APPEND cxx_compile_definitions "THRUST_FORCE_32_BIT_OFFSET_TYPE")
elseif (THRUST_DISPATCH_TYPE STREQUAL "Force64bit")
list(APPEND cxx_compile_definitions "THRUST_FORCE_64_BIT_OFFSET_TYPE")
endif()

cccl_build_compiler_interface(thrust.compiler_interface
"${cuda_compile_options}"
"${cxx_compile_options}"
1 change: 1 addition & 0 deletions thrust/cmake/ThrustBuildTargetList.cmake
Expand Up @@ -206,6 +206,7 @@ function(_thrust_build_target_list_multiconfig)
thrust_create_target(${target_name}
HOST ${host}
DEVICE ${device}
DISPATCH ${THRUST_DISPATCH_TYPE}
${THRUST_TARGET_FLAGS}
)

7 changes: 0 additions & 7 deletions thrust/cmake/ThrustHeaderTesting.cmake
Expand Up @@ -154,13 +154,6 @@ foreach(thrust_target IN LISTS THRUST_TARGETS)
"CUB_WRAPPED_NAMESPACE=wrapped_cub")
thrust_add_header_test(${thrust_target} wrap "${header_definitions}")

# We need to ensure that the different dispatch mechanisms work
set(header_definitions "THRUST_FORCE_32_BIT_OFFSET_TYPE")
thrust_add_header_test(${thrust_target} offset_32 "${header_definitions}")

set(header_definitions "THRUST_FORCE_64_BIT_OFFSET_TYPE")
thrust_add_header_test(${thrust_target} offset_64 "${header_definitions}")

thrust_get_target_property(config_device ${thrust_target} DEVICE)
if ("CUDA" STREQUAL "${config_device}")
# Check that BF16 support can be disabled
4 changes: 4 additions & 0 deletions thrust/testing/copy.cu
Expand Up @@ -668,6 +668,8 @@ void TestCopyIfStencilDispatchImplicit()
}
DECLARE_UNITTEST(TestCopyIfStencilDispatchImplicit);

#ifndef THRUST_FORCE_32_BIT_OFFSET_TYPE

struct only_set_when_expected_it
{
long long expected;
Expand Down Expand Up @@ -752,3 +754,5 @@ void TestCopyWithBigIndexes()
TestCopyWithBigIndexesHelper(33);
}
DECLARE_UNITTEST(TestCopyWithBigIndexes);

#endif
2 changes: 2 additions & 0 deletions thrust/testing/count.cu
Expand Up @@ -119,8 +119,10 @@ void TestCountWithBigIndexesHelper(int magnitude)
void TestCountWithBigIndexes()
{
TestCountWithBigIndexesHelper(30);
#ifndef THRUST_FORCE_32_BIT_OFFSET_TYPE
TestCountWithBigIndexesHelper(31);
TestCountWithBigIndexesHelper(32);
TestCountWithBigIndexesHelper(33);
#endif
}
DECLARE_UNITTEST(TestCountWithBigIndexes);
2 changes: 2 additions & 0 deletions thrust/testing/cuda/adjacent_difference.cu
Expand Up @@ -153,8 +153,10 @@ void TestAdjacentDifferenceWithBigIndexesHelper(int magnitude)
void TestAdjacentDifferenceWithBigIndexes()
{
TestAdjacentDifferenceWithBigIndexesHelper(30);
#ifndef THRUST_FORCE_32_BIT_OFFSET_TYPE
TestAdjacentDifferenceWithBigIndexesHelper(31);
TestAdjacentDifferenceWithBigIndexesHelper(32);
TestAdjacentDifferenceWithBigIndexesHelper(33);
#endif
}
DECLARE_UNITTEST(TestAdjacentDifferenceWithBigIndexes);
3 changes: 3 additions & 0 deletions thrust/testing/cuda/partition.cu
Expand Up @@ -677,9 +677,12 @@ void TestPartitionIfWithMagnitude(int magnitude)
void TestPartitionIfWithLargeNumberOfItems()
{
TestPartitionIfWithMagnitude(30);
// These require 64-bit dispatches even when magnitude < 32.
# ifndef THRUST_FORCE_32_BIT_OFFSET_TYPE
TestPartitionIfWithMagnitude(31);
TestPartitionIfWithMagnitude(32);
TestPartitionIfWithMagnitude(33);
# endif
}
DECLARE_UNITTEST(TestPartitionIfWithLargeNumberOfItems);
#endif
2 changes: 2 additions & 0 deletions thrust/testing/cuda/reduce_by_key.cu
Expand Up @@ -433,8 +433,10 @@ void TestReduceByKeyWithBigIndexesHelper(int magnitude)
void TestReduceByKeyWithBigIndexes()
{
TestReduceByKeyWithBigIndexesHelper(30);
#ifndef THRUST_FORCE_32_BIT_OFFSET_TYPE
TestReduceByKeyWithBigIndexesHelper(31);
TestReduceByKeyWithBigIndexesHelper(32);
TestReduceByKeyWithBigIndexesHelper(33);
#endif
}
DECLARE_UNITTEST(TestReduceByKeyWithBigIndexes);
7 changes: 6 additions & 1 deletion thrust/testing/cuda/sort.cu
Expand Up @@ -251,9 +251,14 @@ void TestSortWithMagnitude(int magnitude)

void TestSortWithLargeNumberOfItems()
{
TestSortWithMagnitude(39);
TestSortWithMagnitude(30);
// These still require 64-bit dispatches when magnitude < 32.
#ifndef THRUST_FORCE_32_BIT_OFFSET_TYPE
TestSortWithMagnitude(31);
TestSortWithMagnitude(32);
TestSortWithMagnitude(33);
TestSortWithMagnitude(39);
#endif
}
DECLARE_UNITTEST(TestSortWithLargeNumberOfItems);

2 changes: 2 additions & 0 deletions thrust/testing/inner_product.cu
Expand Up @@ -131,9 +131,11 @@ void TestInnerProductWithBigIndexesHelper(int magnitude)
void TestInnerProductWithBigIndexes()
{
TestInnerProductWithBigIndexesHelper(30);
#ifndef THRUST_FORCE_32_BIT_OFFSET_TYPE
TestInnerProductWithBigIndexesHelper(31);
TestInnerProductWithBigIndexesHelper(32);
TestInnerProductWithBigIndexesHelper(33);
#endif
}
DECLARE_UNITTEST(TestInnerProductWithBigIndexes);

2 changes: 2 additions & 0 deletions thrust/testing/max_element.cu
Expand Up @@ -104,8 +104,10 @@ void TestMaxElementWithBigIndexesHelper(int magnitude)
void TestMaxElementWithBigIndexes()
{
TestMaxElementWithBigIndexesHelper(30);
#ifndef THRUST_FORCE_32_BIT_OFFSET_TYPE
TestMaxElementWithBigIndexesHelper(31);
TestMaxElementWithBigIndexesHelper(32);
TestMaxElementWithBigIndexesHelper(33);
#endif
}
DECLARE_UNITTEST(TestMaxElementWithBigIndexes);
2 changes: 2 additions & 0 deletions thrust/testing/min_element.cu
Expand Up @@ -102,8 +102,10 @@ void TestMinElementWithBigIndexesHelper(int magnitude)
void TestMinElementWithBigIndexes()
{
TestMinElementWithBigIndexesHelper(30);
#ifndef THRUST_FORCE_32_BIT_OFFSET_TYPE
TestMinElementWithBigIndexesHelper(31);
TestMinElementWithBigIndexesHelper(32);
TestMinElementWithBigIndexesHelper(33);
#endif
}
DECLARE_UNITTEST(TestMinElementWithBigIndexes);
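
The test changes above keep magnitude 30 under `THRUST_FORCE_32_BIT_OFFSET_TYPE` and guard magnitudes 31 and up. A short sketch of the likely arithmetic behind that cutoff follows. It assumes each helper processes roughly `2^magnitude` items, which the hunks do not show, and `fits_in_32_bit_offset` is a hypothetical name that is not part of the test suite.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true if 2^magnitude items can be indexed with a
// signed 32-bit offset. Valid for 0 <= magnitude < 63.
constexpr bool fits_in_32_bit_offset(int magnitude)
{
  return (std::int64_t{1} << magnitude)
      <= std::numeric_limits<std::int32_t>::max();
}

// 2^30 = 1073741824 is below INT32_MAX (2147483647), so it still fits.
static_assert(fits_in_32_bit_offset(30), "2^30 fits in a 32-bit offset");
// 2^31 = 2147483648 is one past INT32_MAX, so it needs a 64-bit offset.
static_assert(!fits_in_32_bit_offset(31), "2^31 needs a 64-bit offset");
```

Under that assumption, magnitude 31 already overflows a signed 32-bit offset, which is consistent with the comments in `partition.cu` and `sort.cu` saying these cases need 64-bit dispatch even when the magnitude is below 32.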