Add thrust_create_target DISPATCH option. (NVIDIA#2844)
* Remove the THRUST_DISPATCH_TYPE header tests.

These will end up specifying conflicting flags when `THRUST_DISPATCH_TYPE` is set to something other than `Dynamic`.

* Add `DISPATCH` option to `thrust_create_target`.

```
thrust_create_target(TargetName
  DISPATCH [Dynamic|Force32bit|Force64bit]
)
```

* Skip 64-bit offset tests when forcing 32-bit dispatch.

* Add 32/64-bit dispatch jobs to nightly CI.

* Add 32-bit dispatch to pull_request workflow.
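
The reason the larger test sizes are skipped under `Force32bit` can be checked with a little arithmetic. The sketch below assumes that each big-index test helper processes about `2^magnitude` items and that the forced 32-bit offset type is a signed 32-bit integer; both are assumptions, and `fits_in_32_bit_offset` is a hypothetical helper, not part of Thrust.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: returns true if 2^magnitude items can be counted
// by a signed 32-bit offset (assumed offset type under Force32bit).
// Only valid for 0 <= magnitude < 63.
constexpr bool fits_in_32_bit_offset(int magnitude)
{
  return (std::int64_t{1} << magnitude) <= std::numeric_limits<std::int32_t>::max();
}

// 2^30 items fit; 2^31 already exceeds INT32_MAX (2^31 - 1).
static_assert(fits_in_32_bit_offset(30), "2^30 fits in a signed 32-bit offset");
static_assert(!fits_in_32_bit_offset(31), "2^31 does not fit in a signed 32-bit offset");
```

Under these assumptions, magnitude 30 is the largest size that remains valid with 32-bit offsets, which matches the tests keeping the `30` case and guarding `31` and above.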
alliepiper authored and trxcllnt committed Nov 23, 2024
1 parent 7682680 commit 4b2da5f
Showing 22 changed files with 125 additions and 35 deletions.
6 changes: 5 additions & 1 deletion ci/matrix.yaml
@@ -28,6 +28,8 @@ workflows:
- {jobs: ['build'], std: 'all', ctk: '12.5', cxx: 'nvhpc'}
- {jobs: ['build'], std: 'all', cxx: ['gcc', 'clang'], cpu: 'arm64'}
- {jobs: ['build'], std: 'all', cxx: ['gcc'], sm: '90a'}
# Test Thrust 32-bit-only dispatch here, since it's most likely to break. 64-bit-only is tested in nightly.
- {jobs: ['test_gpu'], project: 'thrust', cmake_options: '-DTHRUST_DISPATCH_TYPE=Force32bit'}
# default_projects: clang-cuda
- {jobs: ['build'], std: 'all', cudacxx: 'clang', cxx: 'clang'}
- {jobs: ['build'], project: 'libcudacxx', std: 'all', cudacxx: 'clang', cxx: 'clang', sm: '90'}
@@ -58,13 +60,15 @@ workflows:
- {jobs: ['infra'], project: 'cccl', ctk: 'curr', cxx: ['gcc', 'clang']}

nightly:
# Increased test coverage compared to nightlies:
# Increased test coverage compared to pull_request:
- {jobs: ['test'], std: 'all', cxx: ['gcc13', 'clang18', 'msvc2022']}
- {jobs: ['test'], project: 'cudax', ctk: ['12.0', 'curr'], std: 'all', cxx: ['gcc12']}
- {jobs: ['test'], project: 'cudax', ctk: ['12.0' ], std: 'all', cxx: ['clang14']}
- {jobs: ['test'], project: 'cudax', ctk: [ 'curr'], std: 'all', cxx: ['clang18']}
# Edge-case jobs
- {jobs: ['limited'], project: 'cub', std: 17}
- {jobs: ['test_gpu'], project: 'thrust', cmake_options: '-DTHRUST_DISPATCH_TYPE=Force32bit'}
- {jobs: ['test_gpu'], project: 'thrust', cmake_options: '-DTHRUST_DISPATCH_TYPE=Force64bit'}

# # These are waiting on the NVKS nodes:
# - {jobs: ['test'], ctk: '11.1', gpu: 'v100', sm: 'gpu', cxx: 'gcc6', std: [11]}
45 changes: 34 additions & 11 deletions lib/cmake/thrust/README.md
@@ -38,6 +38,22 @@ If using Thrust from the CCCL sources, this would be
$ cmake . -DThrust_DIR=<CCCL git repo root>/thrust/thrust/cmake/
```

#### Large Array (64-bit offset) Handling: `DISPATCH`

The `DISPATCH` option allows users to select the tradeoff of compile-time / binary-size
vs. performance vs. scalability when given large inputs that require 64-bit offset types.
This currently only applies when DEVICE=CUDA.

- `Dynamic` may compile each kernel twice, once for 32-bit offsets and again for 64-bit
offsets, and chooses between them at runtime based on the input size.
This significantly increases compile-time and binary-size, but provides optimal performance
for small input sizes while also supporting 64-bit indexed workloads.
- `Force32bit` forces Thrust to use a 32-bit offset type. This improves compile time and
binary size but limits the input size.
- `Force64bit` forces Thrust to use a 64-bit offset type. This improves compile time and
binary size and allows large input sizes. However, it may degrade runtime performance
for 32-bit indexed workloads.
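
As a rough illustration of the three modes, the sketch below shows one way a library could select an offset type. `run_kernel` and `dispatch` are hypothetical names, not Thrust's API. Only the two `THRUST_FORCE_*_OFFSET_TYPE` macros come from this change, and the size check in the forced 32-bit branch is an assumption about reasonable behavior.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical stand-in for a kernel templated on its offset type.
// It only reports the offset width (in bits) that was instantiated.
template <typename OffsetT>
int run_kernel(std::int64_t /*num_items*/)
{
  return static_cast<int>(sizeof(OffsetT) * 8);
}

// Hypothetical dispatcher mirroring the Dynamic / Force32bit / Force64bit modes.
inline int dispatch(std::int64_t num_items)
{
  constexpr std::int64_t max32 = std::numeric_limits<std::int32_t>::max();
#if defined(THRUST_FORCE_32_BIT_OFFSET_TYPE)
  // Force32bit: only the 32-bit kernel is compiled, so large inputs are rejected.
  if (num_items > max32)
  {
    throw std::length_error("input too large for 32-bit offsets");
  }
  return run_kernel<std::int32_t>(num_items);
#elif defined(THRUST_FORCE_64_BIT_OFFSET_TYPE)
  // Force64bit: only the 64-bit kernel is compiled, for every input size.
  return run_kernel<std::int64_t>(num_items);
#else
  // Dynamic: both kernels are compiled and one is chosen at runtime.
  return num_items <= max32 ? run_kernel<std::int32_t>(num_items)
                            : run_kernel<std::int64_t>(num_items);
#endif
}
```

With neither macro defined, `dispatch(1000)` takes the 32-bit path and `dispatch(std::int64_t{1} << 31)` takes the 64-bit path. Defining one of the macros removes the other instantiation, which is where the compile-time and binary-size savings would come from.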

#### TBB / OpenMP

To explicitly specify host/device systems, `HOST` and `DEVICE` arguments can be
@@ -56,33 +72,40 @@ host system, but will find and use TBB or OpenMP for the device system.

To allow a Thrust target to be configurable easily via `cmake-gui` or
`ccmake`, pass the `FROM_OPTIONS` flag to `thrust_create_target`. This will add
`THRUST_HOST_SYSTEM` and `THRUST_DEVICE_SYSTEM` options to the CMake cache that
allow selection from the systems supported by this version of Thrust.
`THRUST_HOST_SYSTEM`, `THRUST_DEVICE_SYSTEM`, and `THRUST_DISPATCH_TYPE` options
to the CMake cache that allow selection from the systems supported by this version
of Thrust.

```cmake
thrust_create_target(Thrust FROM_OPTIONS
[HOST_OPTION <option name>]
[DEVICE_OPTION <option name>]
[DISPATCH_OPTION <option name>]
[HOST_OPTION_DOC <doc string>]
[DEVICE_OPTION_DOC <doc string>]
[DISPATCH_OPTION_DOC <doc string>]
[HOST <default host system name>]
[DEVICE <default device system name>]
[DISPATCH <default dispatch type>]
[ADVANCED]
)
```

The optional arguments have sensible defaults, but may be configured per
`thrust_create_target` call:

| Argument | Default | Description |
|---------------------|-------------------------|---------------------------------|
| `HOST_OPTION` | `THRUST_HOST_SYSTEM` | Name of cache option for host |
| `DEVICE_OPTION` | `THRUST_DEVICE_SYSTEM` | Name of cache option for device |
| `HOST_OPTION_DOC` | Thrust's host system. | Docstring for host option |
| `DEVICE_OPTION_DOC` | Thrust's device system. | Docstring for device option |
| `HOST` | `CPP` | Default host system |
| `DEVICE` | `CUDA` | Default device system |
| `ADVANCED` | *N/A* | Mark cache options advanced |
| Argument | Default | Description |
|-----------------------|-------------------------|-----------------------------------|
| `HOST_OPTION` | `THRUST_HOST_SYSTEM` | Name of cache option for host |
| `DEVICE_OPTION` | `THRUST_DEVICE_SYSTEM` | Name of cache option for device |
| `DISPATCH_OPTION` | `THRUST_DISPATCH_TYPE` | Name of cache option for dispatch |
| `HOST_OPTION_DOC` | Thrust's host system. | Docstring for host option |
| `DEVICE_OPTION_DOC` | Thrust's device system. | Docstring for device option |
| `DISPATCH_OPTION_DOC` | Thrust's dispatch type. | Docstring for dispatch option |
| `HOST` | `CPP` | Default host system |
| `DEVICE` | `CUDA` | Default device system |
| `DISPATCH`            | `Dynamic`               | Default dispatch type             |
| `ADVANCED` | *N/A* | Mark cache options advanced |

### Specifying Thrust Version Requirements

50 changes: 42 additions & 8 deletions lib/cmake/thrust/thrust-config.cmake
@@ -3,8 +3,6 @@
#
# Provided by NVIDIA under the same license as the associated Thrust library.
#
# Reply-To: Allison Vacanti <[email protected]>
#
# *****************************************************************************
# ** The following is a short reference to using Thrust from CMake. **
# ** For more details, see the README.md in the same directory as this file. **
@@ -30,10 +28,13 @@
# thrust_create_target(TargetName FROM_OPTIONS
# [HOST_OPTION <option_name>] # Optionally rename the host system option
# [DEVICE_OPTION <option_name>] # Optionally rename the device system option
# [DISPATCH_OPTION <option_name>] # Optionally rename the dispatch system option
# [HOST_OPTION_DOC <doc_string>] # Optionally change the cache label
# [DEVICE_OPTION_DOC <doc_string>] # Optionally change the cache label
# [DISPATCH_OPTION_DOC <doc_str>] # Optionally change the cache label
# [HOST <default system>] # Optionally change the default backend
# [DEVICE <default system>] # Optionally change the default backend
# [DISPATCH <default dispatch>] # Optionally change the default dispatch
# [ADVANCED] # Optionally mark options as advanced
# [GLOBAL] # Optionally mark the target as GLOBAL
# )
@@ -59,6 +60,11 @@
# IGNORE_CUB_VERSION # Skip configure-time and compile-time CUB version checks
# )
#
# # DISPATCH options (See README):
# thrust_create_target(TargetName DISPATCH Dynamic)
# thrust_create_target(TargetName DISPATCH Force32bit)
# thrust_create_target(TargetName DISPATCH Force64bit)
#
# # Test if a particular system has been loaded. ${var_name} is set to TRUE or
# # FALSE to indicate if "system" is found.
# thrust_is_system_found(<system> <var_name>)
@@ -100,6 +106,11 @@ set(THRUST_DEVICE_SYSTEM_OPTIONS
CACHE INTERNAL "Valid Thrust device systems"
FORCE
)
set(THRUST_DISPATCH_TYPE_OPTIONS
Dynamic Force32bit Force64bit
CACHE INTERNAL "Valid Thrust dispatch types"
FORCE
)

# Workaround cmake issue #20670 https://gitlab.kitware.com/cmake/cmake/-/issues/20670
# Legacy all-caps THRUST variables:
@@ -137,6 +148,9 @@ function(thrust_create_target target_name)
HOST
HOST_OPTION
HOST_OPTION_DOC
DISPATCH
DISPATCH_OPTION
DISPATCH_OPTION_DOC
)
cmake_parse_arguments(TCT "${options}" "${keys}" "" ${ARGN})
if (TCT_UNPARSED_ARGUMENTS)
@@ -158,10 +172,13 @@ function(thrust_create_target target_name)

_thrust_set_if_undefined(TCT_HOST CPP)
_thrust_set_if_undefined(TCT_DEVICE CUDA)
_thrust_set_if_undefined(TCT_DISPATCH Dynamic)
_thrust_set_if_undefined(TCT_HOST_OPTION THRUST_HOST_SYSTEM)
_thrust_set_if_undefined(TCT_DEVICE_OPTION THRUST_DEVICE_SYSTEM)
_thrust_set_if_undefined(TCT_HOST_OPTION_DOC "Thrust host system.")
_thrust_set_if_undefined(TCT_DEVICE_OPTION_DOC "Thrust device system.")
_thrust_set_if_undefined(TCT_DISPATCH_OPTION THRUST_DISPATCH_TYPE)
_thrust_set_if_undefined(TCT_HOST_OPTION_DOC "Thrust host system: ${THRUST_HOST_SYSTEM_OPTIONS}")
_thrust_set_if_undefined(TCT_DEVICE_OPTION_DOC "Thrust device system: ${THRUST_DEVICE_SYSTEM_OPTIONS}")
_thrust_set_if_undefined(TCT_DISPATCH_OPTION_DOC "Thrust dispatch type: ${THRUST_DISPATCH_TYPE_OPTIONS}")

if (NOT TCT_HOST IN_LIST THRUST_HOST_SYSTEM_OPTIONS)
message(FATAL_ERROR
@@ -175,18 +192,26 @@ function(thrust_create_target target_name)
)
endif()

if (NOT TCT_DISPATCH IN_LIST THRUST_DISPATCH_TYPE_OPTIONS)
message(FATAL_ERROR
"Requested DISPATCH=${TCT_DISPATCH}; must be one of ${THRUST_DISPATCH_TYPE_OPTIONS}"
)
endif()

if (TCT_FROM_OPTIONS)
_thrust_create_cache_options(
${TCT_HOST} ${TCT_DEVICE}
${TCT_HOST_OPTION} ${TCT_DEVICE_OPTION}
${TCT_HOST_OPTION_DOC} ${TCT_DEVICE_OPTION_DOC}
${TCT_HOST} ${TCT_DEVICE} ${TCT_DISPATCH}
${TCT_HOST_OPTION} ${TCT_DEVICE_OPTION} ${TCT_DISPATCH_OPTION}
${TCT_HOST_OPTION_DOC} ${TCT_DEVICE_OPTION_DOC} ${TCT_DISPATCH_OPTION_DOC}
${TCT_ADVANCED}
)
set(TCT_HOST ${${TCT_HOST_OPTION}})
set(TCT_DEVICE ${${TCT_DEVICE_OPTION}})
set(TCT_DISPATCH ${${TCT_DISPATCH_OPTION}})
thrust_debug("Current option settings:" internal)
thrust_debug(" - ${TCT_HOST_OPTION}=${TCT_HOST}" internal)
thrust_debug(" - ${TCT_DEVICE_OPTION}=${TCT_DEVICE}" internal)
thrust_debug(" - ${TCT_DISPATCH_OPTION}=${TCT_DISPATCH}" internal)
endif()

_thrust_find_backend(${TCT_HOST} REQUIRED)
@@ -206,6 +231,12 @@ function(thrust_create_target target_name)
Thrust::${TCT_DEVICE}::Device
)

if (${TCT_DISPATCH} STREQUAL "Force32bit")
target_compile_definitions(${target_name} INTERFACE "THRUST_FORCE_32_BIT_OFFSET_TYPE")
elseif(${TCT_DISPATCH} STREQUAL "Force64bit")
target_compile_definitions(${target_name} INTERFACE "THRUST_FORCE_64_BIT_OFFSET_TYPE")
endif()

# This would be nice to enforce, but breaks when using old cmake + new
# compiler, since cmake doesn't know what features the new compiler version
# supports.
@@ -416,14 +447,17 @@ function(_thrust_declare_interface_alias alias_name ugly_name)
endfunction()

# Create cache options for selecting the host/device systems and dispatch type with ccmake/cmake-gui.
function(_thrust_create_cache_options host device host_option device_option host_doc device_doc advanced)
function(_thrust_create_cache_options host device dispatch host_option device_option dispatch_option host_doc device_doc dispatch_doc advanced)
thrust_debug("Creating system cache options: (advanced=${advanced})" internal)
thrust_debug(" - Host Option=${host_option} Default=${host} Doc='${host_doc}'" internal)
thrust_debug(" - Device Option=${device_option} Default=${device} Doc='${device_doc}'" internal)
thrust_debug(" - Dispatch Option=${dispatch_option} Default=${dispatch} Doc='${dispatch_doc}'" internal)
set(${host_option} ${host} CACHE STRING "${host_doc}")
set_property(CACHE ${host_option} PROPERTY STRINGS ${THRUST_HOST_SYSTEM_OPTIONS})
set(${device_option} ${device} CACHE STRING "${device_doc}")
set_property(CACHE ${device_option} PROPERTY STRINGS ${THRUST_DEVICE_SYSTEM_OPTIONS})
set(${dispatch_option} ${dispatch} CACHE STRING "${dispatch_doc}")
set_property(CACHE ${dispatch_option} PROPERTY STRINGS ${THRUST_DISPATCH_TYPE_OPTIONS})
if (advanced)
mark_as_advanced(${host_option} ${device_option})
endif()
2 changes: 1 addition & 1 deletion thrust/CMakeLists.txt
@@ -30,7 +30,7 @@ option(THRUST_ENABLE_TESTING "Build Thrust testing suite." "ON")
option(THRUST_ENABLE_EXAMPLES "Build Thrust examples." "ON")

# Allow the user to optionally select offset type dispatch to fixed 32 or 64 bit types
set(THRUST_DISPATCH_TYPE "Dynamic" CACHE STRING "Select Thrust offset type dispatch." FORCE)
set(THRUST_DISPATCH_TYPE "Dynamic" CACHE STRING "Select Thrust offset type dispatch.")
set_property(CACHE THRUST_DISPATCH_TYPE PROPERTY STRINGS "Dynamic" "Force32bit" "Force64bit")

# Check if we're actually building anything before continuing. If not, no need
6 changes: 0 additions & 6 deletions thrust/cmake/ThrustBuildCompilerTargets.cmake
@@ -26,12 +26,6 @@ function(thrust_build_compiler_targets)
append_option_if_available("/wd4146" cxx_compile_options)
endif()

if (THRUST_DISPATCH_TYPE STREQUAL "Force32bit")
list(APPEND cxx_compile_definitions "THRUST_FORCE_32_BIT_OFFSET_TYPE")
elseif (THRUST_DISPATCH_TYPE STREQUAL "Force64bit")
list(APPEND cxx_compile_definitions "THRUST_FORCE_64_BIT_OFFSET_TYPE")
endif()

cccl_build_compiler_interface(thrust.compiler_interface
"${cuda_compile_options}"
"${cxx_compile_options}"
1 change: 1 addition & 0 deletions thrust/cmake/ThrustBuildTargetList.cmake
@@ -206,6 +206,7 @@ function(_thrust_build_target_list_multiconfig)
thrust_create_target(${target_name}
HOST ${host}
DEVICE ${device}
DISPATCH ${THRUST_DISPATCH_TYPE}
${THRUST_TARGET_FLAGS}
)

7 changes: 0 additions & 7 deletions thrust/cmake/ThrustHeaderTesting.cmake
@@ -154,13 +154,6 @@ foreach(thrust_target IN LISTS THRUST_TARGETS)
"CUB_WRAPPED_NAMESPACE=wrapped_cub")
thrust_add_header_test(${thrust_target} wrap "${header_definitions}")

# We need to ensure that the different dispatch mechanisms work
set(header_definitions "THRUST_FORCE_32_BIT_OFFSET_TYPE")
thrust_add_header_test(${thrust_target} offset_32 "${header_definitions}")

set(header_definitions "THRUST_FORCE_64_BIT_OFFSET_TYPE")
thrust_add_header_test(${thrust_target} offset_64 "${header_definitions}")

thrust_get_target_property(config_device ${thrust_target} DEVICE)
if ("CUDA" STREQUAL "${config_device}")
# Check that BF16 support can be disabled
4 changes: 4 additions & 0 deletions thrust/testing/copy.cu
@@ -668,6 +668,8 @@ void TestCopyIfStencilDispatchImplicit()
}
DECLARE_UNITTEST(TestCopyIfStencilDispatchImplicit);

#ifndef THRUST_FORCE_32_BIT_OFFSET_TYPE

struct only_set_when_expected_it
{
long long expected;
@@ -752,3 +754,5 @@ void TestCopyWithBigIndexes()
TestCopyWithBigIndexesHelper(33);
}
DECLARE_UNITTEST(TestCopyWithBigIndexes);

#endif
2 changes: 2 additions & 0 deletions thrust/testing/count.cu
@@ -119,8 +119,10 @@ void TestCountWithBigIndexesHelper(int magnitude)
void TestCountWithBigIndexes()
{
TestCountWithBigIndexesHelper(30);
#ifndef THRUST_FORCE_32_BIT_OFFSET_TYPE
TestCountWithBigIndexesHelper(31);
TestCountWithBigIndexesHelper(32);
TestCountWithBigIndexesHelper(33);
#endif
}
DECLARE_UNITTEST(TestCountWithBigIndexes);
2 changes: 2 additions & 0 deletions thrust/testing/cuda/adjacent_difference.cu
@@ -153,8 +153,10 @@ void TestAdjacentDifferenceWithBigIndexesHelper(int magnitude)
void TestAdjacentDifferenceWithBigIndexes()
{
TestAdjacentDifferenceWithBigIndexesHelper(30);
#ifndef THRUST_FORCE_32_BIT_OFFSET_TYPE
TestAdjacentDifferenceWithBigIndexesHelper(31);
TestAdjacentDifferenceWithBigIndexesHelper(32);
TestAdjacentDifferenceWithBigIndexesHelper(33);
#endif
}
DECLARE_UNITTEST(TestAdjacentDifferenceWithBigIndexes);
3 changes: 3 additions & 0 deletions thrust/testing/cuda/partition.cu
@@ -677,9 +677,12 @@ void TestPartitionIfWithMagnitude(int magnitude)
void TestPartitionIfWithLargeNumberOfItems()
{
TestPartitionIfWithMagnitude(30);
// These require 64-bit dispatches even when magnitude < 32.
# ifndef THRUST_FORCE_32_BIT_OFFSET_TYPE
TestPartitionIfWithMagnitude(31);
TestPartitionIfWithMagnitude(32);
TestPartitionIfWithMagnitude(33);
# endif
}
DECLARE_UNITTEST(TestPartitionIfWithLargeNumberOfItems);
#endif
2 changes: 2 additions & 0 deletions thrust/testing/cuda/reduce_by_key.cu
@@ -433,8 +433,10 @@ void TestReduceByKeyWithBigIndexesHelper(int magnitude)
void TestReduceByKeyWithBigIndexes()
{
TestReduceByKeyWithBigIndexesHelper(30);
#ifndef THRUST_FORCE_32_BIT_OFFSET_TYPE
TestReduceByKeyWithBigIndexesHelper(31);
TestReduceByKeyWithBigIndexesHelper(32);
TestReduceByKeyWithBigIndexesHelper(33);
#endif
}
DECLARE_UNITTEST(TestReduceByKeyWithBigIndexes);
7 changes: 6 additions & 1 deletion thrust/testing/cuda/sort.cu
@@ -251,9 +251,14 @@ void TestSortWithMagnitude(int magnitude)

void TestSortWithLargeNumberOfItems()
{
TestSortWithMagnitude(39);
TestSortWithMagnitude(30);
// These still require 64-bit dispatches when magnitude < 32.
#ifndef THRUST_FORCE_32_BIT_OFFSET_TYPE
TestSortWithMagnitude(31);
TestSortWithMagnitude(32);
TestSortWithMagnitude(33);
TestSortWithMagnitude(39);
#endif
}
DECLARE_UNITTEST(TestSortWithLargeNumberOfItems);

2 changes: 2 additions & 0 deletions thrust/testing/inner_product.cu
@@ -131,9 +131,11 @@ void TestInnerProductWithBigIndexesHelper(int magnitude)
void TestInnerProductWithBigIndexes()
{
TestInnerProductWithBigIndexesHelper(30);
#ifndef THRUST_FORCE_32_BIT_OFFSET_TYPE
TestInnerProductWithBigIndexesHelper(31);
TestInnerProductWithBigIndexesHelper(32);
TestInnerProductWithBigIndexesHelper(33);
#endif
}
DECLARE_UNITTEST(TestInnerProductWithBigIndexes);

2 changes: 2 additions & 0 deletions thrust/testing/max_element.cu
@@ -104,8 +104,10 @@ void TestMaxElementWithBigIndexesHelper(int magnitude)
void TestMaxElementWithBigIndexes()
{
TestMaxElementWithBigIndexesHelper(30);
#ifndef THRUST_FORCE_32_BIT_OFFSET_TYPE
TestMaxElementWithBigIndexesHelper(31);
TestMaxElementWithBigIndexesHelper(32);
TestMaxElementWithBigIndexesHelper(33);
#endif
}
DECLARE_UNITTEST(TestMaxElementWithBigIndexes);
2 changes: 2 additions & 0 deletions thrust/testing/min_element.cu
@@ -102,8 +102,10 @@ void TestMinElementWithBigIndexesHelper(int magnitude)
void TestMinElementWithBigIndexes()
{
TestMinElementWithBigIndexesHelper(30);
#ifndef THRUST_FORCE_32_BIT_OFFSET_TYPE
TestMinElementWithBigIndexesHelper(31);
TestMinElementWithBigIndexesHelper(32);
TestMinElementWithBigIndexesHelper(33);
#endif
}
DECLARE_UNITTEST(TestMinElementWithBigIndexes);