Use cub::DeviceScan to implement thrust scans on CUDA. #1304
See also NVIDIA/cub#210
Force-pushed from 13ef181 to 3018f7d.
Force-pushed from 95f8676 to ee194bc.
@griwes Can you check that this is properly using the index type dispatch macros? I want to double check that I'm not missing anything.
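For readers unfamiliar with the term, here is a minimal host-only sketch of what index type dispatch generally means. It does not use Thrust's actual macros. It assumes the idea is to run with a 32-bit index when the item count fits and a 64-bit index otherwise. The names `launch_scan` and `dispatch_on_index_type` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical stand-in for an algorithm templated on its index type.
// It only reports which index width it was instantiated with.
template <typename IndexT>
std::string launch_scan(IndexT num_items)
{
  (void)num_items;
  return sizeof(IndexT) == 4 ? "int32" : "int64";
}

// Hypothetical dispatcher (an assumption, not Thrust's macro):
// use a 32-bit index when the count fits, otherwise a 64-bit index.
inline std::string dispatch_on_index_type(std::int64_t num_items)
{
  assert(num_items >= 0);
  if (num_items <= std::numeric_limits<std::int32_t>::max())
  {
    return launch_scan(static_cast<std::int32_t>(num_items));
  }
  return launch_scan(num_items);
}
```

Under these assumptions, `dispatch_on_index_type(1000)` takes the 32-bit path, while a count such as `std::int64_t{1} << 40` takes the 64-bit path.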
DVS CL 29194365
This is causing DVS to time out. Will try again after NVIDIA/cub#213 is merged.
@@ -26,762 +26,204 @@
 ******************************************************************************/
#pragma once

#if THRUST_DEVICE_COMPILER == THRUST_DEVICE_COMPILER_NVCC
Let's leave this in for now. @jaredhoberock @griwes does anyone know the historical reason why we had to wrap the CUDA backend headers in `#if THRUST_DEVICE_COMPILER == THRUST_DEVICE_COMPILER_NVCC`?
Done.
Force-pushed from e4306a5 to d79e54b.
Rebased on master to resolve submodule conflict. DVS CL: 29371523
DVS is using an old version of MSVC 2019 (19.20) that is timing out when building. @brycelelbach, can you update that config to use a newer version of the compiler? Looks like VS 16.8 (
Checking whether the timeouts are still an issue with CL 29471041.
Moving from Thrust 1.10 to 1.12 significantly increased the compile time of `scan.cu`. This is likely due to the improvements that made the scan algorithm use CUB's DeviceScan: NVIDIA/thrust#1304

This PR splits `scan.cu` into `scan_exclusive.cu` and `scan_inclusive.cu` to speed up builds that compile in parallel. It also patches the CUB source in libcudf's Thrust to disable compiling tuning artifacts for architectures below sm60.

The result is about a 2 minute (~11%) overall speedup on a parallel build and a reduction of libcudf.so by about 25MB (17%).

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- AJ Schmidt (https://github.com/ajschmidt8)
- Robert Maynard (https://github.com/robertmaynard)
- Elias Stehle (https://github.com/elstehle)
- https://github.com/nvdbaranec

URL: #8183
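The split into `scan_exclusive.cu` and `scan_inclusive.cu` follows the two flavors of prefix scan. As a reminder of how they differ, here is a sequential host-only reference sketch for the sum case. It is not the parallel algorithm used by Thrust or CUB, and the function names `inclusive_sum` and `exclusive_sum` are hypothetical, chosen only for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Inclusive scan: out[i] is the sum of in[0..i], including in[i].
std::vector<int> inclusive_sum(const std::vector<int>& in)
{
  std::vector<int> out(in.size());
  int running = 0;
  for (std::size_t i = 0; i < in.size(); ++i)
  {
    running += in[i];
    out[i] = running;
  }
  return out;
}

// Exclusive scan: out[i] is init plus the sum of in[0..i-1], excluding in[i].
std::vector<int> exclusive_sum(const std::vector<int>& in, int init)
{
  std::vector<int> out(in.size());
  int running = init;
  for (std::size_t i = 0; i < in.size(); ++i)
  {
    out[i] = running;
    running += in[i];
  }
  return out;
}
```

For the input `{1, 2, 3, 4}`, the inclusive sum is `{1, 3, 6, 10}` and the exclusive sum with an initial value of 0 is `{0, 1, 3, 6}`.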
Fixes: #1301
Prerequisite: NVIDIA/cub#210