This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Use cub::DeviceScan to implement thrust scans on CUDA. #1304

Merged
merged 1 commit into from
Jan 21, 2021

Conversation

alliepiper
Collaborator

@alliepiper alliepiper commented Oct 7, 2020

Fixes: #1301

Prerequisite: NVIDIA/cub#210

@alliepiper alliepiper added this to the 1.11.0 milestone Oct 7, 2020
@alliepiper alliepiper self-assigned this Oct 7, 2020
@alliepiper alliepiper marked this pull request as draft October 7, 2020 15:13
@alliepiper
Collaborator Author

alliepiper commented Oct 7, 2020

See also NVIDIA/cub#210

@alliepiper alliepiper force-pushed the bug/use_cub_scan/gh.1301 branch from 13ef181 to 3018f7d Compare October 7, 2020 18:55
@alliepiper alliepiper force-pushed the bug/use_cub_scan/gh.1301 branch 2 times, most recently from 95f8676 to ee194bc Compare October 13, 2020 21:01
@alliepiper alliepiper changed the title from "WIP Integrate CUB scan into thrust." to "Use the faster cub::DeviceScan to implement synchronous scans for CUDA." Oct 13, 2020
@alliepiper alliepiper removed their assignment Oct 13, 2020
@alliepiper
Collaborator Author

@griwes Can you check that this is properly using the index type dispatch macros? I want to double check that I'm not missing anything.
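
For readers unfamiliar with index type dispatch, the general idea can be sketched in plain host C++: check at run time whether the element count fits in a 32-bit index and instantiate the implementation with the narrower type when it does. This is only an illustration under stated assumptions. The names `inclusive_sum_impl`, `fits_in_int32`, and `inclusive_sum` are hypothetical and are not Thrust's dispatch macros, and the loop is a serial stand-in for a device scan.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical serial stand-in for a scan implementation, templated on the
// index type used for element offsets (not Thrust or CUB code).
template <typename IndexT>
std::vector<int> inclusive_sum_impl(const std::vector<int>& in, IndexT n) {
    std::vector<int> out(static_cast<std::size_t>(n));
    int running = 0;
    for (IndexT i = 0; i < n; ++i) {
        running += in[static_cast<std::size_t>(i)];
        out[static_cast<std::size_t>(i)] = running;
    }
    return out;
}

// Hypothetical helper: true when the element count fits in a 32-bit signed index.
inline bool fits_in_int32(std::int64_t n) {
    return n <= std::numeric_limits<std::int32_t>::max();
}

// Hypothetical dispatcher: use the 32-bit instantiation when the input is
// small enough and fall back to 64-bit indices otherwise.
inline std::vector<int> inclusive_sum(const std::vector<int>& in) {
    const auto n = static_cast<std::int64_t>(in.size());
    if (fits_in_int32(n)) {
        return inclusive_sum_impl<std::int32_t>(in, static_cast<std::int32_t>(n));
    }
    return inclusive_sum_impl<std::int64_t>(in, n);
}
```

Calling `inclusive_sum` on a small vector takes the 32-bit path. The 64-bit instantiation is still compiled, which is one reason this style of dispatch can add to build time.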

@alliepiper alliepiper marked this pull request as ready for review October 13, 2020 21:02
@alliepiper alliepiper added the "testing: gpuCI in progress" (Started gpuCI testing.) label Oct 13, 2020
@alliepiper
Collaborator Author

DVS CL 29194365

@alliepiper alliepiper added "testing: internal ci in progress" (Currently testing on internal NVIDIA CI (DVS).) and "testing: gpuCI passed" (Passed gpuCI testing.) and removed "testing: gpuCI in progress" (Started gpuCI testing.) labels Oct 13, 2020
@alliepiper
Collaborator Author

This is causing DVS to time out. Will try again after NVIDIA/cub#213 is merged.

@alliepiper alliepiper added "blocked" (Cannot make progress.) and removed "testing: internal ci in progress" (Currently testing on internal NVIDIA CI (DVS).) labels Oct 14, 2020
@alliepiper alliepiper linked an issue Oct 14, 2020 that may be closed by this pull request
@@ -26,762 +26,204 @@
******************************************************************************/
#pragma once


#if THRUST_DEVICE_COMPILER == THRUST_DEVICE_COMPILER_NVCC
Collaborator

Let's leave this in for now. @jaredhoberock @griwes does anyone know the historical reason why we had to wrap the CUDA backend headers in #if THRUST_DEVICE_COMPILER == THRUST_DEVICE_COMPILER_NVCC?

Collaborator Author

Done.
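
As a rough illustration of what a guard like the one discussed in this thread does, the sketch below wraps a declaration in a compiler check so that it is only visible when NVCC is compiling the file. The `DEMO_*` macros and `demo_cuda_backend_visible` are hypothetical stand-ins and not Thrust's definitions. The only real identifier relied on is `__CUDACC__`, which NVCC predefines when compiling CUDA source files.

```cpp
#include <cassert>

// Hypothetical stand-ins for compiler-detection macros (not Thrust's own).
#define DEMO_DEVICE_COMPILER_UNKNOWN 0
#define DEMO_DEVICE_COMPILER_NVCC    1

#if defined(__CUDACC__)  // predefined by NVCC when compiling CUDA sources
#  define DEMO_DEVICE_COMPILER DEMO_DEVICE_COMPILER_NVCC
#else
#  define DEMO_DEVICE_COMPILER DEMO_DEVICE_COMPILER_UNKNOWN
#endif

#if DEMO_DEVICE_COMPILER == DEMO_DEVICE_COMPILER_NVCC
// Declarations that only a CUDA compiler can parse would live here.
constexpr bool demo_cuda_backend_visible = true;
#else
// Any other compiler skips the guarded declarations entirely.
constexpr bool demo_cuda_backend_visible = false;
#endif
```

With a host-only compiler the guarded branch is never parsed, so a header written this way can be included without CUDA-specific syntax reaching the compiler.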

@alliepiper alliepiper removed "testing: internal ci in progress" (Currently testing on internal NVIDIA CI (DVS).) and "testing: gpuCI passed" (Passed gpuCI testing.) labels Nov 3, 2020
@alliepiper alliepiper self-assigned this Nov 9, 2020
@alliepiper alliepiper modified the milestones: 1.11.0, 1.12.0 Nov 11, 2020
@alliepiper alliepiper force-pushed the bug/use_cub_scan/gh.1301 branch from e4306a5 to d79e54b Compare November 30, 2020 19:45
@alliepiper
Collaborator Author

Rebased on master to resolve submodule conflict.

DVS CL: 29371523

@alliepiper alliepiper added "testing: internal ci in progress" (Currently testing on internal NVIDIA CI (DVS).) and "testing: gpuCI passed" (Passed gpuCI testing.) labels Nov 30, 2020
@alliepiper
Collaborator Author

DVS is using an old version of MSVC 2019 (19.20) that is timing out when building transform_scan.cu.

@brycelelbach can you update that config to use a newer version of the compiler? Looks like VS 16.8 (cl 19.28) is available on DVS, that's what I use locally without issues.

@alliepiper alliepiper added "blocked" (Cannot make progress.) and removed "testing: internal ci in progress" (Currently testing on internal NVIDIA CI (DVS).) labels Dec 9, 2020
@alliepiper
Collaborator Author

Checking whether the timeouts are still an issue with CL 29471041.

@alliepiper alliepiper added "testing: internal ci passed" (Passed internal NVIDIA CI (DVS).) and removed "blocked" (Cannot make progress.) labels Jan 6, 2021
@alliepiper alliepiper merged commit f96aa57 into NVIDIA:main Jan 21, 2021
@alliepiper alliepiper deleted the bug/use_cub_scan/gh.1301 branch January 21, 2021 20:46
rapids-bot bot pushed a commit to rapidsai/cudf that referenced this pull request May 14, 2021
Moving to thrust 1.12 from 1.10 increased compile time for `scan.cu` significantly. This is likely due to the improvements made to the scan algorithm to use CUB's DeviceScan: NVIDIA/thrust#1304

This PR splits up scan.cu into `scan_exclusive.cu` and `scan_inclusive.cu` to help speed up build time when running parallel compiles.
This PR also includes patches to libcudf's thrust's CUB source to disable compiling tuning artifacts for architectures below sm60.
The result is an overall speedup of about 2 minutes (~11%) on a parallel build, and libcudf.so shrinks by about 25MB (17%).
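
For context on the `scan_exclusive.cu` / `scan_inclusive.cu` split, the two scan flavors differ only in whether each output element includes the input at the same position. The host-side sketch below shows the distinction with integer sums. It is a reference illustration using the standard library's `std::partial_sum` and a hand-written loop, not the Thrust, CUB, or libcudf implementation, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Inclusive scan: out[i] = in[0] + ... + in[i].
inline std::vector<int> inclusive_sum_reference(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    std::partial_sum(in.begin(), in.end(), out.begin());
    return out;
}

// Exclusive scan: out[i] = init + in[0] + ... + in[i - 1], so out[0] == init.
inline std::vector<int> exclusive_sum_reference(const std::vector<int>& in,
                                                int init = 0) {
    std::vector<int> out(in.size());
    int running = init;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;   // write the sum of everything before position i
        running += in[i];   // then fold in the current element
    }
    return out;
}
```

For the input `{3, 1, 4, 1, 5}` the inclusive form yields `{3, 4, 8, 9, 14}` and the exclusive form yields `{0, 3, 4, 8, 9}`.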

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Robert Maynard (https://github.com/robertmaynard)
  - Elias Stehle (https://github.com/elstehle)
  - https://github.com/nvdbaranec

URL: #8183
Labels
"testing: gpuCI passed" (Passed gpuCI testing.), "testing: internal ci passed" (Passed internal NVIDIA CI (DVS).)
Development

Successfully merging this pull request may close these issues.

Unify Thrust/CUB scan algorithms
2 participants