[EPIC] RAPIDS Should not need to patch CCCL #1939
Comments
The patches we apply to all RAPIDS libraries:

The patches we apply exclusively to cuDF:
I made some progress here. The patch reverting PR 211 is actually masking a bug in cuSpatial, which I fixed. This patch can be removed from rapids-cmake and cuDF. See these PRs I just opened. They should merge in this order:
Next steps would be to investigate the patches that apply only to cuDF, specifically the macros used to reduce compile time (64-bit dispatch and fewer CUB arch policies).
I just filed: #1958

For fewer CUB arch policies, I'm still a bit confused, to be honest. My understanding of ChainedPolicy in CUB is that this patch should have no impact on compile time, because CUB only instantiates kernels for the architectures you build for. I need @gevtushenko to look at that patch because he knows
While upgrading CCCL, we ran into a test failure in cuSpatial. We added a patch to revert some changes from CCCL, but the root cause was a bug in cuSpatial. I have fixed that bug here: rapidsai/cuspatial#1402. Once that PR is merged, we can remove this CCCL patch.

See also:
- rapids-cmake patch removal: rapidsai/rapids-cmake#640
- Original rapids-cmake patch: rapidsai/rapids-cmake#511
- CCCL epic to remove RAPIDS patches: NVIDIA/cccl#1939

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- Robert Maynard (https://github.com/robertmaynard)

URL: #16207
While upgrading CCCL, we ran into a test failure in cuSpatial. We added a patch to revert some changes from CCCL, but the root cause was a bug in cuSpatial. I have fixed that bug here: rapidsai/cuspatial#1402. Once that PR is merged, we can remove this CCCL patch.

See also:
- #511
- NVIDIA/cccl#1939

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- Robert Maynard (https://github.com/robertmaynard)

URL: #640
@jrhemstad With patch (chained policy starts at 600):

```
$ cat _deps/cccl-src/cub/cub/device/dispatch/tuning/tuning_scan.cuh | grep "ChainedPolicy<600"
, ChainedPolicy<600, Policy600, Policy600>
$ for i in {1..10}; do echo "// comment " >> ../src/reductions/scan/scan_exclusive.cu; ninja > /dev/null; python ../scripts/sort_ninja_log.py .ninja_log | grep exclusive | awk -F',' '{print $1}' >> patched.csv; done; awk '{sum += $1} END {print sum/NR}' patched.csv
3407.3
$ cuobjdump --dump-sass CMakeFiles/cudf.dir/src/reductions/scan/scan_exclusive.cu.o | grep DeviceScanKernel | wc -l
480
```

Without patch (chained policy starts at 520):

```
$ cat _deps/cccl-src/cub/cub/device/dispatch/tuning/tuning_scan.cuh | grep "ChainedPolicy<600"
, ChainedPolicy<600, Policy600, Policy520>
$ for i in {1..10}; do echo "// comment " >> ../src/reductions/scan/scan_exclusive.cu; ninja > /dev/null; python ../scripts/sort_ninja_log.py .ninja_log | grep exclusive | awk -F',' '{print $1}' >> unpatched.csv; done; awk '{sum += $1} END {print sum/NR}' unpatched.csv
3403.9
$ cuobjdump --dump-sass CMakeFiles/cudf.dir/src/reductions/scan/scan_exclusive.cu.o | grep DeviceScanKernel | wc -l
480
```

As you can see, the build-time difference is within noise level, and the number of kernels doesn't change either. I believe the situation shouldn't be any different for reduction, but to be on the safe side, I'd recommend repeating the experiment for radix sort.
@robertmaynard or @davidwendt, would one of you mind verifying that removing this patch from cuDF does not impact compile time?
Reference this patch file: https://github.com/rapidsai/cudf/blob/branch-24.08/cpp/cmake/thirdparty/patches/thrust_faster_scan_compile_times.diff
Because CCCL currently builds RAPIDS from source as part of CCCL's CI. That CI job is currently broken because internal changes within Thrust are now breaking RAPIDS' patches. We want to reduce the surface area of RAPIDS' patches as quickly as possible.
rapids-cmake patches failing to apply isn't a CMake error. This is in fact currently leveraged in libcudf 24.06, where we apply two sets of patches to support different CCCL versions. If I look at a recent CCCL PR (#2006), I see the RAPIDS cuDF builds occurring without issue. If we look at the logs, we see:

While we failed to apply the
RAPIDS currently has to apply a variety of patches to CCCL source code to work around various issues.
This issue is to track eliminating the need for RAPIDS to apply any patches to CCCL source code.
To start, for each patch we need a separate issue created with the following information:
All issues should be added to this task list:
Tasks