Commit
Reduce contiguous-split block_size for copy_partition (#8216)
The `cudf::contiguous_split` API fails with the error `cudaErrorLaunchOutOfResources too many resources requested for launch` when launching the internal `copy_partition` kernel in a debug build.

https://github.com/rapidsai/cudf/blob/c2c67de0a487ef767f2b16e6b95132121e2eec04/cpp/src/copying/contiguous_split.cu#L1024-L1026

The hardcoded `block_size=512` was determined to be too large for the debug build to launch the kernel. Changing it to `256` resolved the error, and all split_tests now pass.

Running the gbenchmark CONTIGUOUS_SPLIT_BENCH on a Release build showed no measurable performance difference on my Linux desktop with a GV100. I can post the benchmark results if desired.

This PR therefore only changes the hardcoded block size for the `copy_partition` kernel from 512 to 256.

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Mark Harris (https://github.com/harrism)
- Jason Lowe (https://github.com/jlowe)
- https://github.com/nvdbaranec

URL: #8216
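For readers wondering why halving the block size can clear a `cudaErrorLaunchOutOfResources` error, here is a rough sketch of the arithmetic. It assumes the cause is register pressure, a common reason for this error in unoptimized debug builds, though the description above does not state it. The limits are the documented per-block and per-thread register limits for compute capability 7.0 (the GV100 mentioned above). The per-thread register count of 160 is a made-up illustrative value, not a measurement of `copy_partition`, and the helper names are hypothetical. The model also ignores register allocation granularity.

```cpp
#include <algorithm>

// Register limits for compute capability 7.0 (e.g. GV100), per the CUDA
// programming guide: 64K 32-bit registers per block, 255 per thread.
constexpr int kRegistersPerBlockLimit = 65536;
constexpr int kMaxRegistersPerThread  = 255;

// Hypothetical helper: the largest per-thread register count that still
// allows a launch with `block_size` threads per block.
constexpr int max_regs_per_thread(int block_size) {
  return std::min(kRegistersPerBlockLimit / block_size, kMaxRegistersPerThread);
}

// Hypothetical helper: true when a kernel compiled to use `regs_per_thread`
// registers fits within the per-block register budget at `block_size`.
constexpr bool can_launch(int regs_per_thread, int block_size) {
  return regs_per_thread <= max_regs_per_thread(block_size);
}

// Illustrative value only: assume a debug build of some kernel needs
// 160 registers per thread.
constexpr int kAssumedDebugRegs = 160;

static_assert(max_regs_per_thread(512) == 128, "512 threads leave 128 registers each");
static_assert(max_regs_per_thread(256) == 255, "256 threads are capped by the per-thread limit");
static_assert(!can_launch(kAssumedDebugRegs, 512), "would fail to launch at 512 threads");
static_assert(can_launch(kAssumedDebugRegs, 256), "fits at 256 threads");
```

Under these assumptions, any kernel needing more than 128 registers per thread cannot launch with 512 threads per block but can with 256. On a real build, `cudaFuncGetAttributes` reports the actual `numRegs` and `maxThreadsPerBlock` for a compiled kernel.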