Reduce contiguous-split block_size for copy_partition (#8216)
The `cudf::contiguous_split` API fails with the error `cudaErrorLaunchOutOfResources too many resources requested for launch` when launching the internal `copy_partition` kernel:
https://github.com/rapidsai/cudf/blob/c2c67de0a487ef767f2b16e6b95132121e2eec04/cpp/src/copying/contiguous_split.cu#L1024-L1026

It was determined that `block_size=512` was too large for the debug build to launch the kernel. Changing it to `256` fixed the error, and all split_tests pass. Running the gbenchmark CONTIGUOUS_SPLIT_BENCH on a Release build showed no measurable performance difference on my Linux desktop with a GV100. I can post the benchmark results if desired.

So this PR just changes the hardcoded block size for the `copy_partition` kernel from 512 to 256.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - Jason Lowe (https://github.com/jlowe)
  - https://github.com/nvdbaranec

URL: #8216
davidwendt authored May 13, 2021
1 parent 082596f commit fb7cdcd
1 changed file: cpp/src/copying/contiguous_split.cu (1 addition, 1 deletion)
@@ -1021,7 +1021,7 @@ std::vector<packed_table> contiguous_split(cudf::table_view const& input,
 
 // copy. 1 block per buffer
 {
-constexpr size_type block_size = 512;
+constexpr size_type block_size = 256;
 copy_partition<block_size><<<num_bufs, block_size, 0, stream.value()>>>(
 num_src_bufs, num_partitions, d_src_bufs, d_dst_bufs, d_dst_buf_info);
 }