[PERF] Remove stream sync in concatenate for better pipelining #17172

tgujar · 2024-10-24T18:12:44Z

The concatenate function uses thrust::copy in its implementation but discards the return value. Because thrust::copy must return an iterator, the call incurs an unnecessary stream sync.

cudf/cpp/src/copying/concatenate.cu

Line 310 in 2de545b

thrust::copy(


wence- · 2024-10-24T18:26:26Z

Note that it does not suffice to use the nosync exec policy for such thrust calls: thrust must sync whenever it returns a host-side value, even with the nosync exec policy.

There are numerous places (I have not audited them) where we make such calls but don't actually need the sync. So we should, with benchmarking evidence, consider either implementing non-syncing implementations in libcudf, or upstreaming a CCCL request for such features so that everyone can get them.

See also the nosync exec policy tracking issue: #12086

Replacing `thrust::copy` with `cudaMemcpyAsync` improves performance up to 2x in specific cases in `cudf::concatenate`. The `thrust::copy` call syncs for a device-to-device copy even though a sync is not necessary. Using `rmm::exec_policy_nosync` had no effect. Will work with CCCL to determine if this is a bug in `thrust::copy`, since computing the return value does not require a sync.

Also moved the benchmark for concatenate from googlebench to nvbench.

Closes #17172

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- Vukasin Milovanovic (https://github.com/vuule)

URL: #17584
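
The claim that the copy's return value can be computed without a sync can be illustrated with a host-only sketch. This is not the libcudf implementation: `concatenate_fixed_width` is a hypothetical helper, and `std::memcpy` stands in for `cudaMemcpyAsync`. The point is that every destination offset follows from the input sizes alone, so nothing has to be read back from the copy itself.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper for illustration only (not part of libcudf).
// Concatenates fixed-width buffers by copying each input to a running offset.
template <typename T>
std::vector<T> concatenate_fixed_width(std::vector<std::vector<T>> const& inputs)
{
  static_assert(std::is_trivially_copyable<T>::value,
                "a raw byte copy is only valid for trivially copyable types");

  // The total output size is known before any copy is issued.
  std::size_t total = 0;
  for (auto const& in : inputs) { total += in.size(); }

  std::vector<T> output(total);
  std::size_t offset = 0;
  for (auto const& in : inputs) {
    if (!in.empty()) {
      // Assumption: on a device this would be an asynchronous copy enqueued on a stream.
      std::memcpy(output.data() + offset, in.data(), in.size() * sizeof(T));
    }
    // The next write position is pure arithmetic on the input size.
    // It equals what a copy algorithm would return as its end iterator,
    // so the copy's return value is never needed.
    offset += in.size();
  }
  return output;
}
```

For example, concatenating the `int` buffers `{1, 2}`, `{}`, and `{3, 4, 5}` yields `{1, 2, 3, 4, 5}`. Because no offset depends on a value produced by an earlier copy, the same bookkeeping on a device would presumably let each copy be enqueued without waiting for the previous one.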

PointKernel added labels on Oct 24, 2024:
- libcudf: Affects libcudf (C++/CUDA) code.
- improvement: Improvement / enhancement to an existing function
- Performance: Performance related issue

davidwendt mentioned this issue Dec 12, 2024

Use no-sync copy for fixed-width types in cudf::concatenate #17584 (Merged)

rapids-bot bot closed this as completed in #17584 Dec 13, 2024
