Replace direct `cudaMemcpyAsync` calls with utility functions (within `/include`) #17557

vuule · 2024-12-09T23:33:36Z

Description

Replaced direct calls to `cudaMemcpyAsync` with the new `cuda_memcpy`/`cuda_memcpy_async` utilities, which can optionally avoid using the copy engine.

Also took the opportunity to use `cudf::detail::host_vector` and its factories to enable wider use of pinned memory.
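For readers unfamiliar with the idea of a copy utility that can bypass the copy engine, here is a minimal host-only sketch of the dispatch pattern. It is not libcudf's implementation: `copy_path`, `choose_copy_path`, `model_memcpy`, and the size-threshold policy are all hypothetical names and assumptions made for illustration. `std::memcpy` stands in for a copy-engine transfer, and a byte loop stands in for a copy kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for illustration; none of these names are libcudf's API.
enum class copy_path { copy_engine, kernel };

// Assumed policy: copies smaller than the threshold take the kernel path and
// bypass the copy engine. A threshold of 0 disables the kernel path entirely.
inline copy_path choose_copy_path(std::size_t size_bytes, std::size_t kernel_threshold_bytes)
{
  return (size_bytes < kernel_threshold_bytes) ? copy_path::kernel : copy_path::copy_engine;
}

// Host-side model of the utility: std::memcpy plays the role of the copy-engine
// transfer, and the byte loop plays the role of a copy kernel.
inline copy_path model_memcpy(void* dst,
                              void const* src,
                              std::size_t size_bytes,
                              std::size_t kernel_threshold_bytes)
{
  assert(dst != nullptr && src != nullptr);
  auto const path = choose_copy_path(size_bytes, kernel_threshold_bytes);
  if (path == copy_path::copy_engine) {
    std::memcpy(dst, src, size_bytes);
  } else {
    auto* d       = static_cast<unsigned char*>(dst);
    auto const* s = static_cast<unsigned char const*>(src);
    for (std::size_t i = 0; i < size_bytes; ++i) { d[i] = s[i]; }
  }
  return path;
}
```

Callers only see one entry point; which path runs is decided by the policy, so a call site that previously issued the raw copy does not need to know how the transfer is carried out.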

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

copy-pr-bot · 2024-12-09T23:33:40Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.


nvdbaranec · 2024-12-10T22:51:46Z

cpp/include/cudf/detail/get_value.cuh

+  return cudf::detail::make_host_vector_sync(
+    device_span<T const>{col_view.data<T>() + element_index, 1}, stream).front();


Quibble: In some alternate universe where pinned memory isn't being used and we drop back to the pageable resource, this is probably somewhat worse than what's there now. I imagine that's not a case we really care about though.

vuule · 2024-12-10

That's true. FWIW, we did a full benchmark run with a similar change in `device_scalar`, and there was no negative impact from involving the heap unnecessarily.
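The pattern under discussion, reading one element through a one-element vector and `front()`, can be modeled on the host as follows. This is a sketch under stated assumptions, not the code in `get_value.cuh`: `span_model`, `make_host_vector_model`, and `get_value_model` are hypothetical stand-ins, a `std::vector` plays the role of the column's device data, and `std::memcpy` plays the role of the device-to-host copy. It assumes `T` is trivially copyable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a device span: a pointer plus a length.
template <typename T>
struct span_model {
  T const* data;
  std::size_t size;
};

// Stand-in for a synchronous "make host vector" factory: allocate a host
// buffer, copy the span's contents into it, and return the buffer.
// Assumes T is trivially copyable.
template <typename T>
std::vector<T> make_host_vector_model(span_model<T> in)
{
  std::vector<T> out(in.size);
  if (in.size > 0) { std::memcpy(out.data(), in.data, in.size * sizeof(T)); }
  return out;
}

// The pattern from the diff: build a one-element span at element_index,
// copy it into a temporary vector, and return the first (only) element.
template <typename T>
T get_value_model(std::vector<T> const& column, std::size_t element_index)
{
  assert(element_index < column.size());
  return make_host_vector_model(span_model<T>{column.data() + element_index, 1}).front();
}
```

The temporary vector is the heap involvement the quibble refers to: each single-element read allocates and frees a buffer, whereas copying into a local variable would not.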

vuule · 2024-12-11T20:18:44Z

/merge

all

044050a

github-actions bot assigned vuule Dec 9, 2024

github-actions bot added the libcudf label (Affects libcudf (C++/CUDA) code.) Dec 9, 2024

vuule added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels and removed the libcudf label (Affects libcudf (C++/CUDA) code.) Dec 9, 2024

Merge branch 'branch-25.02' into remove-memcpy-include

ba847bf

github-actions bot added the libcudf label (Affects libcudf (C++/CUDA) code.) Dec 9, 2024

vuule added 3 commits December 9, 2024 16:12

style

5361b48

Merge branch 'branch-25.02' into remove-memcpy-include

dee2094

Merge branch 'branch-25.02' into remove-memcpy-include

273bfd7

davidwendt reviewed Dec 10, 2024

cpp/include/cudf/detail/get_value.cuh (resolved)

vuule added 2 commits December 10, 2024 09:54

Merge branch 'branch-25.02' of https://github.com/rapidsai/cudf into remove-memcpy-include

5786901

Merge branch 'remove-memcpy-include' of https://github.com/vuule/cudf into remove-memcpy-include

814b30f

vuule marked this pull request as ready for review December 10, 2024 17:57

vuule requested a review from a team as a code owner December 10, 2024 17:57

vuule requested review from mythrocks and nvdbaranec December 10, 2024 17:57

vuule changed the title ~~Replace direct cudaMemcpyAsync calls with utility functions (limited to /include)~~ Replace direct cudaMemcpyAsync calls with utility functions (within /include) Dec 10, 2024

davidwendt reviewed Dec 10, 2024

cpp/include/cudf/detail/get_value.cuh (outdated, resolved)

front()

ddd632b

Co-authored-by: David Wendt <[email protected]>

davidwendt approved these changes Dec 10, 2024

style

53faac6

nvdbaranec approved these changes Dec 10, 2024

vuule added the 5 - Ready to Merge label (Testing and reviews complete, ready to merge) Dec 10, 2024

karthikeyann approved these changes Dec 11, 2024

rapids-bot bot merged commit 3801e74 into rapidsai:branch-25.02 Dec 11, 2024
106 checks passed

vuule deleted the remove-memcpy-include branch December 11, 2024 20:18
