Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace direct cudaMemcpyAsync calls with utility functions (within /include) #17557

Merged
merged 9 commits into from
Dec 11, 2024

Conversation

vuule
Copy link
Contributor

@vuule vuule commented Dec 9, 2024

Description

Replaced the calls to cudaMemcpyAsync with the new cuda_memcpy/cuda_memcpy_async utility, which optionally avoids using the copy engine.

Also took the opportunity to use cudf::detail::host_vector and its factories to enable wider pinned memory use.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Copy link

copy-pr-bot bot commented Dec 9, 2024

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Dec 9, 2024
@vuule vuule added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change and removed libcudf Affects libcudf (C++/CUDA) code. labels Dec 9, 2024
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Dec 9, 2024
@vuule vuule marked this pull request as ready for review December 10, 2024 17:57
@vuule vuule requested a review from a team as a code owner December 10, 2024 17:57
@vuule vuule changed the title Replace direct cudaMemcpyAsync calls with utility functions (limited to /include) Replace direct cudaMemcpyAsync calls with utility functions (within /include) Dec 10, 2024
Co-authored-by: David Wendt <[email protected]>
Comment on lines 52 to 53
return cudf::detail::make_host_vector_sync(
device_span<T const>{col_view.data<T>() + element_index, 1}, stream).front();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quibble: In some alternate universe where pinned memory isn't being used and we drop back to the pageable resource, this is probably somewhat worse than what's there now. I imagine that's not a case we really care about though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. FWIW, we did a full benchmark run with a change like this in the device_scalar, and there was no negative impact from involving the heap unnecessarily.

@vuule vuule added the 5 - Ready to Merge Testing and reviews complete, ready to merge label Dec 10, 2024
@vuule
Copy link
Contributor Author

vuule commented Dec 11, 2024

/merge

@rapids-bot rapids-bot bot merged commit 3801e74 into rapidsai:branch-25.02 Dec 11, 2024
106 checks passed
@vuule vuule deleted the remove-memcpy-include branch December 11, 2024 20:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants