Fix cudf::test::to_host to handle both offset types for strings columns #15073

Merged (2 commits) on Feb 29, 2024

davidwendt
Contributor

Description

The cudf::test::to_host function is updated to handle int32 and int64 offset types for strings columns when copying data to host memory. This function is used with cudf::test::print() as well.

Also moved the function from the header column_utilities.hpp to the column_utilities.cu file.
The specialization of to_host for fixed-point types was moved from the header to the .cu file as well.
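
As a rough illustration of what handling both offset types involves, here is a host-only sketch. It does not use libcudf and is not the code from this PR: `OffsetsBuffer`, `normalize_offsets`, and `strings_from_offsets` are hypothetical names, and the sketch assumes the offsets and character bytes have already been copied to host memory, with row i spanning `chars[offsets[i], offsets[i + 1])`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for an offsets child column already copied to the host:
// it holds either 32-bit or 64-bit offsets.
using OffsetsBuffer = std::variant<std::vector<std::int32_t>, std::vector<std::int64_t>>;

// Widen whichever offset type is present to int64 so the rest of the code
// only has to deal with a single type.
inline std::vector<std::int64_t> normalize_offsets(OffsetsBuffer const& offsets)
{
  return std::visit(
    [](auto const& v) { return std::vector<std::int64_t>(v.begin(), v.end()); },
    offsets);
}

// Build one std::string per row from the character bytes and the offsets.
// Row i spans chars[offsets[i], offsets[i + 1]).
inline std::vector<std::string> strings_from_offsets(std::string const& chars,
                                                     OffsetsBuffer const& offsets)
{
  auto const offs = normalize_offsets(offsets);
  assert(!offs.empty() && "an offsets buffer has rows + 1 entries");
  std::vector<std::string> rows;
  rows.reserve(offs.size() - 1);
  for (std::size_t i = 0; i + 1 < offs.size(); ++i) {
    auto const begin = static_cast<std::size_t>(offs[i]);
    auto const count = static_cast<std::size_t>(offs[i + 1] - offs[i]);
    rows.push_back(chars.substr(begin, count));
  }
  return rows;
}
```

Calling `strings_from_offsets("helloworld!", std::vector<std::int32_t>{0, 5, 10, 11})` and the same call with `std::int64_t` offsets both yield the rows "hello", "world", and "!". Widening to one type up front keeps the row-building loop free of per-type branches.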

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt davidwendt added 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Feb 16, 2024
@davidwendt davidwendt self-assigned this Feb 16, 2024
@davidwendt davidwendt changed the title Fix cudf::test::to_host to handle offset types Fix cudf::test::to_host to handle both offset types for strings columns Feb 16, 2024
@davidwendt davidwendt added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Feb 16, 2024
@davidwendt davidwendt marked this pull request as ready for review February 20, 2024 14:31
@davidwendt davidwendt requested a review from a team as a code owner February 20, 2024 14:31
Comment on lines +952 to +953
CUDF_CUDA_TRY(
cudaMemcpy(host_rep_types.data(), c.begin<Rep>(), c.size() * sizeof(Rep), cudaMemcpyDefault));
Contributor

Can we use cudf::detail::make_std_vector_sync or some vector factory like that, rather than using raw cudaMemcpy calls? We're using that function below, so it seems reasonable to do the same here.

Contributor Author

This cudaMemcpy was not introduced here; it was merely moved from the .hpp file.
All of the to_host overloads return thrust::host_vector objects, whereas the vector factories return std::vector.
I think the right thing would be to change all of these, since there is more than this one cudaMemcpy because of the to_host signature. I believe this is out of scope for this PR, and I can create a follow-on PR to correct all the to_host functions and callers.

Contributor

Great. Let's do a follow-up PR.
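
The point of the review thread above can be illustrated with a host-only analogue. This is a sketch, not libcudf code: `make_vector_from_buffer` is a hypothetical helper, and `std::memcpy` stands in for the device-to-host copy that a real vector factory would issue. The idea is that the factory owns the size computation and the raw copy, so call sites never spell out byte counts.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical host-only analogue of a "vector factory".
// std::memcpy is a stand-in for the device-to-host copy; it is an assumption
// made so the sketch can run without a GPU.
template <typename T>
std::vector<T> make_vector_from_buffer(T const* src, std::size_t count)
{
  static_assert(std::is_trivially_copyable_v<T>,
                "raw byte copies are only valid for trivially copyable types");
  std::vector<T> result(count);
  // Skip the copy for empty inputs so a null source pointer is never passed on.
  if (count > 0) { std::memcpy(result.data(), src, count * sizeof(T)); }
  return result;
}
```

A call site then reads `auto host = make_vector_from_buffer(ptr, n);` instead of pairing a manual allocation with a byte-count computation.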

@davidwendt davidwendt requested a review from bdice February 27, 2024 16:16
Contributor

@karthikeyann karthikeyann left a comment

Minor nitpick about copy constructor call.
The code change looks good to me.

Comment on lines +957 to +959
auto const host_fixed_points = thrust::host_vector<T>(begin, begin + c.size());

return {host_fixed_points, bitmask_to_host(c)};
Contributor

nit: copy elision does not happen here (whether the local is const or non-const).
I tried a few variants in Compiler Explorer: https://godbolt.org/z/jMGn8qK31
and found that copy elision (NRVO) does not apply when the local is passed to the constructor of std::pair.
It's probably better to use std::move() here.

It could be a follow up PR to optimize all specializations in this file.
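
The nit above can be reproduced without libcudf. The following is a minimal sketch using a hypothetical `Tracked` type that counts its own copies and moves; `make_pair_by_copy` mirrors the reviewed pattern and `make_pair_by_move` shows the suggested alternative. It assumes C++17.

```cpp
#include <utility>
#include <vector>

// Hypothetical payload type that counts how often it is copied or moved.
struct Tracked {
  static inline int copies = 0;
  static inline int moves  = 0;
  std::vector<int> data;
  explicit Tracked(std::vector<int> d) : data(std::move(d)) {}
  Tracked(Tracked const& other) : data(other.data) { ++copies; }
  Tracked(Tracked&& other) noexcept : data(std::move(other.data)) { ++moves; }
};

// Mirrors the reviewed pattern: a named local placed into a braced pair return.
// The local is an lvalue, so the pair's first member is copy-constructed.
inline std::pair<Tracked, int> make_pair_by_copy()
{
  auto const values = Tracked(std::vector<int>{1, 2, 3});
  return {values, 0};
}

// The suggested alternative: the local is non-const so it can be moved from,
// and std::move turns it into an rvalue for the pair constructor.
inline std::pair<Tracked, int> make_pair_by_move()
{
  auto values = Tracked(std::vector<int>{1, 2, 3});
  return {std::move(values), 0};
}
```

With both counters reset before each call, `make_pair_by_copy()` records one copy and no moves, while `make_pair_by_move()` records one move and no copies. Note that applying `std::move` to a `const` local would still select the copy constructor, so the local has to be non-const for the move to take effect.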

@davidwendt
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit c1e26a6 into rapidsai:branch-24.04 Feb 29, 2024
68 checks passed
@davidwendt davidwendt deleted the to-host-offsets branch February 29, 2024 22:15
rapids-bot bot pushed a commit that referenced this pull request Mar 12, 2024
Cleanup per comments in #15073:
- Fix return to move instead of copy https://github.com/rapidsai/cudf/pull/15073/files#r1507913472
- Use vector factories instead of cudaMemcpy: https://github.com/rapidsai/cudf/pull/15073/files#r1500136815

Also removed some unneeded headers found in `gather.cuh` while working on this.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Nghia Truong (https://github.com/ttnghia)

URL: #15263