Fix cudf::test::to_host to handle both offset types for strings columns #15073

davidwendt · 2024-02-16T16:15:43Z

Description

The cudf::test::to_host function is updated to handle int32 and int64 offset types for strings columns when copying data to host memory. This function is used with cudf::test::print() as well.

Also moved the function from the header column_utilities.hpp to the column_utilities.cu file.
The specialization of to_host for fixed-point types was also moved from the header to the .cu file.
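As a rough illustration of what handling both offset types involves, here is a minimal host-only sketch. It assumes a strings column can be modeled as one contiguous chars buffer plus size+1 offsets stored as either int32 or int64. The names `strings_model`, `offset_kind`, `slice`, and `to_host_model` are hypothetical and exist only for this example; this is not the cudf implementation or API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical tag for the width of the stored offsets.
enum class offset_kind { INT32, INT64 };

// Hypothetical host-only model of a strings column: a chars buffer plus
// (number of rows + 1) offsets, held in exactly one of the two vectors.
struct strings_model {
  std::string chars;
  offset_kind kind;
  std::vector<std::int32_t> offsets32;  // used when kind == INT32
  std::vector<std::int64_t> offsets64;  // used when kind == INT64
};

// Cut the chars buffer into rows; templated so one body serves both widths.
template <typename OffsetT>
std::vector<std::string> slice(std::string const& chars, std::vector<OffsetT> const& offsets)
{
  std::vector<std::string> rows;
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    auto const begin = static_cast<std::size_t>(offsets[i]);
    auto const end   = static_cast<std::size_t>(offsets[i + 1]);
    rows.emplace_back(chars.substr(begin, end - begin));
  }
  return rows;
}

// Dispatch once on the runtime offset width, then reuse the templated body.
std::vector<std::string> to_host_model(strings_model const& col)
{
  return col.kind == offset_kind::INT32 ? slice(col.chars, col.offsets32)
                                        : slice(col.chars, col.offsets64);
}
```

In this sketch the two widths differ only in the element type read from the offsets buffer, so a single runtime branch followed by a templated helper keeps the slicing logic in one place.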

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

bdice · 2024-02-23T01:36:00Z

cpp/tests/utilities/column_utilities.cu

+  CUDF_CUDA_TRY(
+    cudaMemcpy(host_rep_types.data(), c.begin<Rep>(), c.size() * sizeof(Rep), cudaMemcpyDefault));


Can we use cudf::detail::make_std_vector_sync or some vector factory like that, rather than using raw cudaMemcpy calls? We're using that function below, so it seems reasonable to do the same here.

davidwendt · 2024-02-25

This cudaMemcpy was not introduced here but merely moved from the .hpp file.
All of the to_host overloads return thrust::host_vector objects, whereas the vector factories return std::vector.
I think the right thing would be to change all of these, since the to_host signature means there is more than this one cudaMemcpy. I believe that is out of scope for this PR; I can create a follow-on PR to correct all the to_host functions and callers.

bdice · 2024-02-27

Great. Let's do a follow-up PR.
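The trade-off discussed in this thread can be sketched without a GPU. In the host-only example below, std::memcpy stands in for cudaMemcpy, and `buffer_view`, `copy_raw`, and `make_vector_from` are hypothetical names invented for the illustration. It does not reproduce the signature of cudf::detail::make_std_vector_sync; it only shows why a factory that owns the allocation and the byte-count arithmetic is harder to misuse than a raw copy call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical read-only view over a typed buffer (stands in for a device span).
template <typename T>
struct buffer_view {
  T const* data;
  std::size_t size;
};

// Raw style: the caller must size the destination and the byte count is
// computed by hand at every call site. std::memcpy stands in for cudaMemcpy.
template <typename T>
void copy_raw(T* dst, buffer_view<T> src)
{
  static_assert(std::is_trivially_copyable<T>::value, "byte copy needs a trivially copyable type");
  std::memcpy(dst, src.data, src.size * sizeof(T));
}

// Factory style: allocation, byte count, and copy live in one function,
// and the result is returned as a std::vector.
template <typename T>
std::vector<T> make_vector_from(buffer_view<T> src)
{
  static_assert(std::is_trivially_copyable<T>::value, "byte copy needs a trivially copyable type");
  std::vector<T> out(src.size);
  if (src.size > 0) { std::memcpy(out.data(), src.data, src.size * sizeof(T)); }
  return out;
}
```

Note that the factory here returns std::vector, which mirrors the mismatch raised above: callers that expect thrust::host_vector would need to change along with the function.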

karthikeyann

Minor nitpick about copy constructor call.
The code change looks good to me.

karthikeyann · 2024-02-29T17:06:38Z

cpp/tests/utilities/column_utilities.cu

+  auto const host_fixed_points = thrust::host_vector<T>(begin, begin + c.size());
+
+  return {host_fixed_points, bitmask_to_host(c)};


nit: copy elision does not happen here (whether the local is const or non-const).
I tried a few variants in Compiler Explorer: https://godbolt.org/z/jMGn8qK31
Copy elision (NRVO) does not apply when the local is passed to the std::pair constructor.
It's probably better to use std::move() here.

It could be a follow up PR to optimize all specializations in this file.
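The point can be checked with a small standalone sketch. `Tracked` is a hypothetical stand-in for thrust::host_vector<T> that counts its copies and moves, and the three `make_*` functions are illustrative only, not code from the PR. One detail the sketch makes visible: std::move on a const local produces a const rvalue, which still selects the copy constructor, so the local has to be non-const for the move to take effect.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Global counters so the sketch does not depend on C++17 inline variables.
int g_copies = 0;
int g_moves  = 0;

// Hypothetical stand-in for thrust::host_vector<T> that records copies and moves.
struct Tracked {
  std::vector<int> data;
  explicit Tracked(std::vector<int> d) : data(std::move(d)) {}
  Tracked(Tracked const& other) : data(other.data) { ++g_copies; }
  Tracked(Tracked&& other) noexcept : data(std::move(other.data)) { ++g_moves; }
};

using Result = std::pair<Tracked, std::vector<bool>>;

// Mirrors the reviewed pattern: the named local is copied into the pair.
Result make_by_copy()
{
  Tracked const values(std::vector<int>{1, 2, 3});
  return {values, std::vector<bool>{true, false, true}};
}

// Non-const local plus std::move: the pair constructor moves from it.
Result make_by_move()
{
  Tracked values(std::vector<int>{1, 2, 3});
  return {std::move(values), std::vector<bool>{true, false, true}};
}

// std::move on a const local yields a const rvalue, so the copy constructor
// is still the only viable choice.
Result make_by_move_const()
{
  Tracked const values(std::vector<int>{1, 2, 3});
  return {std::move(values), std::vector<bool>{true, false, true}};
}
```

Under these assumptions, only `make_by_move` avoids copying the element buffer; the other two each perform one copy.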

davidwendt · 2024-02-29T22:15:13Z

/merge

Cleanup per comments in #15073:
- Fix return to move instead of copy https://github.com/rapidsai/cudf/pull/15073/files#r1507913472
- Use vector factories instead of cudaMemcpy: https://github.com/rapidsai/cudf/pull/15073/files#r1500136815

Also removed some unneeded headers found in `gather.cuh` while working on this.

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Bradley Dice (https://github.com/bdice)
- Nghia Truong (https://github.com/ttnghia)

URL: #15263

Fix cudf::test::to_host to handle offset types

eb20e37

davidwendt added the 2 - In Progress, libcudf, improvement, and non-breaking labels Feb 16, 2024

davidwendt self-assigned this Feb 16, 2024

fix doxygen error

e0d6f91

davidwendt changed the title ~~Fix cudf::test::to_host to handle offset types~~ Fix cudf::test::to_host to handle both offset types for strings columns Feb 16, 2024

davidwendt added the 3 - Ready for Review label and removed the 2 - In Progress label Feb 16, 2024

davidwendt marked this pull request as ready for review February 20, 2024 14:31

davidwendt requested a review from a team as a code owner February 20, 2024 14:31

davidwendt requested review from bdice and divyegala February 20, 2024 14:31

bdice reviewed Feb 23, 2024

davidwendt requested a review from bdice February 27, 2024 16:16

bdice approved these changes Feb 27, 2024

karthikeyann approved these changes Feb 29, 2024

rapids-bot bot merged commit c1e26a6 into rapidsai:branch-24.04 Feb 29, 2024
68 checks passed

davidwendt deleted the to-host-offsets branch February 29, 2024 22:15

davidwendt mentioned this pull request Mar 8, 2024

Fix cudf::test::to_host return of host_vector #15263

Merged

3 tasks

bdice Feb 23, 2024

davidwendt Feb 25, 2024

bdice Feb 27, 2024
