Change cudf::test::make_null_mask to also return null-count #13081
Removes `using namespace cudf;` from gtests source code to make it easier to read -- find where utilities and function calls are implemented. Also removed a few `using namespace cudf::test;` usages which by extension includes namespace `cudf`. Found these while working on #13081 Reference #11734 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Nghia Truong (https://github.com/ttnghia) - Vyas Ramasubramani (https://github.com/vyasr) - Bradley Dice (https://github.com/bdice) URL: #13089
Sorry for the delayed review. LGTM! A couple of minor nitpicks.
For a detail API this is used a ton in tests... In the end I suspect we're going to need to expose a number of our currently detail-only mask APIs publicly. Anyway this LGTM for now.
thrust::host_vector<std::string> host_data(c.size());
if (c.size() > c.null_count()) {
  auto const scv     = strings_column_view(c);
  auto const h_chars = cudf::detail::make_std_vector_sync<char>(
    cudf::device_span<char const>(scv.chars().data<char>(), scv.chars().size()),
    cudf::get_default_stream());
  auto const h_offsets = cudf::detail::make_std_vector_sync(
    cudf::device_span<cudf::offset_type const>(
      scv.offsets().data<cudf::offset_type>() + scv.offset(), scv.size() + 1),
    cudf::get_default_stream());

  // build std::string vector from chars and offsets
  std::transform(
    std::begin(h_offsets),
    std::end(h_offsets) - 1,
    std::begin(h_offsets) + 1,
    host_data.begin(),
    [&](auto start, auto end) { return std::string(h_chars.data() + start, end - start); });
}
return {std::move(host_data), bitmask_to_host(c)};
The git diff here is a bit hard to read, but IIUC this whole change is just to preallocate the vector and then early return if you don't have any non-null entries to write, is that correct?
Yes, that is correct.
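To make the pattern discussed above easier to follow, here is a minimal host-only sketch of rebuilding strings from a flat character buffer and an offsets array. The function name `strings_from_offsets` and the use of plain `std::vector` inputs are assumptions for illustration; this is not the libcudf code, which copies the buffers from device memory first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical host-side helper (not a libcudf API): rebuilds one std::string
// per row from a flat character buffer and an offsets array of size rows + 1.
std::vector<std::string> strings_from_offsets(std::vector<char> const& chars,
                                              std::vector<std::int32_t> const& offsets)
{
  // An offsets array always holds one more entry than there are rows.
  assert(!offsets.empty());

  // Preallocate the output; every row starts out as an empty string.
  std::vector<std::string> out(offsets.size() - 1);

  // Binary transform over adjacent offset pairs: (offsets[i], offsets[i + 1]).
  std::transform(offsets.begin(),
                 offsets.end() - 1,
                 offsets.begin() + 1,
                 out.begin(),
                 [&](auto start, auto end) {
                   return std::string(chars.data() + start, end - start);
                 });
  return out;
}
```

In this sketch a row whose two offsets are equal comes back as an empty string, which is why preallocating the output and skipping the transform entirely is safe when there is nothing to copy.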
int begin_bit = 0;
int end_bit   = 800;
auto gold_splice_mask = cudf::test::detail::make_null_mask(validity_bit.begin() + begin_bit,
                                                           validity_bit.begin() + end_bit);
auto gold_splice_mask = std::get<0>(cudf::test::detail::make_null_mask(
Out of curiosity, is there a reason you prefer `std::get<0>` to `.first` for pairs? For consistency with tuples?
I like that this gets us the first value from the structured binding without an unused variable and without resorting to the vagueness of `.first` and `.second`.
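The trade-off in this exchange can be sketched with a small standalone example. The helper `make_mask_and_count` and its return values are hypothetical stand-ins for any function returning a `std::pair`; only `std::get`, `.first`/`.second`, and structured bindings are standard C++.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a helper that returns {mask words, null count}.
std::pair<std::vector<std::uint32_t>, int> make_mask_and_count()
{
  return {std::vector<std::uint32_t>{0xFFu}, 24};
}

// Takes only the first element; no named variable is left unused.
std::vector<std::uint32_t> mask_only() { return std::get<0>(make_mask_and_count()); }

// A structured binding names both elements, which reads well when both are needed.
int count_via_binding()
{
  auto [mask, null_count] = make_mask_and_count();
  return static_cast<int>(mask.size()) * 0 + null_count;  // uses both names
}

// Both spellings select the same elements of the pair.
void check_equivalence()
{
  auto const result = make_mask_and_count();
  assert(std::get<0>(result) == result.first);
  assert(std::get<1>(result) == result.second);
}
```

`std::get<0>` also works unchanged if the return type later becomes a `std::tuple`, whereas `.first` does not.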
/merge
Add `null_count` parameter to the `cudf::io::json::experimental::detail::parse_data` function which already accepts a `null_mask`. Normally, the callers already know the count. This function can use the parameter to help build the output column. Found while working on #13081 Contributes to: #11968 Authors: - David Wendt (https://github.com/davidwendt) - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Nghia Truong (https://github.com/ttnghia) URL: #13107
Description
Change `cudf::test::make_null_mask` to return both the null-mask and the null-count. Callers can then use this null-count instead of `UNKNOWN_NULL_COUNT`. These changes include removing `UNKNOWN_NULL_COUNT` usage from the libcudf C++ test source code.

One side effect surfaced: a strings column with all nulls can technically have no children, but using `UNKNOWN_NULL_COUNT` allowed the check for this to be bypassed. Therefore many utilities started to fail when `UNKNOWN_NULL_COUNT` was removed. The factory was modified to remove the check, which results in an offsets column and an empty chars column as children.

More code will likely need to be changed when `UNKNOWN_NULL_COUNT` is no longer used as a default parameter for factories and other column functions.

No behavior is changed. Since `cudf::test::make_null_mask` is technically a public API, this PR could be marked as a breaking change as well.

Contributes to: #11968
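As a rough illustration of the idea in the description, the following host-only sketch builds a validity bitmask and counts the nulls in the same pass, returning both as a pair. The name `make_host_null_mask`, the `bitmask_word` alias, and the 32-bit, least-significant-bit-first layout with a set bit meaning "valid" are assumptions made for this example; the real test utility produces a device buffer and is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <utility>
#include <vector>

// Assumption for this sketch: 32-bit mask words, a set bit marks a valid row.
using bitmask_word = std::uint32_t;

// Hypothetical host-only analogue of a mask builder that also reports the
// number of null (false) entries it saw while packing the bits.
template <typename ValidityIterator>
std::pair<std::vector<bitmask_word>, std::ptrdiff_t> make_host_null_mask(ValidityIterator begin,
                                                                         ValidityIterator end)
{
  std::ptrdiff_t const size = std::distance(begin, end);
  assert(size >= 0);
  constexpr std::ptrdiff_t bits_per_word = 32;

  // One word per 32 rows, rounded up; all bits start cleared (null).
  auto const num_words = static_cast<std::size_t>((size + bits_per_word - 1) / bits_per_word);
  std::vector<bitmask_word> mask(num_words, bitmask_word{0});

  std::ptrdiff_t null_count = 0;
  for (std::ptrdiff_t i = 0; i < size; ++i, ++begin) {
    if (*begin) {
      mask[static_cast<std::size_t>(i / bits_per_word)] |= bitmask_word{1} << (i % bits_per_word);
    } else {
      ++null_count;  // counted while building, so callers never need a placeholder count
    }
  }
  return {std::move(mask), null_count};
}
```

A caller that needs both values can write `auto [mask, null_count] = make_host_null_mask(v.begin(), v.end());`, while one that only needs the mask can take `std::get<0>` of the result.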
Checklist