Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Remove make_strings_children_with_null_mask utility #8580

Closed
davidwendt opened this issue Jun 22, 2021 · 0 comments · Fixed by #8830
Closed

[FEA] Remove make_strings_children_with_null_mask utility #8580

davidwendt opened this issue Jun 22, 2021 · 0 comments · Fixed by #8830
Assignees
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. strings strings issues (C++ and Python)

Comments

@davidwendt
Copy link
Contributor

The cudf::strings::detail::make_strings_children_with_null_mask requires an extra byte array the size of the output column which it uses to convert to a bitmask for the output column. It is used in only join_list_elements and interleave_list_entries right now.

The extra byte array is not required since the bitmask can be created directly using the cudf::detail::valid_if utility instead. Or a custom kernel could be used to build the offsets and the bitmask to avoid the extra kernel call.

@davidwendt davidwendt added feature request New feature or request Needs Triage Need team to review and classify tech debt strings strings issues (C++ and Python) labels Jun 22, 2021
@davidwendt davidwendt added the libcudf Affects libcudf (C++/CUDA) code. label Jun 22, 2021
@beckernick beckernick removed the Needs Triage Need team to review and classify label Jul 12, 2021
@davidwendt davidwendt self-assigned this Jul 21, 2021
@rapids-bot rapids-bot bot closed this as completed in #8830 Aug 4, 2021
rapids-bot bot pushed a commit that referenced this issue Aug 4, 2021
Closes #8580 

The `cudf::strings::detail::make_strings_children_with_null_mask` utility was created temporarily to help build the output column validities bitmask for `join_lists_elements` (for strings) and `lists::interleave_columns` (for strings). But it used a temporary `int8_t` device vector to hold single-bit values. It would then convert the `int8` column into a bitmask with a kernel call. This PR removes the utility in favor of executing a kernel using the `cudf::detail::valid_if` utility to build the bitmask directly without requiring a temporary buffer. Removing the temporary buffer from the `join_list_elements` strings API was not difficult. The temporary buffer is still used in the `lists::interleave_columns` for now.

 A follow on PR should change this to utilize the output bitmask and directly set the bits rather than using a temporary `int8` buffer that gets converted to a bitmask. This approach could also be used in the `join_lists_element` to ultimately avoid the `valid_if` call.

Removing this utility simplifies the code a bit and should speed up compiling any source file that includes `cudf/strings/detail/utilities.cuh` (~160 files right now).

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Conor Hoekstra (https://github.com/codereport)

URL: #8830
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. strings strings issues (C++ and Python)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants