Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] cudf::structs::detail::superimpose_parent_nulls does not purge nulls for the children #12027

Closed
ttnghia opened this issue Oct 28, 2022 · 0 comments · Fixed by #12239
Closed
Assignees
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code.

Comments

@ttnghia
Copy link
Contributor

ttnghia commented Oct 28, 2022

As the title said, the current implementation of cudf::structs::detail::superimpose_parent_nulls only set null mask for the children columns. If a child column is a lists/structs/strings/dictionary column, it also needs to be sanitized by purge_nonempty_nulls.

Reference:

  • void superimpose_parent_nulls(bitmask_type const* parent_null_mask,
    size_type parent_null_count,
    column& child,
    rmm::cuda_stream_view stream,
    rmm::mr::device_memory_resource* mr)
  • std::tuple<cudf::column_view, std::vector<rmm::device_buffer>> superimpose_parent_nulls(
    column_view const& parent, rmm::cuda_stream_view stream, rmm::mr::device_memory_resource* mr)
@ttnghia ttnghia added bug Something isn't working Needs Triage Need team to review and classify labels Oct 28, 2022
@ttnghia ttnghia changed the title [BUG] cudf::structs::detail::superimpose_parent_nulls does not purge null lists [BUG] cudf::structs::detail::superimpose_parent_nulls does not purge nulls for the children Oct 28, 2022
@galipremsagar galipremsagar added the libcudf Affects libcudf (C++/CUDA) code. label Oct 31, 2022
@ttnghia ttnghia removed the Needs Triage Need team to review and classify label Nov 4, 2022
rapids-bot bot pushed a commit that referenced this issue Nov 30, 2022
There are several overloads of these functions which work differently. They are classified into 2 groups:
 * `superimpose_parent_nulls(null_mask, null_count, column)`: Performs superimposing nulls from somewhere else into the input column,
 * `superimpose_parent_nulls(column_view/table_view)`: Perform superimposing nulls of the input column(s) into their children columns-the input root column(s) are not affected.

That is confusing. They should have different names to reflect their purposes. This PR renames these groups into more meaningful names: `superimpose_nulls` and `push_down_nulls`. No implementation has been changed.

This also supports #12027.

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - MithunR (https://github.com/mythrocks)

URL: #12230
rapids-bot bot pushed a commit that referenced this issue Dec 16, 2022
…12239)

The current implementation of `cudf::structs::detail::superimpose_nulls` and `cudf::structs::detail::push_down_nulls` does not do null sanitization. In particular, they only apply the given null mask on the input columns, or push down the parent null mask into the children columns. If there are lists/strings being superimposed by a null bit, they are not sanitized (i.e., they still remain non-empty).

This PR fixes that behavior, sanitizing all non-empty nulls for the output column(s) of these functions. Since there are some changes in the function signatures, this PR is flagged as breaking change.

No new unit tests are needed because the existing tests are already capable to discover the sanitization issue. They were just using `CUDF_TEST_EXPECT_COLUMNS_EQUIVALENT` which allows non-empty nulls to be compared as equal to empty nulls. Changing to `CUDF_TEST_EXPECT_COLUMNS_EQUAL` in these existing tests should be sufficient.

Last but not least, the breaking changes in this PR alter the output results of the `cudf::make_lists_column` function, causing breaking in some other tests such as row bit count tests.

Closes:
 * #12027

Depends on:
 * #12230

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - MithunR (https://github.com/mythrocks)

URL: #12239
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants