Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] libcudf functions for segmented_has_nulls / has_null_elements(list_view) / segmented_bitmask_reduction #9552

Closed
bdice opened this issue Oct 28, 2021 · 4 comments · Fixed by #9621
Assignees
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code.

Comments

@bdice
Copy link
Contributor

bdice commented Oct 28, 2021

Is your feature request related to a problem? Please describe.
Following from libcudf discussions around null support and lists (or segmented columns more generally), we have identified a need for a few new APIs. Many reductions over a list_view produce results that should be null if any element in the list (an element in the corresponding segment of the child column) is null.

cc: @isVoid @jrhemstad @nvdbaranec, @ttnghia @mythrocks

Describe the solution you'd like

First, segmented_count_set_bits should be refactored to separate the section acting on device-side data. This function will do the same thing as the existing host-side function but act on arrays of indices that are already on the device.

We propose the addition of the following functions:

Additional context
This affects @bdice's work on #9215, @ttnghia's work on #9545, and @isVoid's plan for addressing #9135.

@bdice bdice added feature request New feature or request Needs Triage Need team to review and classify labels Oct 28, 2021
@bdice bdice self-assigned this Oct 28, 2021
@bdice bdice changed the title [FEA] libcudf functions for segmented_has_nulls / has_null_elements(list_view) / segmented_bitmask_reduction [FEA] libcudf functions for segmented_has_nulls / has_null_elements(list_view) / segmented_bitmask_reduction Oct 28, 2021
@bdice
Copy link
Contributor Author

bdice commented Oct 28, 2021

I have started on this, and have a draft of splitting out device-side code from segmented_count_set_bits. I'll open a PR soon.

@beckernick beckernick added libcudf Affects libcudf (C++/CUDA) code. and removed Needs Triage Need team to review and classify labels Nov 12, 2021
rapids-bot bot pushed a commit that referenced this issue Dec 10, 2021
… split host/device side code for segmented counts. (#9588)

This PR is a first step towards #9552. I split the host and device code for `segmented_count_set_bits`, so that APIs with existing indices on the device can call the device function directly. Along the way, I discussed some pain points in the current design of bit counting functions with @jrhemstad and @mythrocks and made the following changes:

- Fixed inconsistencies in the behavior of bit counting functions (`count_set_bits` returned zero when `bitmask==nullptr`, but `segmented_count_set_bits` returned segment lengths) by raising an error when `bitmask==nullptr` _(breaking change)_

- Moved all bit counting functions to detail namespaces _(breaking change)_

- Separated "validity" logic from pure bit counting through the introduction of new `valid_count`, `null_count`, `segmented_valid_count`, `segmented_null_count` detail functions. These functions treat all elements as valid when `bitmask==nullptr` and otherwise return the same way as the corresponding bit counting functions.

- Split device-side code from host-side code to enable future work on segmented nullmask reductions over lists, with indices (offsets) already on the device.

- Updated all calling code (and added tests) to use the new valid/null count detail functions.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Jake Hemstad (https://github.com/jrhemstad)
  - MithunR (https://github.com/mythrocks)

URL: #9588
@github-actions
Copy link

github-actions bot commented Jan 8, 2022

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@bdice
Copy link
Contributor Author

bdice commented Jan 11, 2022

Update: this is being addressed in #9621. @isVoid and I wrote a function segmented_null_mask_reduction in a017223 that should address many of the use cases needed for this issue.

@github-actions
Copy link

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

rapids-bot bot pushed a commit that referenced this issue Mar 10, 2022
closes #9135 
closes #9552 

This PR adds support for numeric types to `simple_op`, `sum`, `prod`, `min`, `max`, `any`, `all`. Also, this PR adds `segmented_null_mask_reduction` to compute null mask reductions on segments.

Authors:
  - Michael Wang (https://github.com/isVoid)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Robert (Bobby) Evans (https://github.com/revans2)
  - Bradley Dice (https://github.com/bdice)
  - Jake Hemstad (https://github.com/jrhemstad)

URL: #9621
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants