[FEA] Add new parameter to cudf::gather to enable behavior of negative indices beget null elements #6479

Closed
shwina opened this issue Oct 9, 2020 · 7 comments
Labels
feature request: New feature or request
libcudf: Affects libcudf (C++/CUDA) code.

Comments

@shwina
Contributor

shwina commented Oct 9, 2020

This is the second of a series of issues as part of a larger join refactor.

It would be great if cudf::gather exposed a way to get null when negative indices are specified in the gather map.

cudf::gather({1, 2, 3, 4, 5, 6}, {1, 3, -1}) == {2, 4, null}
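To make the requested semantics concrete, here is a minimal host-side sketch in plain C++. It is not libcudf code: `gather_nullify` is a hypothetical name, `std::optional` stands in for a nullable element, and treating indices past the end the same way as negative ones is an assumption of the sketch.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper, not part of libcudf: gathers source[i] for each index
// in gather_map. An index that is negative or past the end of source produces
// std::nullopt, which stands in for a null element here.
std::vector<std::optional<int>> gather_nullify(std::vector<int> const& source,
                                               std::vector<int> const& gather_map)
{
  std::vector<std::optional<int>> result;
  result.reserve(gather_map.size());
  for (int idx : gather_map) {
    bool const in_bounds = idx >= 0 && static_cast<std::size_t>(idx) < source.size();
    if (in_bounds) {
      result.push_back(source[static_cast<std::size_t>(idx)]);
    } else {
      result.push_back(std::nullopt);  // assumption: null for any invalid index
    }
  }
  return result;
}
```

With the inputs from the example above, `gather_nullify({1, 2, 3, 4, 5, 6}, {1, 3, -1})` returns two populated elements holding 2 and 4, followed by one empty element.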

The join API is going to be refactored to return gather map(s) instead of the actual result of a join. This gives callers the freedom to construct the output table in whatever way they like. Currently, the libcudf join code and join API are both complicated because they try to construct an output table that meets Pandas' expectations. As a consequence, the Cython and Python code is also complicated.

This feature will enable the libcudf join API to return a gather map with negative values to indicate non-matches in outer joins.
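As a rough illustration of what such a gather map could look like, the sketch below computes left-join gather maps over two integer key columns in plain C++. It is not the libcudf join API: `left_join_gather_maps` and `no_match` are hypothetical names, and using -1 as the non-match marker is an assumption taken from the example in this issue.

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical marker for "no matching row on the right side" (assumed -1).
constexpr int no_match = -1;

// Hypothetical helper, not libcudf's join API: returns a pair of gather maps
// (left row indices, right row indices) for a left join on integer keys.
// A left row without a match is paired with no_match in the right map.
std::pair<std::vector<int>, std::vector<int>> left_join_gather_maps(
  std::vector<int> const& left_keys, std::vector<int> const& right_keys)
{
  // Index the right side: key -> row indices holding that key.
  std::unordered_multimap<int, int> right_index;
  for (std::size_t r = 0; r < right_keys.size(); ++r) {
    right_index.emplace(right_keys[r], static_cast<int>(r));
  }

  std::vector<int> left_map;
  std::vector<int> right_map;
  for (std::size_t l = 0; l < left_keys.size(); ++l) {
    auto const [first, last] = right_index.equal_range(left_keys[l]);
    if (first == last) {
      // No match: keep the left row and mark the right side as missing.
      left_map.push_back(static_cast<int>(l));
      right_map.push_back(no_match);
    }
    for (auto it = first; it != last; ++it) {
      left_map.push_back(static_cast<int>(l));
      right_map.push_back(it->second);
    }
  }
  return {std::move(left_map), std::move(right_map)};
}
```

For left keys `{10, 20, 30}` and right keys `{30, 10}`, the left map is `{0, 1, 2}` and the right map is `{1, -1, 0}`. Gathering the right table with that map, with negative indices producing nulls, would fill the unmatched row with nulls, which is the behavior this issue asks for.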

cc: @jrhemstad

@shwina added the feature request and Needs Triage labels Oct 9, 2020
@shwina added the libcudf label and removed the Needs Triage label Oct 9, 2020
@jrhemstad
Contributor

> negative indices

While the original intention is for negative indices, I think we can generalize it to say that any out-of-bounds index yields a null.

@shwina
Contributor Author

shwina commented Oct 9, 2020

cc: @brandon-b-miller as well

@mythrocks
Contributor

> It would be great if cudf::gather exposed a way to get null when negative indices are specified in the gather map.

I didn't realize that the following code wasn't the prescribed way to achieve that effect:

    auto output_table = detail::gather(table_view{{input}},
                                       output->view(),
                                       detail::out_of_bounds_policy::IGNORE,
                                       detail::negative_index_policy::NOT_ALLOWED,
                                       mr,
                                       stream);

This is from rolling.cu.

@shwina
Contributor Author

shwina commented Oct 9, 2020

I think that code does do that, but it is not exposed in the public API.

@mythrocks
Contributor

Ah, so. Gotcha.

@github-actions

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@shwina
Contributor Author

shwina commented Feb 16, 2021

This is now enabled with the NULLIFY out of bounds policy introduced in #6523. Closing.

@shwina closed this as completed Feb 16, 2021