[REVIEW] Remove bounds check for cudf::gather
#6523
Codecov Report
@@             Coverage Diff              @@
##           branch-0.17    #6523   +/-   ##
============================================
  Coverage        81.94%   81.95%
============================================
  Files               96       96
  Lines            16164    16164
============================================
+ Hits             13246    13247    +1
+ Misses            2918     2917    -1
rerun tests
It seems like there are a lot of places where the original behavior has been changed from DONT_CHECK to NULLIFY.
@@ -44,7 +44,7 @@ std::unique_ptr<column> decode(dictionary_column_view const& source,
   // use gather to create the output column -- use ignore_out_of_bounds=true
   auto table_column = cudf::detail::gather(table_view{{source.keys()}},
                                            indices,
-                                           cudf::detail::out_of_bounds_policy::IGNORE,
+                                           cudf::out_of_bounds_policy::NULLIFY,
This seems like it would change the original behavior.
@davidwendt and this?
cpp/src/dictionary/remove_keys.cu
Outdated
@@ -112,7 +112,7 @@ std::unique_ptr<column> remove_keys_fn(
   // Example: gather([0,max,1,max,2],[4,0,3,1,2,2,2,4,0]) => [2,0,max,max,1,1,1,2,0]
   auto table_indices = cudf::detail::gather(table_view{{map_indices->view()}},
                                             indices_view,
-                                            cudf::detail::out_of_bounds_policy::NULLIFY,
+                                            cudf::out_of_bounds_policy::DONT_CHECK,
This too.
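The example in the code comment of the remove_keys.cu hunk above can be checked by hand with a small host-side sketch. This is not cudf code: `gather_unchecked` and `remove_keys_comment_example` are hypothetical names, and `max` is assumed to stand for the largest `int32_t` value. It only illustrates that every index in that gather map is in range, which is the situation DONT_CHECK relies on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical host-side stand-in for an unchecked gather: out[i] = source[map[i]].
// Every index in `map` is assumed to be valid; nothing is nullified or skipped.
std::vector<std::int32_t> gather_unchecked(std::vector<std::int32_t> const& source,
                                           std::vector<std::int32_t> const& map)
{
  std::vector<std::int32_t> out;
  out.reserve(map.size());
  for (std::int32_t idx : map) {
    // Debug-only guard; a real unchecked gather would perform no such test.
    assert(idx >= 0 && static_cast<std::size_t>(idx) < source.size());
    out.push_back(source[static_cast<std::size_t>(idx)]);
  }
  return out;
}

// Reproduces the example from the code comment, with `max` assumed to be the
// largest int32_t value.
std::vector<std::int32_t> remove_keys_comment_example()
{
  constexpr std::int32_t max = std::numeric_limits<std::int32_t>::max();
  return gather_unchecked({0, max, 1, max, 2}, {4, 0, 3, 1, 2, 2, 2, 4, 0});
}
```

Calling `remove_keys_comment_example()` yields `[2, 0, max, max, 1, 1, 1, 2, 0]`, matching the comment: the `max` entries are gathered values, not out-of-bounds indices.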
@@ -51,8 +51,8 @@ std::unique_ptr<column> group_argmax(column_view const& values,
   auto result_table =
     cudf::detail::gather(table_view({key_sort_order}),
                          null_removed_indices,
-                         indices->nullable() ? cudf::detail::out_of_bounds_policy::IGNORE
-                                             : cudf::detail::out_of_bounds_policy::NULLIFY,
+                         indices->nullable() ? cudf::out_of_bounds_policy::NULLIFY
This too.
@@ -51,8 +51,8 @@ std::unique_ptr<column> group_argmin(column_view const& values,
   auto result_table =
     cudf::detail::gather(table_view({key_sort_order}),
                          null_removed_indices,
-                         indices->nullable() ? cudf::detail::out_of_bounds_policy::IGNORE
-                                             : cudf::detail::out_of_bounds_policy::NULLIFY,
+                         indices->nullable() ? cudf::out_of_bounds_policy::NULLIFY
And this.
cpp/src/groupby/sort/groupby.cu
Outdated
@@ -201,8 +201,8 @@ void store_result_functor::operator()<aggregation::MIN>(aggregation const& agg)
   auto transformed_result =
     cudf::detail::gather(table_view({values}),
                          null_removed_map,
-                         argmin_result.nullable() ? cudf::detail::out_of_bounds_policy::IGNORE
-                                                  : cudf::detail::out_of_bounds_policy::NULLIFY,
+                         argmin_result.nullable() ? cudf::out_of_bounds_policy::DONT_CHECK
And this.
cpp/src/groupby/sort/groupby.cu
Outdated
@@ -238,8 +238,8 @@ void store_result_functor::operator()<aggregation::MAX>(aggregation const& agg)
   auto transformed_result =
     cudf::detail::gather(table_view({values}),
                          null_removed_map,
-                         argmax_result.nullable() ? cudf::detail::out_of_bounds_policy::IGNORE
-                                                  : cudf::detail::out_of_bounds_policy::NULLIFY,
+                         argmax_result.nullable() ? cudf::out_of_bounds_policy::DONT_CHECK
And this.
Can you also update the We should strive to not have any raw |
I think the replacements result from a change to this line: cudf/cpp/src/copying/gather.cu Line 70 in fb789bf
Currently, a call to cudf::gather with NULLIFY is actually redirected to the IGNORE code path, and this PR includes a fix for that. Accordingly, all calls to gather should be flipped to maintain code semantics (currently only partially replaced).
Addressing your second review comment, this line will be updated without using booleans.
Just suggesting doc improvements. Note "bounds" is plural in "out of bounds" and "bounds checking".
Offline discussion with @jrhemstad : Instead of changing |
- Style - Error message Co-authored-by: Ashwin Srinath <[email protected]>
- Use DONT_CHECK in encode.cu
- Undo test_single_agg column test func - style
Ternary in place of if-else block Co-authored-by: nvdbaranec <[email protected]>
- State DONT_CHECK purpose - Style
Closing and deferring to another PR for rebase.
Closes #6478
cudf::gather will not run a pre-pass to check for index validity. For out_of_bounds_policy, remove FAIL, and expose NULLIFY and DONT_CHECK to the user. NULLIFY sets out-of-bounds indices to null rows, while DONT_CHECK skips any checking. Using DONT_CHECK should yield higher performance for gather maps with only valid indices.
Note that the negative index (wrap-around) policy is unchanged. When the gather map dtype is signed, wrap-around is applied.
A new Cython binding to cudf::minmax, used for Cython bounds checking, is added. Will also close #6731.
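The policy semantics described above can be modeled on the host with a short sketch. This is an illustration under stated assumptions, not the cudf implementation: `gather_model`, `gathered`, and `map_in_bounds` are hypothetical names, the enum only mirrors the two policies named in the description, and the valid index range is assumed to be [-n, n) for a source with n rows because of wrap-around. The `map_in_bounds` helper shows how a single min/max pass over the gather map could serve as a caller-side bounds check, in the spirit of the minmax-based check mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the two policies the description exposes.
enum class out_of_bounds_policy { NULLIFY, DONT_CHECK };

// Gathered values plus a validity flag per row (false models a null row).
struct gathered {
  std::vector<std::int32_t> values;
  std::vector<bool> valid;
};

// Host-side model of gather. Negative indices wrap around (idx + n), as the
// description says happens for signed gather maps.
gathered gather_model(std::vector<std::int32_t> const& source,
                      std::vector<std::int32_t> const& map,
                      out_of_bounds_policy policy)
{
  auto const n = static_cast<std::int32_t>(source.size());
  gathered out;
  for (std::int32_t idx : map) {
    std::int32_t const wrapped = idx < 0 ? idx + n : idx;
    bool const in_bounds       = wrapped >= 0 && wrapped < n;
    if (policy == out_of_bounds_policy::NULLIFY && !in_bounds) {
      out.values.push_back(0);  // placeholder value behind a null row
      out.valid.push_back(false);
    } else {
      // DONT_CHECK: no validation. The caller must guarantee the index is in
      // range; an invalid index here is undefined behavior in this model.
      out.values.push_back(source[static_cast<std::size_t>(wrapped)]);
      out.valid.push_back(true);
    }
  }
  return out;
}

// Caller-side validity check using one min/max pass over the gather map.
// Assumes the valid range is [-n, n) once wrap-around is taken into account.
bool map_in_bounds(std::vector<std::int32_t> const& map, std::int32_t n)
{
  if (map.empty()) { return true; }
  auto const mm = std::minmax_element(map.begin(), map.end());
  return *mm.first >= -n && *mm.second < n;
}
```

With a three-row source `{10, 20, 30}`, the map `{0, -1, 5}` under NULLIFY gives rows `10`, `30`, and one null row, and `map_in_bounds` rejects it; a map such as `{2, -3}` passes the check and can be gathered with DONT_CHECK.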