Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Remove bounds check for cudf::gather #6523

Closed
wants to merge 41 commits into from

Conversation

isVoid
Copy link
Contributor

@isVoid isVoid commented Oct 14, 2020

Closes #6478

cudf::gather will not run a pre-pass to check for index validity.

For out_of_bounds_policy, remove FAIL, and expose NULLIFY and DONT_CHECK to user. NULLIFY sets out of bounds indices to null rows, while DONT_CHECK skips any checking. Using DONT_CHECK should yield higher performance to gather maps with only valid indices.

Note that the negative index (wrap-arounds) policy is unchanged. When gather map dtype is signed, wrap-around is applied.

A new Cython binding to cudf::minmax, used for Cython bound checking is added. Will also closes #6731

@isVoid isVoid requested review from a team as code owners October 14, 2020 06:52
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@isVoid isVoid changed the base branch from branch-0.16 to branch-0.17 October 14, 2020 06:52
@GPUtester
Copy link
Collaborator

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

cpp/include/cudf/copying.hpp Outdated Show resolved Hide resolved
@ajschmidt8 ajschmidt8 removed the request for review from a team October 14, 2020 18:02
@ajschmidt8
Copy link
Member

Removing ops-codeowners from the required reviews since it doesn't seem there are any file changes that we're responsible for. Feel free to add us back if necessary.

@isVoid isVoid changed the title [WIP] Remove bound check for cudf::gather [REVIEW] Remove bound check for cudf::gather Nov 11, 2020
@codecov
Copy link

codecov bot commented Nov 11, 2020

Codecov Report

Merging #6523 (f90e028) into branch-0.17 (1c81827) will increase coverage by 0.00%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff              @@
##           branch-0.17    #6523   +/-   ##
============================================
  Coverage        81.94%   81.95%           
============================================
  Files               96       96           
  Lines            16164    16164           
============================================
+ Hits             13246    13247    +1     
+ Misses            2918     2917    -1     
Impacted Files Coverage Δ
python/cudf/cudf/core/frame.py 90.06% <0.00%> (+0.12%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1c81827...68b2673. Read the comment docs.

@isVoid
Copy link
Contributor Author

isVoid commented Nov 12, 2020

rerun tests

Copy link
Contributor

@jrhemstad jrhemstad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like there are a lot of places where the original behavior has been changed from DONT_CHECK to NULLIFY.

cpp/include/cudf/copying.hpp Outdated Show resolved Hide resolved
@@ -44,7 +44,7 @@ std::unique_ptr<column> decode(dictionary_column_view const& source,
// use gather to create the output column -- use ignore_out_of_bounds=true
auto table_column = cudf::detail::gather(table_view{{source.keys()}},
indices,
cudf::detail::out_of_bounds_policy::IGNORE,
cudf::out_of_bounds_policy::NULLIFY,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it would change the original behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidwendt and this?

@@ -112,7 +112,7 @@ std::unique_ptr<column> remove_keys_fn(
// Example: gather([0,max,1,max,2],[4,0,3,1,2,2,2,4,0]) => [2,0,max,max,1,1,1,2,0]
auto table_indices = cudf::detail::gather(table_view{{map_indices->view()}},
indices_view,
cudf::detail::out_of_bounds_policy::NULLIFY,
cudf::out_of_bounds_policy::DONT_CHECK,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This too.

@@ -51,8 +51,8 @@ std::unique_ptr<column> group_argmax(column_view const& values,
auto result_table =
cudf::detail::gather(table_view({key_sort_order}),
null_removed_indices,
indices->nullable() ? cudf::detail::out_of_bounds_policy::IGNORE
: cudf::detail::out_of_bounds_policy::NULLIFY,
indices->nullable() ? cudf::out_of_bounds_policy::NULLIFY
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This too.

@@ -51,8 +51,8 @@ std::unique_ptr<column> group_argmin(column_view const& values,
auto result_table =
cudf::detail::gather(table_view({key_sort_order}),
null_removed_indices,
indices->nullable() ? cudf::detail::out_of_bounds_policy::IGNORE
: cudf::detail::out_of_bounds_policy::NULLIFY,
indices->nullable() ? cudf::out_of_bounds_policy::NULLIFY
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this.

@@ -201,8 +201,8 @@ void store_result_functor::operator()<aggregation::MIN>(aggregation const& agg)
auto transformed_result =
cudf::detail::gather(table_view({values}),
null_removed_map,
argmin_result.nullable() ? cudf::detail::out_of_bounds_policy::IGNORE
: cudf::detail::out_of_bounds_policy::NULLIFY,
argmin_result.nullable() ? cudf::out_of_bounds_policy::DONT_CHECK
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this.

@@ -238,8 +238,8 @@ void store_result_functor::operator()<aggregation::MAX>(aggregation const& agg)
auto transformed_result =
cudf::detail::gather(table_view({values}),
null_removed_map,
argmax_result.nullable() ? cudf::detail::out_of_bounds_policy::IGNORE
: cudf::detail::out_of_bounds_policy::NULLIFY,
argmax_result.nullable() ? cudf::out_of_bounds_policy::DONT_CHECK
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this.

cpp/src/transform/encode.cu Outdated Show resolved Hide resolved
@jrhemstad
Copy link
Contributor

jrhemstad commented Nov 13, 2020

Can you also update the detail::gather to use out_of_bounds_policy instead of the raw bool here: https://github.com/rapidsai/cudf/blob/branch-0.17/cpp/include/cudf/detail/gather.cuh#L622

We should strive to not have any raw bools in APIs.

@isVoid
Copy link
Contributor Author

isVoid commented Nov 13, 2020

It seems like there are a lot of places where the original behavior has been changed from DONT_CHECK to NULLIFY.

I think the replacements result from a change to this line:

source_table, map_begin, map_end, bounds == out_of_bounds_policy::IGNORE, mr, stream);

Currently when calling cudf::gather with NULLIFY actually gets redirected to IGNORE code path, and this PR includes a fix to it. Accordingly, all calls to gather should be flipped to maintain code semantics (currently only partially replaced).

Addressing your second review comment, this line will be updated without using booleans.

@harrism harrism changed the title [REVIEW] Remove bound check for cudf::gather [REVIEW] Remove bounds check for cudf::gather Nov 16, 2020
Copy link
Member

@harrism harrism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just suggesting doc improvements. Note "bounds" is plural in "out of bounds" and "bounds checking".

CHANGELOG.md Outdated Show resolved Hide resolved
cpp/include/cudf/copying.hpp Outdated Show resolved Hide resolved
cpp/include/cudf/copying.hpp Outdated Show resolved Hide resolved
cpp/include/cudf/copying.hpp Outdated Show resolved Hide resolved
cpp/include/cudf/detail/gather.hpp Outdated Show resolved Hide resolved
@isVoid
Copy link
Contributor Author

isVoid commented Nov 16, 2020

Offline discussion with @jrhemstad : Instead of changing expect_equal to expect_equivalent and force the test cases pass, let's investigate each gather API call and make sure the semantics remain unchanged before and after the refactor.

@isVoid
Copy link
Contributor Author

isVoid commented Dec 2, 2020

Closing and defer to another PR for rebase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libcudf Affects libcudf (C++/CUDA) code.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] minmax reduction binding in Cython/Python [FEA] Remove bounds checking from cudf::gather
10 participants