-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Improve gather performance #2775
Conversation
@shwina are we going to handle the |
Codecov Report
@@ Coverage Diff @@
## branch-0.10 #2775 +/- ##
===============================================
+ Coverage 86.51% 86.53% +0.01%
===============================================
Files 48 48
Lines 9013 9000 -13
===============================================
- Hits 7798 7788 -10
+ Misses 1215 1212 -3
Continue to review full report at Codecov.
|
@kkraus14 yes, that will be part of this PR |
cpp/include/cudf/copying.hpp
Outdated
* the source columns. | ||
* | ||
* If any index in scatter_map is outside the range of [0, target.num_rows()), | ||
* If any index in `scatter_map` is outside the range of [0, target.num_rows()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* If any index in `scatter_map` is outside the range of [0, target.num_rows()), | |
* @throws `cudf::logic_error` if `check_bounds == true` and any index in `scatter_map` is outside | |
* the range `[0, target.num_rows()) | |
* | |
* If `check_bounds == false` and any index in `scatter_map` is outside the range of [0, target.num_rows()), |
cpp/include/cudf/copying.hpp
Outdated
* The number of elements in the `scatter_map` must equal the number of rows in | ||
* the source columns. | ||
* | ||
* If any index in `scatter_map` is outside the range of [0, target.num_rows()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* If any index in `scatter_map` is outside the range of [0, target.num_rows()), | |
* @throws `cudf::logic_error` if `check_bounds == true` and any index in `scatter_map` is outside | |
* the range `[0, target.num_rows()) | |
* | |
* If any index in `scatter_map` is outside the range of [0, target.num_rows()), |
* The datatypes between coresponding columns in the source and target | ||
* columns must be the same. | ||
* | ||
* If any index in scatter_map is outside the range of [0, num rows in | ||
* target_columns), the result is undefined. | ||
* A negative index `i` in the `scatter_map` is interpreted as `i+n`, where |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation of the scater
APIs are inconsistent. The documentation of the previous two APIs would lead you to believe that a negative index is UB.
cpp/include/cudf/copying.hpp
Outdated
* | ||
* If `check_bounds == false` and any index in the `scatter_map` is outside the range | ||
* `[-n, n)`, where `n` is the number of rows in the `source_table`, the | ||
* behavior is undefined. | ||
* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* | |
* | |
* @throws `cudf::logic_error` if `check_bounds == true` and any index in the `scatter_map` is outside | |
* the range `[-n, n)` |
cpp/include/cudf/copying.hpp
Outdated
* undefined. | ||
* If `check_bounds == false` and any index in the `gather_map` is outside the range | ||
* `[-n, n)`, where `n` is the number of rows in the `source_table`, the | ||
* behavior is undefined. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* behavior is undefined. | |
* behavior is undefined. | |
* | |
* @throws `cudf::logic_error` if `check_bounds == true` and any index in the `gather_map` is | |
* outside the range `[-n, n)` |
cpp/src/copying/gather.cuh
Outdated
* Positive indices are unchanged by this transformation. | ||
*---------------------------------------------------------------------------**/ | ||
template <bool enable, typename map_type> | ||
struct negative_index_converter : public thrust::unary_function<map_type,map_type>{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double negatives can be confusing.
How about making this an index_converter
and changing enable
to be negative
?
Seems it would be clearer to enable negative on something positive than disabling negative to make something positive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that it can be confusing. How about an enum
template parameter that makes it more explicit what the converter does?
enum index_conversion {
NEGATIVE_TO_POSITIVE = 0,
SOMETHING_ELSE =1 ,
NONE = 2,
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I saw this line and had to look up what it was doing.
negative_index_converter<false,map_type>{...}
Something like this perhaps:
template <typename map_type, index_conversion ic = NOTHING>
struct index_converter ...
and then maybe
index_converter<map_type, NEGATIVE>{ ... }
and normal pass through would be just
index_converter<map_type>{ ... }
Implement the improvements to gather suggested in #2675.
Closes #2675. Addresses #1888.