[REVIEW] Improve gather performance #2775

Merged on Sep 27, 2019 (52 commits)

@shwina (Contributor) commented Sep 10, 2019

Implement the improvements to gather suggested in #2675.

Closes #2675. Addresses #1888.

@shwina requested review from a team as code owners September 10, 2019 22:17
@kkraus14 (Collaborator) commented:

@shwina are we going to handle the int8, int16, and int64 gathering in this PR or was the typecasting deemed cheap enough that it didn't matter?

codecov bot commented Sep 10, 2019

Codecov Report

Merging #2775 into branch-0.10 will increase coverage by 0.01%.
The diff coverage is 96%.

@@               Coverage Diff               @@
##           branch-0.10    #2775      +/-   ##
===============================================
+ Coverage        86.51%   86.53%   +0.01%     
===============================================
  Files               48       48              
  Lines             9013     9000      -13     
===============================================
- Hits              7798     7788      -10     
+ Misses            1215     1212       -3
| Impacted Files | Coverage Δ | |
|---|---|---|
| `python/cudf/cudf/core/column/__init__.py` | 100% <ø> (ø) | ⬆️ |
| `python/cudf/cudf/core/dataframe.py` | 93.72% <100%> (-0.01%) | ⬇️ |
| `python/cudf/cudf/core/series.py` | 93.33% <100%> (ø) | ⬆️ |
| `python/cudf/cudf/core/column/datetime.py` | 90.9% <100%> (ø) | ⬆️ |
| `python/cudf/cudf/core/column/numerical.py` | 94.34% <100%> (ø) | ⬆️ |
| `python/cudf/cudf/core/column/column.py` | 86.88% <94.44%> (+0.19%) | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b96073c...6a350d4.

@shwina (Contributor, Author) commented Sep 11, 2019

@kkraus14 yes, that will be part of this PR

@kkraus14 added the labels 2 - In Progress (currently a work in progress), Python (affects Python cuDF API), and libcudf (affects libcudf C++/CUDA code) on Sep 14, 2019
cpp/src/copying/gather.cu: two outdated review threads, both resolved
@shwina requested review from harrism and jrhemstad September 23, 2019 17:45
  * the source columns.
  *
- * If any index in scatter_map is outside the range of [0, target.num_rows()),
+ * If any index in `scatter_map` is outside the range of [0, target.num_rows()),
Contributor

Suggested change
- * If any index in `scatter_map` is outside the range of [0, target.num_rows()),
+ * @throws `cudf::logic_error` if `check_bounds == true` and any index in `scatter_map` is outside
+ * the range `[0, target.num_rows())`
+ *
+ * If `check_bounds == false` and any index in `scatter_map` is outside the range of [0, target.num_rows()),

* The number of elements in the `scatter_map` must equal the number of rows in
* the source columns.
*
* If any index in `scatter_map` is outside the range of [0, target.num_rows()),
Contributor

Suggested change
- * If any index in `scatter_map` is outside the range of [0, target.num_rows()),
+ * @throws `cudf::logic_error` if `check_bounds == true` and any index in `scatter_map` is outside
+ * the range `[0, target.num_rows())`
+ *
+ * If any index in `scatter_map` is outside the range of [0, target.num_rows()),

* The datatypes between corresponding columns in the source and target
* columns must be the same.
*
* If any index in scatter_map is outside the range of [0, num rows in
* target_columns), the result is undefined.
* A negative index `i` in the `scatter_map` is interpreted as `i+n`, where
Contributor

Documentation of the scatter APIs is inconsistent. The documentation of the previous two APIs would lead you to believe that a negative index is undefined behavior (UB).
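
For reference, the negative-index convention quoted above (a negative index `i` is read as `i + n`) can be sketched in plain host C++. This is only an illustration under stated assumptions, not libcudf code: `wrap_index` and `gather_wrapped` are hypothetical names, and the sketch assumes every index already lies in `[-n, n)`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: map an index i in [-n, n) onto [0, n).
// A negative i is interpreted as i + n; a non-negative i is unchanged.
inline std::int64_t wrap_index(std::int64_t i, std::int64_t n) {
    return i < 0 ? i + n : i;
}

// Hypothetical host-side gather: out[k] = source[wrap_index(map[k], n)].
// Assumes every map entry is in [-n, n); other values are outside the
// contract of this sketch and are not checked.
template <typename T>
std::vector<T> gather_wrapped(const std::vector<T>& source,
                              const std::vector<std::int64_t>& map) {
    const auto n = static_cast<std::int64_t>(source.size());
    std::vector<T> out;
    out.reserve(map.size());
    for (std::int64_t i : map) {
        out.push_back(source[static_cast<std::size_t>(wrap_index(i, n))]);
    }
    return out;
}
```

Under this reading, an index of `-1` selects the last row and `-n` selects the first, which is why the documented valid range becomes `[-n, n)` rather than `[0, n)`.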

*
* If `check_bounds == false` and any index in the `scatter_map` is outside the range
* `[-n, n)`, where `n` is the number of rows in the `source_table`, the
* behavior is undefined.
*
Contributor

Suggested change
- *
+ *
+ * @throws `cudf::logic_error` if `check_bounds == true` and any index in the `scatter_map` is outside
+ * the range `[-n, n)`

* undefined.
* If `check_bounds == false` and any index in the `gather_map` is outside the range
* `[-n, n)`, where `n` is the number of rows in the `source_table`, the
* behavior is undefined.
Contributor

Suggested change
- * behavior is undefined.
+ * behavior is undefined.
+ *
+ * @throws `cudf::logic_error` if `check_bounds == true` and any index in the `gather_map` is
+ * outside the range `[-n, n)`
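
The `check_bounds` contract described in these suggestions could look roughly like the following host-side sketch. It rests on assumptions and is not the libcudf implementation: `validate_gather_map` is a hypothetical name, and `std::logic_error` stands in for `cudf::logic_error`.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical check mirroring the documented contract:
//  - check_bounds == true : throw if any index is outside [-n, n)
//  - check_bounds == false: no validation; out-of-range indices are the
//    caller's responsibility (documented as undefined behavior)
inline void validate_gather_map(const std::vector<std::int64_t>& gather_map,
                                std::int64_t n,
                                bool check_bounds) {
    if (!check_bounds) {
        return;
    }
    for (std::int64_t i : gather_map) {
        if (i < -n || i >= n) {
            // Stand-in for cudf::logic_error in this sketch.
            throw std::logic_error("gather_map index out of bounds");
        }
    }
}
```

Keeping the check optional matches the trade-off in the doc comments: callers who know their map is valid skip the extra pass over the indices, and callers who are unsure get an exception instead of undefined behavior.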

* Positive indices are unchanged by this transformation.
*---------------------------------------------------------------------------**/
template <bool enable, typename map_type>
struct negative_index_converter : public thrust::unary_function<map_type,map_type>{};
Contributor

Double negatives can be confusing.
How about making this an `index_converter` and renaming `enable` to `negative`?
It seems clearer to enable negative handling on something positive than to disable negative handling to make something positive.

Contributor Author

Agree that it can be confusing. How about an enum template parameter that makes it more explicit what the converter does?

enum index_conversion {
    NEGATIVE_TO_POSITIVE = 0,
    SOMETHING_ELSE = 1,
    NONE = 2,
};

@davidwendt (Contributor) commented Sep 27, 2019

Yes. I saw this line and had to look up what it was doing.

negative_index_converter<false,map_type>{...}

Something like this perhaps:

template <typename map_type, index_conversion ic = NOTHING>
struct index_converter ...

and then maybe

index_converter<map_type, NEGATIVE>{ ... }  

and normal pass through would be just

index_converter<map_type>{ ... }  
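
Putting the two proposals in this thread together, an enum-parameterized converter might be sketched as below. This is an assumption-based illustration rather than the code that was merged: the enumerator names are hypothetical, and the `thrust::unary_function` base and any `__device__` qualifiers from the original functor are omitted so the sketch compiles as plain host C++.

```cpp
#include <cstdint>

// Hypothetical enum that states explicitly what the converter does.
enum class index_conversion { NONE, NEGATIVE_TO_POSITIVE };

// Primary template: pass-through, used when no conversion is requested.
template <typename map_type, index_conversion ic = index_conversion::NONE>
struct index_converter {
    explicit index_converter(map_type /*n_rows*/) {}
    map_type operator()(map_type in) const { return in; }
};

// Specialization: a negative index i becomes i + n_rows;
// non-negative indices are returned unchanged.
template <typename map_type>
struct index_converter<map_type, index_conversion::NEGATIVE_TO_POSITIVE> {
    explicit index_converter(map_type n_rows) : n_rows_(n_rows) {}
    map_type operator()(map_type in) const {
        // The cast matters for narrow types such as int8_t, where
        // in + n_rows_ is promoted to int before the addition.
        return in < 0 ? static_cast<map_type>(in + n_rows_) : in;
    }
    map_type n_rows_;
};
```

With this shape the call site reads as `index_converter<map_type, index_conversion::NEGATIVE_TO_POSITIVE>{n_rows}` or simply `index_converter<map_type>{n_rows}`, which avoids the `negative_index_converter<false, map_type>` double negative raised above.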

@shwina changed the title from "[REVIEW] Improve gather performance" to "[WIP] Improve gather performance" Sep 27, 2019
@shwina requested a review from davidwendt September 27, 2019 19:44
@shwina changed the title from "[WIP] Improve gather performance" to "[REVIEW] Improve gather performance" Sep 27, 2019
@shwina merged commit 5efdfc2 into rapidsai:branch-0.10 Sep 27, 2019
Linked issue: [FEA] Gather/Scatter optimization for negative indices