[REVIEW] Improve gather performance #2775

shwina · 2019-09-10T22:17:09Z

Implement the improvements to gather suggested in #2675.

Closes #2675. Addresses #1888.

kkraus14 · 2019-09-10T22:30:50Z

@shwina are we going to handle the int8, int16, and int64 gathering in this PR or was the typecasting deemed cheap enough that it didn't matter?

codecov · 2019-09-10T23:37:18Z

Codecov Report

Merging #2775 into branch-0.10 will increase coverage by 0.01%.
The diff coverage is 96%.

@@               Coverage Diff               @@
##           branch-0.10    #2775      +/-   ##
===============================================
+ Coverage        86.51%   86.53%   +0.01%     
===============================================
  Files               48       48              
  Lines             9013     9000      -13     
===============================================
- Hits              7798     7788      -10     
+ Misses            1215     1212       -3

Impacted Files	Coverage Δ
python/cudf/cudf/core/column/__init__.py	`100% <ø> (ø)`	⬆️
python/cudf/cudf/core/dataframe.py	`93.72% <100%> (-0.01%)`	⬇️
python/cudf/cudf/core/series.py	`93.33% <100%> (ø)`	⬆️
python/cudf/cudf/core/column/datetime.py	`90.9% <100%> (ø)`	⬆️
python/cudf/cudf/core/column/numerical.py	`94.34% <100%> (ø)`	⬆️
python/cudf/cudf/core/column/column.py	`86.88% <94.44%> (+0.19%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b96073c...6a350d4.

shwina · 2019-09-11T00:17:10Z

@kkraus14 yes, that will be part of this PR
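
For readers unfamiliar with the idea, the exchange above is about dispatching on the gather map's element type rather than casting every map to one integer width first. A minimal host-side sketch of that pattern follows. It is not the code from this PR: `map_dtype`, `untyped_map`, `gather_typed`, and `gather_dispatch` are hypothetical names, the map is a plain pointer rather than a `gdf_column`, and the indices are assumed to be non-negative and in range.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical runtime tag for the gather map's element type.
enum class map_dtype { INT8, INT16, INT32, INT64 };

// Hypothetical type-erased view of a gather map (a stand-in for a column).
struct untyped_map {
  void const* data;
  std::size_t size;
  map_dtype dtype;
};

// Typed implementation: reads the map in its native width, so no copy or
// cast of the whole map is needed. Assumes every index is in [0, n).
template <typename MapT>
std::vector<double> gather_typed(std::vector<double> const& source, untyped_map const& map)
{
  auto const* indices = static_cast<MapT const*>(map.data);
  std::vector<double> out(map.size);
  for (std::size_t k = 0; k < map.size; ++k) {
    out[k] = source[static_cast<std::size_t>(indices[k])];
  }
  return out;
}

// Runtime dispatch: pick the template instantiation that matches the tag.
std::vector<double> gather_dispatch(std::vector<double> const& source, untyped_map const& map)
{
  switch (map.dtype) {
    case map_dtype::INT8: return gather_typed<std::int8_t>(source, map);
    case map_dtype::INT16: return gather_typed<std::int16_t>(source, map);
    case map_dtype::INT32: return gather_typed<std::int32_t>(source, map);
    case map_dtype::INT64: return gather_typed<std::int64_t>(source, map);
  }
  throw std::invalid_argument("unsupported gather map type");
}
```

The trade-off is one extra template instantiation per supported map type in exchange for skipping the cast.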

jrhemstad · 2019-09-26T13:15:46Z

cpp/include/cudf/copying.hpp

 * the source columns.
 *
- * If any index in scatter_map is outside the range of [0, target.num_rows()), 
+ * If any index in `scatter_map` is outside the range of [0, target.num_rows()),


Suggested change

- * If any index in `scatter_map` is outside the range of [0, target.num_rows()),
+ * @throws `cudf::logic_error` if `check_bounds == true` and any index in `scatter_map` is outside
+ * the range `[0, target.num_rows())`
+ *
+ * If `check_bounds == false` and any index in `scatter_map` is outside the range of [0, target.num_rows()),

jrhemstad · 2019-09-26T13:16:27Z

cpp/include/cudf/copying.hpp

+ * The number of elements in the `scatter_map` must equal the number of rows in
+ * the source columns.
+ *
+ * If any index in `scatter_map` is outside the range of [0, target.num_rows()),


Suggested change

- * If any index in `scatter_map` is outside the range of [0, target.num_rows()),
+ * @throws `cudf::logic_error` if `check_bounds == true` and any index in `scatter_map` is outside
+ * the range `[0, target.num_rows())`
+ *
+ * If any index in `scatter_map` is outside the range of [0, target.num_rows()),

jrhemstad · 2019-09-26T13:17:45Z

cpp/include/cudf/copying.hpp

 * The datatypes between corresponding columns in the source and target
 * columns must be the same.
 *
- * If any index in scatter_map is outside the range of [0, num rows in
- * target_columns), the result is undefined.
+ * A negative index `i` in the `scatter_map` is interpreted as `i+n`, where


The documentation of the scatter APIs is inconsistent. The documentation of the previous two APIs would lead you to believe that a negative index is undefined behavior (UB).

jrhemstad · 2019-09-26T13:20:40Z

cpp/include/cudf/copying.hpp

+ *
+ * If `check_bounds == false` and any index in the `scatter_map` is outside the range
+ * `[-n, n)`, where `n` is the number of rows in the `source_table`, the
+ * behavior is undefined.
 *


Suggested change

- *
+ *
+ * @throws `cudf::logic_error` if `check_bounds == true` and any index in the `scatter_map` is outside
+ * the range `[-n, n)`

jrhemstad · 2019-09-26T13:21:38Z

cpp/include/cudf/copying.hpp

- * undefined.
+ * If `check_bounds == false` and any index in the `gather_map` is outside the range
+ * `[-n, n)`, where `n` is the number of rows in the `source_table`, the
+ * behavior is undefined.


Suggested change

- * behavior is undefined.
+ * behavior is undefined.
+ *
+ * @throws `cudf::logic_error` if `check_bounds == true` and any index in the `gather_map` is
+ * outside the range `[-n, n)`
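
The contract described in these doc comments (valid indices lie in `[-n, n)`, a negative index `i` means `i + n`, and `check_bounds` decides between throwing and undefined behavior) can be illustrated with a small host-side sketch. This is an assumption-laden stand-in, not libcudf's implementation: `gather_sketch` is a hypothetical name, it works on `std::vector` instead of device columns, and it throws `std::out_of_range` where the documented API throws `cudf::logic_error`.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical host-side stand-in for a bounds-checked gather.
// n is the number of rows in the source; valid map entries lie in [-n, n).
template <typename T, typename MapT>
std::vector<T> gather_sketch(std::vector<T> const& source,
                             std::vector<MapT> const& gather_map,
                             bool check_bounds)
{
  auto const n = static_cast<std::int64_t>(source.size());
  std::vector<T> out;
  out.reserve(gather_map.size());
  for (MapT raw : gather_map) {
    auto i = static_cast<std::int64_t>(raw);
    // With check_bounds == true, an out-of-range index is reported as an error.
    if (check_bounds && (i < -n || i >= n)) {
      throw std::out_of_range("gather_map index outside [-n, n)");
    }
    // A negative index i is interpreted as i + n.
    if (i < 0) { i += n; }
    // With check_bounds == false, an out-of-range index reaches this line
    // unchecked, which is undefined behavior, matching the documented contract.
    out.push_back(source[static_cast<std::size_t>(i)]);
  }
  return out;
}
```

With a four-row source, the map `{0, -1, 2, -4}` selects rows 0, 3, 2, and 0.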

davidwendt · 2019-09-27T13:20:50Z

cpp/src/copying/gather.cuh

+* Positive indices are unchanged by this transformation.
+*---------------------------------------------------------------------------**/
+template <bool enable, typename map_type>
+struct negative_index_converter : public thrust::unary_function<map_type,map_type>{};


Double negatives can be confusing.
How about making this an index_converter and changing enable to be negative?
Seems it would be clearer to enable negative on something positive than disabling negative to make something positive.

shwina · 2019-09-27

Agree that it can be confusing. How about an enum template parameter that makes it more explicit what the converter does?

enum index_conversion { NEGATIVE_TO_POSITIVE = 0, SOMETHING_ELSE = 1, NONE = 2 };

davidwendt · 2019-09-27

Yes. I saw this line and had to look up what it was doing.

negative_index_converter<false,map_type>{...}

Something like this perhaps:

template <typename map_type, index_conversion ic = NOTHING> struct index_converter ...

and then maybe

index_converter<map_type, NEGATIVE>{ ... }

and normal pass through would be just

index_converter<map_type>{ ... }
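
The enum-parameter idea discussed in this thread could look roughly like the sketch below. It is only an illustration of the suggestion, not the code that was merged: the enumerator names are taken from the enum proposed earlier in the thread rather than the `NOTHING`/`NEGATIVE` shorthand, the constructor signature is an assumption, and the functor is plain host C++. In device code the call operator would presumably be marked `__device__` and the functor wrapped in a Thrust transform iterator.

```cpp
#include <cassert>
#include <cstdint>

// Enumerator names follow the thread's suggestion; the exact set is hypothetical.
enum class index_conversion { NONE, NEGATIVE_TO_POSITIVE };

// Primary template: pass indices through unchanged.
template <typename map_type, index_conversion ic = index_conversion::NONE>
struct index_converter {
  explicit index_converter(map_type /*n_rows*/) {}
  map_type operator()(map_type in) const { return in; }
};

// Specialization: map a negative index i to i + n_rows.
// Non-negative indices are unchanged by this transformation.
template <typename map_type>
struct index_converter<map_type, index_conversion::NEGATIVE_TO_POSITIVE> {
  explicit index_converter(map_type n_rows) : n_rows_{n_rows} { assert(n_rows > 0); }
  map_type operator()(map_type in) const
  {
    return static_cast<map_type>(in < 0 ? in + n_rows_ : in);
  }
  map_type n_rows_;
};
```

At the call site the intent is then visible in the type, for example `index_converter<std::int32_t, index_conversion::NEGATIVE_TO_POSITIVE>{n}` for wrapping and `index_converter<std::int32_t>{n}` for pass-through, which avoids the `<false, map_type>` spelling that prompted this thread.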

First pass at using transform_iterator to handle negative indices in gather

e5386b3

shwina requested review from a team as code owners September 10, 2019 22:17

kkraus14 reviewed Sep 10, 2019

python/dask_cudf/record.txt (outdated, resolved)

jrhemstad reviewed Sep 11, 2019

cpp/src/copying/gather.cu (outdated, resolved)

shwina added 4 commits September 11, 2019 12:00

Add version of gather that accepts gathermap as gdf_column

1f8ba6e

Add type dispatch to map type in gather

9e1a7b5

Remove typecasting gathermap in Cython

df8103f

Merge branch 'branch-0.10' of https://github.com/rapidsai/cudf into improve-gather

e38e36d

shwina mentioned this pull request Sep 12, 2019

[QST] cuDF performance with gridsearchcv #1888

Closed

shwina added 4 commits September 12, 2019 17:12

Restore normalize_maps

3134cc7

Add code path that does not transform gather map

8ce390b

Allocate output in libcudf for gather

9beae3b

Replace use of to_gpu_array() with mem

d5e4c8d

kkraus14 added the labels "2 - In Progress", "Python", and "libcudf" Sep 14, 2019

kkraus14 assigned shwina Sep 14, 2019

harrism requested changes Sep 15, 2019

cpp/src/copying/gather.cu (outdated, resolved)

cpp/src/copying/gather.cu (outdated, resolved)

shwina added 5 commits September 16, 2019 09:57

Update categories on gather result

6460ac9

Return typed column from column_empty_like_same_mask

19bd8d8

Propagate empty map case to libcudf from Cython

fc37724

Use as_column factory instead of Column constructor directly in column._concat

e6c3846

Fix bug in column_empty_like_same_mask

890ba64

shwina added 6 commits September 20, 2019 12:57

Update gather and scatter docs

01a39bb

Use grid_config_1d to configure grid/block size for invert_map kernel

6de1a71

Fix typo

1cdbe67

Merge branch 'branch-0.10' of https://github.com/rapidsai/cudf into improve-gather

46b183a

Remove record.txt

d17a89b

Merge branch 'branch-0.10' of https://github.com/rapidsai/cudf into improve-gather

0a4b168

shwina mentioned this pull request Sep 23, 2019

[REVIEW] Fix column creation from ephemeral cupy arrays #2854

Merged

Make call to detail::scatter explicit

e06565d

shwina requested review from harrism and jrhemstad September 23, 2019 17:45

shwina added 2 commits September 23, 2019 17:03

Revert to performing a fill *before* the invert kernel in scatter

4f35e74

Add docs for overloaded functions in copying.hpp

7464796

harrism approved these changes Sep 25, 2019

Merge branch 'branch-0.10' into improve-gather

6e1594b

jrhemstad requested changes Sep 26, 2019

shwina added 4 commits September 26, 2019 15:52

Merge branch 'branch-0.10' of https://github.com/rapidsai/cudf into improve-gather

bd91806

Consistently add @throws to gather/scatter docs

d22ec45

Merge branch 'branch-0.10' of https://github.com/rapidsai/cudf into improve-gather

fe9b04c

Merge branch 'improve-gather' of https://github.com/shwina/cudf into improve-gather

1bb6e48

jrhemstad approved these changes Sep 27, 2019

davidwendt reviewed Sep 27, 2019

kkraus14 approved these changes Sep 27, 2019

shwina changed the title ~~[REVIEW] Improve gather performance~~ [WIP] Improve gather performance Sep 27, 2019

Replace use of bool template parameter with enum parameter

6a350d4

shwina requested a review from davidwendt September 27, 2019 19:44

davidwendt approved these changes Sep 27, 2019

shwina changed the title ~~[WIP] Improve gather performance~~ [REVIEW] Improve gather performance Sep 27, 2019

shwina merged commit 5efdfc2 into rapidsai:branch-0.10 Sep 27, 2019
