[FEA] Gather/Scatter optimization for negative indices #2675

jrhemstad · 2019-08-23T18:24:09Z

Is your feature request related to a problem? Please describe.

A gather or scatter operation takes a "map" of integers that maps input elements to output elements between two columns (or tables). In Python, these integers can be negative, in which case a value is interpreted as an offset from the end of the range; e.g., -2 maps to n - 2.

However, libcudf's gather/scatter APIs require every value in the map to lie within the range [0, n), i.e., no negative values. As a result, a separate kernel must first "normalize" the map from Python into a map libcudf accepts. This is expensive: it costs an extra kernel invocation as well as an intermediate memory allocation for the normalized map.
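For contrast, a minimal host-side C++ sketch of the two-pass flow described above, assuming a plain `std::vector` as a stand-in for a column: one pass materializes a normalized copy of the map (the extra kernel plus the intermediate allocation), and a second pass performs the gather. `normalize_index` and `gather_two_pass` are hypothetical names, not libcudf API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: map a possibly negative index into [0, n), so -2 becomes n - 2.
inline long long normalize_index(long long i, long long n) {
    return i < 0 ? i + n : i;
}

// Hypothetical two-pass gather mirroring the current flow.
template <typename T>
std::vector<T> gather_two_pass(const std::vector<T>& source,
                               const std::vector<long long>& map) {
    const long long n = static_cast<long long>(source.size());

    // Pass 1: the separate "normalize" step, which needs its own allocation.
    std::vector<long long> normalized(map.size());
    for (std::size_t k = 0; k < map.size(); ++k) {
        normalized[k] = normalize_index(map[k], n);
    }

    // Pass 2: the gather itself, which only accepts indices in [0, n).
    std::vector<T> out(map.size());
    for (std::size_t k = 0; k < normalized.size(); ++k) {
        const long long idx = normalized[k];
        assert(idx >= 0 && idx < n && "gather map index out of range");
        out[k] = source[static_cast<std::size_t>(idx)];
    }
    return out;
}
```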

Describe the solution you'd like
Doing this "normalization" step during the execution of the gather kernel itself would be much more efficient. This can be done fairly easily with a thrust::transform_iterator.
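A rough host-side sketch of the fused idea, assuming a plain loop can stand in for the device kernel: the normalization functor is applied per element inside the gather, so no normalized copy of the map is ever allocated. On the device, the same kind of functor could be wrapped with `thrust::make_transform_iterator` and the resulting iterator passed as the map to `thrust::gather`; the names `normalize_index` and `gather_fused` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical functor: maps a possibly negative index into [0, n).
struct normalize_index {
    long long n;
    long long operator()(long long i) const { return i < 0 ? i + n : i; }
};

// Single-pass gather: normalization happens inside the gather loop,
// so there is no intermediate normalized map.
template <typename T>
std::vector<T> gather_fused(const std::vector<T>& source,
                            const std::vector<long long>& map) {
    const normalize_index norm{static_cast<long long>(source.size())};
    std::vector<T> out;
    out.reserve(map.size());
    for (long long raw : map) {
        const long long idx = norm(raw);   // applied lazily, per element
        assert(idx >= 0 && idx < norm.n);  // caller must pass indices in [-n, n)
        out.push_back(source[static_cast<std::size_t>(idx)]);
    }
    return out;
}
```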

The caveat is that this will require adding an additional code path to the gather implementation. The join implementation currently relies on negative indices being ignored, so we'll need an option to choose whether negative indices are ignored or normalized.
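One way the extra code path might look, sketched on the host with hypothetical names (`negative_index_policy`, `gather_with_policy`) that are not part of libcudf: a flag selects between skipping negative indices and normalizing them. A skipped slot keeps a caller-supplied `fill` value; that is only an assumption standing in for whatever untouched or null output the join path actually expects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical policy flag; the name and values are illustrative only.
enum class negative_index_policy { ignore, normalize };

// Gather that either normalizes negative indices (Python-style) or skips them,
// leaving the corresponding output slot at `fill`.
template <typename T>
std::vector<T> gather_with_policy(const std::vector<T>& source,
                                  const std::vector<long long>& map,
                                  negative_index_policy policy,
                                  const T& fill) {
    const long long n = static_cast<long long>(source.size());
    std::vector<T> out(map.size(), fill);
    for (std::size_t k = 0; k < map.size(); ++k) {
        long long idx = map[k];
        if (idx < 0) {
            if (policy == negative_index_policy::ignore) continue;  // leave slot untouched
            idx += n;  // normalize: -2 becomes n - 2
        }
        // Indices still outside [0, n) are also skipped in this sketch.
        if (idx >= 0 && idx < n) out[k] = source[static_cast<std::size_t>(idx)];
    }
    return out;
}
```

In a device implementation, the two policies could plausibly map to two different transform functors over the same gather kernel.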

Additional Context
We should also update the gather_map and scatter_map to be a non-nullable gdf_column that can be an int32 or int64, to avoid having to cast an array of int64 elements to int32 (since int64 is the most common index type in Python).
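A sketch of what accepting any integral map type could look like, assuming a function template over the index type (`gather_any_index` is a hypothetical name): an int64 or int8 map is consumed directly with no cast to int32, while a floating-point map fails to compile and would still need an explicit cast.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical gather templated on the map's element type.
template <typename T, typename IndexT>
std::vector<T> gather_any_index(const std::vector<T>& source,
                                const std::vector<IndexT>& map) {
    // Reject non-integral maps (e.g. float) at compile time.
    static_assert(std::is_integral<IndexT>::value,
                  "gather map must have an integral element type");
    const std::int64_t n = static_cast<std::int64_t>(source.size());
    std::vector<T> out;
    out.reserve(map.size());
    for (IndexT raw : map) {
        std::int64_t idx = static_cast<std::int64_t>(raw);  // widen once, per element
        if (idx < 0) idx += n;                              // Python-style negative index
        out.push_back(source.at(static_cast<std::size_t>(idx)));  // throws if out of range
    }
    return out;
}
```

Each supported index type adds another template instantiation, which is the compile-time cost to weigh against skipping the cast.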


jakirkham · 2019-08-23T18:45:47Z

cc-ing @rjzamora (in case this is of interest to you as well 😉)

kkraus14 · 2019-08-23T18:48:26Z

@jrhemstad If we're allowing both int32 and int64, would it be possible to allow int8 and int16 as well? Users can and will pass those as well.

jrhemstad · 2019-08-23T18:57:33Z

> @jrhemstad If we're allowing both int32 and int64, would it be possible to allow int8 and int16 as well? Users can and will pass those as well.

Yeah, I guess there's no reason we can't support any integral type, so long as compile time doesn't blow up.

kkraus14 · 2019-08-23T19:00:58Z

I think it's okay for floats to require a typecast (yes Python users can / will pass floats into gather calls as well), but otherwise would be great to not require a typecast.

jrhemstad · 2019-08-23T19:07:26Z

> yes Python users can / will pass floats into gather calls as well

😠

jakirkham · 2019-09-10T17:30:44Z

FWIW the interest in this issue from my perspective would be speeding up .iloc, which is relevant for the cuML Grid Search use case. 😉

cc @JohnZed @mrocklin

mrocklin · 2019-09-10T17:38:26Z

@kkraus14 it looks like you've added the Needs Prioritizing label on this a few weeks ago.

Does anyone have a sense for when this might be done? It seems like this is blocking progress in GridSearch + cuML work. No pressure, (well, a little pressure) we just need to know in order to figure out resourcing.

kkraus14 · 2019-09-10T17:39:18Z

> @kkraus14 it looks like you've added the Needs Prioritizing label on this a few weeks ago.
>
> Does anyone have a sense for when this might be done? It seems like this is blocking progress in GridSearch + cuML work. No pressure, (well, a little pressure) we just need to know in order to figure out resourcing.

The label is out of date 😅, this is actively being investigated and worked on. Will update accordingly.

mrocklin · 2019-09-10T17:39:28Z

Thanks Keith!

jrhemstad added the feature request and libcudf labels Aug 23, 2019

jrhemstad mentioned this issue Aug 23, 2019 in [QST] cuDF performance with gridsearchcv #1888 (Closed)

jrhemstad assigned shwina Aug 23, 2019

shwina mentioned this issue Sep 10, 2019 in [REVIEW] Improve gather performance #2775 (Merged)

shwina closed this as completed in #2775 Sep 27, 2019

shwina mentioned this issue Jul 22, 2020 in [DISCUSSION] libcudf should not introspect input data to perform error checking #5505 (Closed)
