[FEA] min/max segmented reduce for string and decimal #10417

revans2 · 2022-03-11T19:56:56Z

Is your feature request related to a problem? Please describe.
This is a follow on request for #9621.

It would be really great if we could also support String and DECIMAL_32, DECIMAL_64, and DECIMAL_128 with this same API for min and max.

In the short term I am going to use the old code I had for String/DECIMAL_128 and will try to bit cast the DECIMAL_32 and DECIMAL_64 values, before and after.

bdice · 2022-03-16T17:36:07Z

@revans2 I discussed adding decimal support with @isVoid and we're going to work on this soon! It may or may not make the 22.04 release but we have a game plan.

Can you elaborate a bit on what you expect for string reductions? I didn't find much in the code about existing reductions for string types -- can you list (or find in the code) the reduction types you would expect to see? Thanks! 👍

jrhemstad · 2022-03-16T17:53:39Z

I didn't find much in the code about existing reductions for string types -- can you list (or find in the code) the reduction types you would expect to see?

Pretty much all other reductions support min/max on strings and just use the string_view::operator< function.

Typically we have to add a level of indirection and actually compute ARGMIN/ARGMAX and then perform a gather.

isVoid · 2022-03-16T19:40:28Z

How about decimal types? Each fixed point column is supposed to have a uniform scale. However, since the length of each segment can be different, the product of these segments may possess different scales. One way I can think of this is since output_dtype is required, we can cast them to the desired type as a post-processing step. (From which I realized the current fixed point reduction seems to have ignored output_dtype)

revans2 · 2022-03-17T19:29:58Z

How about decimal types? Each fixed point column is supposed to have a uniform scale. However, since the length of each segment can be different, the product of these segments may possess different scales.

I care about min and max I am not concerned with product right now and min/max do not change the scale. All of these problems have been solved by groupby aggregations. If the groupby aggregation is not supported for this type, then I would not see any reason to support it for a segmented reduction. If it is supported then I would expect the result to be the same as what groupby does. Being inconsistent is a bigger problem.

github-actions · 2022-04-16T20:03:31Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

bdice · 2022-04-17T00:34:48Z

A status update since this was marked as stale: #10447 will address the string min/max case. We will need another PR to address decimal types. I would probably limit the scope to min/max aggregations for now if that is enough to resolve this issue.

This PR adds `min/max` segmented reduction to string type. Part of #10417 Authors: - Michael Wang (https://github.com/isVoid) - Bradley Dice (https://github.com/bdice) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Bradley Dice (https://github.com/bdice) URL: #10447

github-actions · 2022-05-17T01:31:38Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

This PR adds support to min/max segmented reduction to fixed point type. Together with #10447, this PR closes #10417 Besides, this PR refactors `segmented_reduce` to accept output iterators instead of allocating the result column from within. Authors: - Michael Wang (https://github.com/isVoid) - Bradley Dice (https://github.com/bdice) Approvers: - Bradley Dice (https://github.com/bdice) - Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu) - David Wendt (https://github.com/davidwendt) URL: #10794

revans2 added feature request New feature or request Needs Triage Need team to review and classify Spark Functionality that helps Spark RAPIDS labels Mar 11, 2022

This was referenced Mar 11, 2022

Use new list reduction API to improve performance NVIDIA/spark-rapids#4941

Merged

[FEA] Support org.apache.spark.sql.catalyst.expressions.ArrayExists NVIDIA/spark-rapids#4815

Closed

bdice mentioned this issue Mar 15, 2022

[FEA] Refactors and next steps for segmented reductions #10432

Open

isVoid mentioned this issue Mar 16, 2022

Support Segmented Min/Max Reduction on String Type #10447

Merged

github-actions bot added the inactive-30d label Apr 16, 2022

github-actions bot removed the inactive-30d label Apr 17, 2022

isVoid mentioned this issue May 4, 2022

Segmented Min/Max for Fixed Point Types #10794

Merged

github-actions bot added the inactive-30d label May 17, 2022

rapids-bot bot closed this as completed in #10794 May 18, 2022

bdice removed the Needs Triage Need team to review and classify label Mar 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEA] min/max segmented reduce for string and decimal #10417

[FEA] min/max segmented reduce for string and decimal #10417

revans2 commented Mar 11, 2022

bdice commented Mar 16, 2022

jrhemstad commented Mar 16, 2022

isVoid commented Mar 16, 2022 •

edited

Loading

revans2 commented Mar 17, 2022

github-actions bot commented Apr 16, 2022

bdice commented Apr 17, 2022

github-actions bot commented May 17, 2022

[FEA] min/max segmented reduce for string and decimal #10417

[FEA] min/max segmented reduce for string and decimal #10417

Comments

revans2 commented Mar 11, 2022

bdice commented Mar 16, 2022

jrhemstad commented Mar 16, 2022

isVoid commented Mar 16, 2022 • edited Loading

revans2 commented Mar 17, 2022

github-actions bot commented Apr 16, 2022

bdice commented Apr 17, 2022

github-actions bot commented May 17, 2022

isVoid commented Mar 16, 2022 •

edited

Loading