
[FEA] min/max segmented reduce for string and decimal #10417

Closed
revans2 opened this issue Mar 11, 2022 · 7 comments · Fixed by #10794
Labels
feature request (New feature or request) · Spark (Functionality that helps Spark RAPIDS)

Comments

@revans2
Contributor

revans2 commented Mar 11, 2022

Is your feature request related to a problem? Please describe.
This is a follow-on request to #9621.

It would be really great if we could also support String and DECIMAL_32, DECIMAL_64, and DECIMAL_128 with this same API for min and max.

In the short term I am going to use the old code I had for String/DECIMAL_128 and will try to bit-cast the DECIMAL_32 and DECIMAL_64 values before and after.
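
As a hedged illustration of that workaround (not the actual plugin code; function names are made up, and it assumes `cudf::bit_cast` from `<cudf/column/column_view.hpp>` is available for same-width reinterpretation), the before/after steps might look like:

```cpp
#include <cudf/column/column_view.hpp>
#include <cudf/types.hpp>

// Zero-copy reinterpretation of a DECIMAL_32 column as its INT32 representation.
// Ordering is preserved because every row shares the same scale, so comparing
// the unscaled integer values compares the decimal values.
cudf::column_view decimal32_as_int32(cudf::column_view const& decimal_col)
{
  return cudf::bit_cast(decimal_col, cudf::data_type{cudf::type_id::INT32});
}

// The reverse step: relabel the INT32 min/max result as DECIMAL_32, reusing
// the original column's type (and therefore its scale).
cudf::column_view int32_as_decimal32(cudf::column_view const& int_result,
                                     cudf::data_type original_decimal_type)
{
  return cudf::bit_cast(int_result, original_decimal_type);
}
```

The existing numeric segmented min/max would then run on the INT32 view between these two steps.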

@bdice
Contributor

bdice commented Mar 16, 2022

@revans2 I discussed adding decimal support with @isVoid and we're going to work on this soon! It may or may not make the 22.04 release but we have a game plan.

Can you elaborate a bit on what you expect for string reductions? I didn't find much in the code about existing reductions for string types -- can you list (or find in the code) the reduction types you would expect to see? Thanks! 👍

@jrhemstad
Contributor

> I didn't find much in the code about existing reductions for string types -- can you list (or find in the code) the reduction types you would expect to see?

Pretty much all other reductions support min/max on strings and just use the string_view::operator< function.

Typically we have to add a level of indirection and actually compute ARGMIN/ARGMAX and then perform a gather.
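
To make that indirection concrete, here is a self-contained Thrust sketch of the ARGMIN-then-gather pattern. Plain ints stand in for `string_view` elements so the example stays runnable; in the real case the comparison would be `string_view::operator<`. This is illustrative, not libcudf's implementation.

```cpp
// Compile with: nvcc --extended-lambda argmin_gather.cu
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/gather.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/discard_iterator.h>
#include <thrust/reduce.h>

#include <vector>

int main()
{
  // Values and their segment labels; ints stand in for string elements.
  std::vector<int> h_values{5, 3, 7, 2, 9, 8};
  std::vector<int> h_keys  {0, 0, 0, 1, 1, 2};
  thrust::device_vector<int> values(h_values.begin(), h_values.end());
  thrust::device_vector<int> keys(h_keys.begin(), h_keys.end());

  int const* vals = thrust::raw_pointer_cast(values.data());

  // ARGMIN: reduce row indices within each segment, keeping the index whose
  // referenced value compares smaller.
  thrust::device_vector<int> argmin(3);
  thrust::reduce_by_key(
    keys.begin(), keys.end(),
    thrust::make_counting_iterator(0),
    thrust::make_discard_iterator(),
    argmin.begin(),
    thrust::equal_to<int>{},
    [vals] __device__(int lhs, int rhs) { return vals[lhs] < vals[rhs] ? lhs : rhs; });

  // Gather the winning rows to materialize the per-segment minimums.
  thrust::device_vector<int> mins(3);
  thrust::gather(argmin.begin(), argmin.end(), values.begin(), mins.begin());
  // mins is now {3, 2, 8}
  return 0;
}
```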

@isVoid
Contributor

isVoid commented Mar 16, 2022

How about decimal types? Each fixed-point column is supposed to have a uniform scale. However, since the length of each segment can differ, the products of these segments may have different scales. One way I can think of to handle this: since output_dtype is required, we can cast the results to the desired type as a post-processing step. (From this I realized the current fixed-point reduction seems to ignore output_dtype.)
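
A small sketch of that post-processing idea, assuming `cudf::cast` from `<cudf/unary.hpp>`; the helper name is illustrative only:

```cpp
#include <cudf/column/column.hpp>
#include <cudf/column/column_view.hpp>
#include <cudf/unary.hpp>

#include <memory>

// Convert the raw reduction result to the caller-requested output_dtype,
// e.g. rescaling a fixed-point result or widening a narrower decimal type.
std::unique_ptr<cudf::column> apply_output_dtype(cudf::column_view const& reduce_result,
                                                 cudf::data_type output_dtype)
{
  if (reduce_result.type() == output_dtype) {
    return std::make_unique<cudf::column>(reduce_result);  // already the right type
  }
  return cudf::cast(reduce_result, output_dtype);
}
```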

@revans2
Contributor Author

revans2 commented Mar 17, 2022

> How about decimal types? Each fixed-point column is supposed to have a uniform scale. However, since the length of each segment can differ, the products of these segments may have different scales.

I care about min and max; I am not concerned with product right now, and min/max do not change the scale. All of these problems have been solved by groupby aggregations. If the groupby aggregation is not supported for this type, then I would not see any reason to support it for a segmented reduction. If it is supported, then I would expect the result to be the same as what groupby produces. Being inconsistent is a bigger problem.

@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@bdice
Contributor

bdice commented Apr 17, 2022

A status update since this was marked as stale: #10447 will address the string min/max case. We will need another PR to address decimal types. I would probably limit the scope to min/max aggregations for now if that is enough to resolve this issue.

rapids-bot bot pushed a commit that referenced this issue Apr 29, 2022
This PR adds `min/max` segmented reduction for the string type.

Part of #10417

Authors:
  - Michael Wang (https://github.com/isVoid)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Bradley Dice (https://github.com/bdice)

URL: #10447

rapids-bot bot pushed a commit that referenced this issue May 18, 2022
This PR adds min/max segmented reduction support for fixed-point types. Together with #10447, this PR closes #10417.

Additionally, this PR refactors `segmented_reduce` to accept output iterators instead of allocating the result column internally.

Authors:
  - Michael Wang (https://github.com/isVoid)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)
  - David Wendt (https://github.com/davidwendt)

URL: #10794
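For reference, a hedged usage sketch of the completed feature, written against the `segmented_reduce` signature as it looked around these PRs (the exact parameters are assumptions, not copied from the PR):

```cpp
#include <cudf/aggregation.hpp>
#include <cudf/column/column.hpp>
#include <cudf/column/column_view.hpp>
#include <cudf/reduction.hpp>
#include <cudf/utilities/span.hpp>

#include <memory>

// Per-segment MIN over a DECIMAL_64 column. min/max do not change the scale,
// so the output type simply mirrors the input type.
std::unique_ptr<cudf::column> segmented_min_decimal(
  cudf::column_view const& decimal_col,
  cudf::device_span<cudf::size_type const> offsets)
{
  auto agg = cudf::make_min_aggregation<cudf::segmented_reduce_aggregation>();
  return cudf::segmented_reduce(
    decimal_col, offsets, *agg, decimal_col.type(), cudf::null_policy::EXCLUDE);
}
```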
@bdice removed the Needs Triage (Need team to review and classify) label Mar 4, 2024