Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Stable sort option in sorted_order() #4189

Closed
devavret opened this issue Feb 18, 2020 · 4 comments · Fixed by #4272
Closed

[FEA] Stable sort option in sorted_order() #4189

devavret opened this issue Feb 18, 2020 · 4 comments · Fixed by #4272
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code.

Comments

@devavret
Copy link
Contributor

Is your feature request related to a problem? Please describe.
sorted_order() uses thrust::sort() which is not stable. In sort groupby, we need to be able to sort keys in a stable manner so that we can match pandas' behaviour for groupby.cumsum() #1298 (comment) and groupby.nth()#4172 (comment).

Describe the solution you'd like
Add a parameter to sorted_order() API bool stable = false which can be flipped to use thrust::stable_sort.

@devavret devavret added feature request New feature or request Needs Triage Need team to review and classify labels Feb 18, 2020
@jrhemstad
Copy link
Contributor

I'm initially inclined to make these separate APIs rather than a parameter.

This would be more inline with std::sort and std::stable_sort.

@devavret
Copy link
Contributor Author

I'm initially inclined to make these separate APIs rather than a parameter.

I thought you (and @harrism) were against increasing API size. Why the change of heart?

@jrhemstad
Copy link
Contributor

I'm initially inclined to make these separate APIs rather than a parameter.

I thought you (and @harrism) were against increasing API size. Why the change of heart?

Generally, I'm against unnecessarily increasing the API surface area. As you've demonstrated, having the option for stable sorting is necessary.

In this case, adding a new parameter vs. adding a new function has the same effect on the surface area (requires the same number of new tests). Therefore, since surface area isn't a concern, it makes sense in my mind to mirror the STL and Thrust and have separate functions instead of parameters.

@harrism
Copy link
Member

harrism commented Feb 20, 2020

I'm all for necessary API growth, and as Jake points out parameter vs. API is still an increase in testing / documentation / maintenance surface area. What I'm against is unnecessarily having multiple APIs that do the same thing.

The sort/stable_sort as separate APIs matches STL and Thrust, and we do aim to have similar naming/terminology to those.

@kkraus14 kkraus14 added libcudf Affects libcudf (C++/CUDA) code. and removed Needs Triage Need team to review and classify labels Feb 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants