Add lists.sort_values API #7657

Merged
merged 11 commits into from
Mar 24, 2021

Conversation

isVoid
Contributor

@isVoid isVoid commented Mar 19, 2021

Closes #7467

Introduces the list method list.sort_values, which sorts each list in a LIST column according to the given criteria. The method signature is aligned with Series.sort_values. Example:

>>> s = cudf.Series([[4, 2, None, 9], [8, 8, 2], [2, 1]])
>>> s.list.sort_values(ascending=False, na_position="last")
0    [nan, 9.0, 4.0, 2.0]
1         [8.0, 8.0, 2.0]
2              [2.0, 1.0]
dtype: list

This PR also exposes ListMethods in the docs and includes a small docstring fix for cudf.Series.

@isVoid isVoid requested a review from a team as a code owner March 19, 2021 19:39
@github-actions github-actions bot added the Python Affects Python cuDF API. label Mar 19, 2021
@isVoid isVoid added 3 - Ready for Review Ready for review by team feature request New feature or request non-breaking Non-breaking change labels Mar 19, 2021
python/cudf/cudf/_lib/cpp/lists/sorting.pxd Outdated Show resolved Hide resolved
python/cudf/cudf/_lib/lists.pyx Outdated Show resolved Hide resolved
python/cudf/cudf/_lib/lists.pyx Outdated Show resolved Hide resolved
@codecov

codecov bot commented Mar 19, 2021

Codecov Report

Merging #7657 (8acf8f0) into branch-0.19 (7871e7a) will increase coverage by 0.61%.
The diff coverage is n/a.

@@               Coverage Diff               @@
##           branch-0.19    #7657      +/-   ##
===============================================
+ Coverage        81.86%   82.48%   +0.61%     
===============================================
  Files              101      101              
  Lines            16884    17426     +542     
===============================================
+ Hits             13822    14373     +551     
+ Misses            3062     3053       -9     
| Impacted Files | Coverage Δ | |
|---|---|---|
| python/cudf/cudf/core/column/categorical.py | 91.97% <ø> (+0.58%) | ⬆️ |
| python/cudf/cudf/core/column/column.py | 87.86% <ø> (+0.10%) | ⬆️ |
| python/cudf/cudf/core/column/datetime.py | 89.63% <ø> (+0.54%) | ⬆️ |
| python/cudf/cudf/core/column/decimal.py | 92.75% <ø> (-2.12%) | ⬇️ |
| python/cudf/cudf/core/column/lists.py | 90.00% <ø> (-1.40%) | ⬇️ |
| python/cudf/cudf/core/column/numerical.py | 94.83% <ø> (-0.20%) | ⬇️ |
| python/cudf/cudf/core/column/string.py | 86.79% <ø> (+0.30%) | ⬆️ |
| python/cudf/cudf/core/column/timedelta.py | 88.57% <ø> (+0.33%) | ⬆️ |
| python/cudf/cudf/core/column_accessor.py | 96.01% <ø> (+0.70%) | ⬆️ |
| python/cudf/cudf/core/dataframe.py | 90.90% <ø> (+0.43%) | ⬆️ |
| ... and 64 more | | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d500142...8acf8f0.

@@ -58,3 +63,22 @@ def explode_outer(Table tbl, int explode_column_idx, bool ignore_index=False):
column_names=tbl._column_names,
index_names=None if ignore_index else tbl._index_names
)


def sort_lists(Column col, object order_enum, object null_order_enum):
Collaborator

Is there a way to more tightly type the enums here? I thought we had a pattern for handling enum values like this elsewhere but I could be mistaken.

Contributor

Unfortunately no. I think typically we just pass a Python object to Cython, where we then plumb it through to the appropriate C++ type. See for example https://github.com/rapidsai/cudf/blob/branch-0.19/python/cudf/cudf/_lib/sort.pyx#L26.

I'd suggest we do something similar here, where the Cython API shouldn't expect a Python enum, but rather something like a string or a bool. That encapsulates things a little better, making the use of an enum an implementation detail.
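
As a rough illustration of that suggestion (not the code in this PR), the binding could accept a plain bool and string and resolve them to enum values internally. In the sketch below, Order, NullOrder, and resolve_sort_options are hypothetical names invented for the example; they stand in for whatever enum types the Cython layer actually wraps.

```python
from enum import Enum


# Hypothetical stand-ins for the enum types used inside the binding layer.
# These names are assumptions for illustration, not cuDF or libcudf APIs.
class Order(Enum):
    ASCENDING = 0
    DESCENDING = 1


class NullOrder(Enum):
    FIRST = 0
    LAST = 1


def resolve_sort_options(ascending, na_position):
    """Translate plain Python arguments into the internal enum values.

    Callers only ever pass a bool and a string, so the enums stay an
    implementation detail of the binding layer.
    """
    if not isinstance(ascending, bool):
        raise TypeError("ascending must be a bool")
    if na_position not in ("first", "last"):
        raise ValueError("na_position must be 'first' or 'last'")
    order = Order.ASCENDING if ascending else Order.DESCENDING
    null_order = NullOrder.FIRST if na_position == "first" else NullOrder.LAST
    return order, null_order
```

With this shape, a public method such as list.sort_values can forward its ascending and na_position arguments unchanged, and invalid values are rejected before anything reaches the compiled layer.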

Collaborator

Apologies, @isVoid, I got this backwards and you had it right from the beginning.

Collaborator

I think this needs to be addressed before merging

Contributor Author

Got it. Sorry, I misunderstood this one!

@isVoid isVoid mentioned this pull request Mar 22, 2021
@isVoid isVoid added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Mar 22, 2021
@kkraus14 kkraus14 removed the 5 - Ready to Merge Testing and reviews complete, ready to merge label Mar 23, 2021
@kkraus14 kkraus14 added the 0 - Waiting on Author Waiting for author to respond to review label Mar 23, 2021
@kkraus14 kkraus14 added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 0 - Waiting on Author Waiting for author to respond to review labels Mar 24, 2021
@kkraus14
Collaborator

@gpucibot merge

Labels
5 - Ready to Merge Testing and reviews complete, ready to merge feature request New feature or request non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Python bindings for lists::sort
4 participants