Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding cudf.cut method #8002

Merged
merged 70 commits into from
Jun 11, 2021
Merged

Adding cudf.cut method #8002

merged 70 commits into from
Jun 11, 2021

Conversation

marlenezw
Copy link
Contributor

This PR aims to add the cut method to cudf. Cut in CuDF will mirror pandas.cut and will be useful to use in hist.
It also builds off of the interval and IntervalIndex.

@github-actions github-actions bot added the Python Affects Python cuDF API. label Apr 20, 2021
@marlenezw marlenezw self-assigned this Apr 20, 2021
@marlenezw marlenezw added 2 - In Progress Currently a work in progress improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 20, 2021
@codecov
Copy link

codecov bot commented Apr 28, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.08@cefc266). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.08    #8002   +/-   ##
===============================================
  Coverage                ?   82.85%           
===============================================
  Files                   ?      110           
  Lines                   ?    18042           
  Branches                ?        0           
===============================================
  Hits                    ?    14949           
  Misses                  ?     3093           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cefc266...002752a. Read the comment docs.

@marlenezw
Copy link
Contributor Author

Almost have the correct outcomes for most cases. At the moment I'm returning a catgoricalIndex when I have a series input when we ideally want to return a series. Working on figuring this out, and also handling for duplicate categories before I make this ready for review.

Copy link
Contributor

@isVoid isVoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some more nitpicks. Looking great!

python/cudf/cudf/core/cut.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/cut.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/dtypes.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/cut.py Outdated Show resolved Hide resolved
python/cudf/cudf/tests/test_cut.py Outdated Show resolved Hide resolved
python/cudf/cudf/tests/test_cut.py Outdated Show resolved Hide resolved
@marlenezw
Copy link
Contributor Author

Hi @isVoid, I've pushed the updated code. If it looks good to you, could you approve so we can merge this 😄

Copy link
Contributor

@rgsl888prabhu rgsl888prabhu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor things, rest looks good

python/cudf/cudf/core/cut.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/cut.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/cut.py Outdated Show resolved Hide resolved
Copy link
Contributor

@isVoid isVoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from one small point synced off line at line 171

@marlenezw
Copy link
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 58b354f into rapidsai:branch-21.08 Jun 11, 2021
rapids-bot bot pushed a commit that referenced this pull request Jul 15, 2022
This PR aims to add `bins` to `Series.value counts` in cudf. It contributes towards addressing [issue 1061](#1061).  This issue also depends on the completion of `cudf.cut` and includes changes that are being reviewed in PR #8002

Authors:
  - Marlene  (https://github.com/marlenezw)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #8247
@parmarsuraj99
Copy link

Can we have it support multiple columns?

@shwina
Copy link
Contributor

shwina commented Jul 6, 2023

Hi @parmarsuraj99, could you please raise an issue describing your request? That will help get it visibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 - In Progress Currently a work in progress improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants