Add support for null and non-numeric types in Series.diff and DataFrame.diff #10625

Merged
merged 11 commits into from
Apr 15, 2022

Conversation

Matt711
Contributor

@Matt711 Matt711 commented Apr 8, 2022

This PR adds support for non-numeric data types (timestamps and ranges) in Series.diff and DataFrame.diff. DataFrame.diff already handles datetime ranges because DataFrame.shift works for them. Series.diff, however, doesn't use the Series.shift implementation, so it had no support for datetime ranges.

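import datetime
import numpy as np
import pandas as pd
from cudf import DataFrame

dti = pd.to_datetime(
    ["1/1/2018", np.datetime64("2018-01-01"), datetime.datetime(2018, 1, 1), datetime.datetime(2020, 1, 1)]
)
df = DataFrame({"dates": dti})
# Example values (assumed); the original snippet left periods and axis unspecified.
df.diff(periods=1, axis=0)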

closes #10212.

@GPUtester
Collaborator

Can one of the admins verify this patch?

@github-actions github-actions bot added the Python Affects Python cuDF API. label Apr 8, 2022
@bdice
Contributor

bdice commented Apr 8, 2022

add to allowlist (edit: looks like this didn't work. I may not have privileges to add Matt to the list of allowed CI users.)

@bdice bdice added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 8, 2022
@bdice bdice marked this pull request as ready for review April 8, 2022 21:34
@bdice bdice requested a review from a team as a code owner April 8, 2022 21:34
@bdice bdice requested review from shwina and rgsl888prabhu April 8, 2022 21:34
@sevagh
Contributor

sevagh commented Apr 8, 2022

add to allowlist

@codecov

codecov bot commented Apr 8, 2022

Codecov Report

Merging #10625 (eef9434) into branch-22.06 (8f5a044) will increase coverage by 0.06%.
The diff coverage is 57.14%.

❗ Current head eef9434 differs from pull request most recent head 3229a3a. Consider uploading reports for the commit 3229a3a to get more accurate results

@@               Coverage Diff                @@
##           branch-22.06   #10625      +/-   ##
================================================
+ Coverage         86.34%   86.41%   +0.06%     
================================================
  Files               142      142              
  Lines             22356    22334      -22     
================================================
- Hits              19304    19299       -5     
+ Misses             3052     3035      -17     
| Impacted Files | Coverage Δ | |
|---|---|---|
| python/cudf/cudf/core/dataframe.py | 93.75% <ø> (-0.01%) | ⬇️ |
| python/cudf/cudf/utils/cudautils.py | 65.74% <ø> (+5.90%) | ⬆️ |
| python/cudf/cudf/core/series.py | 95.15% <57.14%> (-0.13%) | ⬇️ |
| python/cudf/cudf/core/column/numerical.py | 95.88% <0.00%> (-0.30%) | ⬇️ |
| python/cudf/cudf/core/groupby/groupby.py | 91.72% <0.00%> (+0.22%) | ⬆️ |
| python/cudf/cudf/core/column/string.py | 89.22% <0.00%> (+0.24%) | ⬆️ |
| python/cudf/cudf/core/tools/datetimes.py | 84.49% <0.00%> (+0.30%) | ⬆️ |
| python/cudf/cudf/core/column/lists.py | 92.79% <0.00%> (+1.27%) | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4e668f2...3229a3a. Read the comment docs.

@beckernick
Member

By ripping out the diff numba kernel and using shift + binary op, I believe this will provide null support. Perhaps worth reflecting that in the PR title for the changelog?
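
For readers following along, the shift + binary op idea can be sketched in plain Python. This is only an illustration of the technique, not the cuDF implementation: the helper names `shift` and `diff` below are hypothetical, lists stand in for columns, and `None` stands in for a null. Because subtraction is applied element-wise and any null operand yields a null, the same code path covers numbers and timestamps alike.

```python
import datetime


def shift(values, periods=1):
    """Shift values by `periods` positions, filling vacated slots with None."""
    n = len(values)
    if periods == 0:
        return list(values)
    if abs(periods) >= n:
        return [None] * n
    if periods > 0:
        return [None] * periods + list(values[: n - periods])
    return list(values[-periods:]) + [None] * (-periods)


def diff(values, periods=1):
    """Element-wise `values - shift(values, periods)`, propagating nulls."""
    shifted = shift(values, periods)
    return [
        None if a is None or b is None else a - b
        for a, b in zip(values, shifted)
    ]


# Numeric data with a null in the middle.
numbers = [1, None, 4, 9]
numeric_result = diff(numbers)

# Timestamps: subtraction yields timedelta values, with no special casing.
dates = [
    datetime.datetime(2018, 1, 1),
    datetime.datetime(2018, 1, 1),
    datetime.datetime(2018, 1, 1),
    datetime.datetime(2020, 1, 1),
]
date_result = diff(dates)
```

A null in the input nulls out both the row it sits on and the row that would subtract it, which is the behavior a dedicated numeric kernel would have to reimplement by hand.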

@Matt711 Matt711 changed the title Add support for non-numeric types in Series.diff and DataFrame.diff Add support for null and non-numeric types in Series.diff and DataFrame.diff Apr 11, 2022
@Matt711 Matt711 requested review from shwina and rgsl888prabhu April 14, 2022 19:28
@Matt711
Contributor Author

Matt711 commented Apr 15, 2022

@gpucibot merge

1 similar comment
@bdice
Contributor

bdice commented Apr 15, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 94a5d41 into rapidsai:branch-22.06 Apr 15, 2022
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Development

Successfully merging this pull request may close these issues.

[FEA] Series.diff and DataFrame.diff should support non-numeric types (timestamps, durations)
7 participants