Fix comparison between Datetime/Timedelta columns and NULL scalars #7504

Merged

Conversation

brandon-b-miller
Contributor

Fixes #6897

@brandon-b-miller added the bug, 3 - Ready for Review, Python, and non-breaking labels on Mar 3, 2021
@brandon-b-miller requested a review from a team as a code owner on March 3, 2021, 16:36
@brandon-b-miller changed the title from "fix and add test" to "Fix comparison between Datetime/Timedelta columns and NULL scalars" on Mar 3, 2021
@galipremsagar
Contributor

Looks good to me 👍

@galipremsagar
Contributor

rerun tests

@galipremsagar added the 5 - Ready to Merge label and removed the 3 - Ready for Review label on Mar 4, 2021

        "timedelta64[s]",
    ],
)
@pytest.mark.parametrize("null_scalar", [None, cudf.NA])
Contributor

Should we support pd.NA and pd.NaT ?

>>> s != pd.NA
0    True
1    True
2    True
dtype: bool
>>> s != pd.NaT
0    True
1    True
2    True
dtype: bool

Contributor Author

I think we can support NaT at least. What should the behavior be? Currently on branch-0.19, I get:

>>> pd.Series([1,2,3], dtype='datetime64[ns]') > np.datetime64('NaT')
0    False
1    False
2    False
dtype: bool
>>> cudf.Series([1,2,3], dtype='datetime64[ns]') > np.datetime64('NaT')
0    <NA>
1    <NA>
2    <NA>
dtype: bool

Should we match pandas behavior here since NaT isn't the same as <NA>? cc @galipremsagar
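
For reference, the pandas/NumPy side of this comparison can be checked without a GPU. The sketch below only exercises pandas and NumPy (it does not run cuDF) and shows NaT making ordering comparisons False and `!=` True elementwise, which is the behavior cuDF would have to reproduce in order to match pandas.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3], dtype="datetime64[ns]")

# Ordering comparisons against NaT are False for every element.
gt = s > np.datetime64("NaT")

# Inequality against NaT is True for every element.
ne = s != pd.NaT

# NumPy scalars follow the same rule: NaT is not equal to itself.
nat_eq_nat = np.datetime64("NaT") == np.datetime64("NaT")

print(gt.tolist(), ne.tolist(), bool(nat_eq_nat))
```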

Contributor

Yup I think we can match pandas behavior because we already do this for NA in other columns:

>>> import cudf
>>> s = cudf.Series([1, 2, 3])
>>> s > cudf.NA
0    False
1    False
2    False
dtype: bool

But for NaT, cuDF can just continue to treat it as NA in these operations too, as specified here: https://docs.rapids.ai/api/cudf/nightly/Working-with-missing-data.html#Datetimes

Contributor Author

Ok - that makes sense. So we just consistently treat NaT as <NA> everywhere. My only suggestion is that we make the comparison also return <NA> since all comparisons against <NA> will return <NA> after #7066 is merged.
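
As a rough illustration of the NA-propagating semantics proposed here, pandas' own nullable dtypes already behave this way. The sketch uses pandas rather than cuDF, so it only shows the intended shape of the result, not cuDF's actual output once #7066 lands.

```python
import pandas as pd

# Nullable integer column; pandas' masked dtypes use three-valued logic.
s = pd.Series([1, 2, 3], dtype="Int64")

# Comparing against NA yields NA in every row rather than False.
result = s > pd.NA

print(result.dtype)         # nullable boolean dtype
print(result.isna().all())  # whether every row is missing
```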

Contributor

> My only suggestion is that we make the comparison also return <NA> since all comparisons against <NA> will return <NA> after #7066 is merged.

Makes sense, +1 from my side.

@codecov

codecov bot commented Mar 16, 2021

Codecov Report

Merging #7504 (512425f) into branch-0.19 (7871e7a) will increase coverage by 0.64%.
The diff coverage is n/a.

❗ Current head 512425f differs from pull request most recent head 463ff63. Consider uploading reports for the commit 463ff63 to get more accurate results

@@               Coverage Diff               @@
##           branch-0.19    #7504      +/-   ##
===============================================
+ Coverage        81.86%   82.51%   +0.64%     
===============================================
  Files              101      101              
  Lines            16884    17446     +562     
===============================================
+ Hits             13822    14395     +573     
+ Misses            3062     3051      -11     
| Impacted Files | Coverage Δ |
|---|---|
| python/cudf/cudf/core/buffer.py | 84.21% <ø> (+4.96%) ⬆️ |
| python/cudf/cudf/core/column/categorical.py | 91.97% <ø> (+0.58%) ⬆️ |
| python/cudf/cudf/core/column/column.py | 87.61% <ø> (-0.15%) ⬇️ |
| python/cudf/cudf/core/column/datetime.py | 89.73% <ø> (+0.63%) ⬆️ |
| python/cudf/cudf/core/column/decimal.py | 92.75% <ø> (-2.12%) ⬇️ |
| python/cudf/cudf/core/column/lists.py | 90.00% <ø> (-1.40%) ⬇️ |
| python/cudf/cudf/core/column/numerical.py | 94.83% <ø> (-0.20%) ⬇️ |
| python/cudf/cudf/core/column/string.py | 86.79% <ø> (+0.30%) ⬆️ |
| python/cudf/cudf/core/column/timedelta.py | 88.66% <ø> (+0.42%) ⬆️ |
| python/cudf/cudf/core/column_accessor.py | 96.13% <ø> (+0.82%) ⬆️ |

... and 56 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 561f68a...463ff63. Read the comment docs.

@brandon-b-miller
Contributor Author

@galipremsagar @rgsl888prabhu just a few minor changes have gone in since the last review, if you want to give it a quick second look.

@brandon-b-miller
Contributor Author

rerun tests

Contributor

@rgsl888prabhu left a comment

One small question; the rest looks good.

    if isinstance(null_scalar, np.datetime64):
        if np.dtype(dtype).kind not in "mM":
            pytest.skip()
        else:
Contributor

I think we don't need the else; we can just leave null_scalar = null_scalar.astype(dtype) after pytest.skip().

Contributor Author

Fixed
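
The simplification suggested above could look roughly like the following. `normalize_null_scalar` is a hypothetical helper written only for this illustration (the snippet under review appears inline in a test body); because `pytest.skip()` raises, the cast can follow it without an `else` branch.

```python
import numpy as np
import pytest


def normalize_null_scalar(null_scalar, dtype):
    """Hypothetical helper mirroring the check from the reviewed test."""
    if isinstance(null_scalar, np.datetime64):
        if np.dtype(dtype).kind not in "mM":
            # pytest.skip() raises, so nothing below runs for this case.
            pytest.skip()
        # Reached only for datetime ("M") and timedelta ("m") dtypes.
        null_scalar = null_scalar.astype(dtype)
    return null_scalar
```

Scalars that are not `np.datetime64` (such as `None`) pass through unchanged, so the rest of the test can treat every parametrized null scalar the same way.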

@kkraus14
Collaborator

rerun tests

@galipremsagar
Contributor

rerun tests

@kkraus14
Collaborator

@gpucibot merge

@rapids-bot merged commit 3136124 into rapidsai:branch-0.19 on Mar 24, 2021
Labels
5 - Ready to Merge, bug, non-breaking, Python
Development

Successfully merging this pull request may close these issues.

[BUG] Comparison b/w None and DateTime column Fails
4 participants