-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix comparison between Datetime/Timedelta columns and NULL scalars #7504
Fix comparison between Datetime/Timedelta columns and NULL scalars #7504
Conversation
Looks good to me 👍 |
rerun tests |
"timedelta64[s]", | ||
], | ||
) | ||
@pytest.mark.parametrize("null_scalar", [None, cudf.NA]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we support pd.NA and pd.NaT ?
>>> s != pd.NA
0 True
1 True
2 True
dtype: bool
>>> s != pd.NaT
0 True
1 True
2 True
dtype: bool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can support NaT
at least. What should the behavior be? Currently on branch-0.19, I get:
>>> pd.Series([1,2,3], dtype='datetime64[ns]') > np.datetime64('NaT')
0 False
1 False
2 False
dtype: bool
>>> cudf.Series([1,2,3], dtype='datetime64[ns]') > np.datetime64('NaT')
0 <NA>
1 <NA>
2 <NA>
dtype: bool
Should we match pandas behavior here since NaT
isn't the same as <NA>
? cc @galipremsagar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup I think we can match pandas behavior because we already do this for NA
in other columns:
>>> import cudf
>>> s = cudf.Series([1, 2, 3])
>>> s > cudf.NA
0 False
1 False
2 False
dtype: bool
But for nat
, cudf can just continue to treat it as NA
in these operations too as specified here: https://docs.rapids.ai/api/cudf/nightly/Working-with-missing-data.html#Datetimes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok - that makes sense. So we just consistently treat NaT
as <NA>
everywhere. My only suggestion is that we make the comparison also return <NA>
since all comparisons against <NA>
will return <NA>
after #7066 is merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only suggestion is that we make the comparison also return
<NA>
since all comparisons against<NA>
will return<NA>
after #7066 is merged.
Make sense +1 from my side.
Codecov Report
@@ Coverage Diff @@
## branch-0.19 #7504 +/- ##
===============================================
+ Coverage 81.86% 82.51% +0.64%
===============================================
Files 101 101
Lines 16884 17446 +562
===============================================
+ Hits 13822 14395 +573
+ Misses 3062 3051 -11
Continue to review full report at Codecov.
|
@galipremsagar @rgsl888prabhu just a few minor changes in since the last review, if you want to give it a quick second check over. |
rerun tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small question, rest looks good.
if isinstance(null_scalar, np.datetime64): | ||
if np.dtype(dtype).kind not in "mM": | ||
pytest.skip() | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think we don't need else
, we can just leave null_scalar = null_scalar.astype(dtype)
after pytest.skip()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
rerun tests |
rerun tests |
@gpucibot merge |
Fixes #6897