Fix comparison between Datetime/Timedelta columns and NULL scalars #7504

brandon-b-miller · 2021-03-03T16:36:57Z

galipremsagar · 2021-03-04T23:11:55Z

Looks good to me 👍

galipremsagar · 2021-03-04T23:12:01Z

rerun tests

rgsl888prabhu · 2021-03-05T06:35:33Z

python/cudf/cudf/tests/test_binops.py

+        "timedelta64[s]",
+    ],
+)
+@pytest.mark.parametrize("null_scalar", [None, cudf.NA])


Should we support pd.NA and pd.NaT?

    >>> s != pd.NA
    0    True
    1    True
    2    True
    dtype: bool
    >>> s != pd.NaT
    0    True
    1    True
    2    True
    dtype: bool

brandon-b-miller · 2021-03-05

I think we can support NaT at least. What should the behavior be? Currently, on branch-0.19, I get:

    >>> pd.Series([1,2,3], dtype='datetime64[ns]') > np.datetime64('NaT')
    0    False
    1    False
    2    False
    dtype: bool
    >>> cudf.Series([1,2,3], dtype='datetime64[ns]') > np.datetime64('NaT')
    0    <NA>
    1    <NA>
    2    <NA>
    dtype: bool

Should we match pandas behavior here, since NaT isn't the same as <NA>? cc @galipremsagar
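For reference, the pandas semantics in the REPL output above can be modelled in plain Python. This is a hypothetical sketch, not pandas or NumPy code: datetime64[ns] values are represented as integer nanoseconds, and a module-level `NAT` sentinel stands in for `np.datetime64('NaT')`. Under these rules every comparison involving NaT is False, except `!=`, which is True.

```python
import operator

# Hypothetical sentinel standing in for np.datetime64('NaT');
# it is not a real pandas or NumPy object.
NAT = object()


def pandas_style_compare(values, scalar, op):
    """Elementwise comparison following the pandas/NumPy NaT rules (sketch).

    values: datetime64[ns] data modelled as integer nanoseconds, or NAT.
    scalar: an integer nanosecond value, or NAT.
    op: a binary comparison from the operator module, e.g. operator.gt.
    """
    result = []
    for v in values:
        if v is NAT or scalar is NAT:
            # NaT is unordered and unequal to everything (itself included),
            # so only != evaluates to True.
            result.append(op is operator.ne)
        else:
            result.append(op(v, scalar))
    return result


col = [1, 2, 3]  # pd.Series([1, 2, 3], dtype='datetime64[ns]') as raw ints
print(pandas_style_compare(col, NAT, operator.gt))  # [False, False, False]
print(pandas_style_compare(col, NAT, operator.ne))  # [True, True, True]
```

The cudf output above differs: the same `>` comparison yields <NA> for every row instead of False.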

galipremsagar · 2021-03-05

Yup, I think we can match pandas behavior, because we already do this for NA in other column types:

    >>> import cudf
    >>> s = cudf.Series([1, 2, 3])
    >>> s > cudf.NA
    0    False
    1    False
    2    False
    dtype: bool

But for NaT, cudf can continue to treat it as NA in these operations too, as specified here: https://docs.rapids.ai/api/cudf/nightly/Working-with-missing-data.html#Datetimes

brandon-b-miller · 2021-03-05

OK, that makes sense: we consistently treat NaT as <NA> everywhere. My only suggestion is that we make the comparison also return <NA>, since all comparisons against <NA> will return <NA> after #7066 is merged.

galipremsagar · 2021-03-05

> My only suggestion is that we make the comparison also return <NA> since all comparisons against <NA> will return <NA> after #7066 is merged.

Makes sense, +1 from my side.
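The behavior agreed on in this thread (treat NaT as <NA>, and make any comparison against a null scalar return <NA>) can be sketched in plain Python. This is a hypothetical model, not cudf's implementation: `None` marks a null row, and the `NA` and `NAT` sentinels stand in for `cudf.NA` and `np.datetime64('NaT')`. The helper names `is_null_scalar` and `null_propagating_compare` are illustrative only.

```python
import operator

# Hypothetical sentinels standing in for cudf.NA and np.datetime64('NaT').
NA = object()
NAT = object()


def is_null_scalar(scalar):
    """Treat None, NA and NaT uniformly as a null scalar (sketch)."""
    return scalar is None or scalar is NA or scalar is NAT


def null_propagating_compare(values, scalar, op):
    """Elementwise comparison where nulls propagate; None means <NA>."""
    if is_null_scalar(scalar):
        # Comparing against a null scalar yields an all-null boolean
        # column of the same length, for every operator including !=.
        return [None] * len(values)
    # Null rows in the column stay null; valid rows are compared normally.
    return [None if v is None else op(v, scalar) for v in values]


col = [1, 2, None]
print(null_propagating_compare(col, NAT, operator.gt))  # [None, None, None]
print(null_propagating_compare(col, 2, operator.ge))    # [False, True, None]
```

Normalizing every null-like scalar up front keeps NaT, `None` and NA on a single code path, which is the consistency the thread asks for.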

codecov · 2021-03-16T05:33:38Z

Codecov Report

Merging #7504 (512425f) into branch-0.19 (7871e7a) will increase coverage by 0.64%.
The diff coverage is n/a.

❗ Current head 512425f differs from pull request most recent head 463ff63. Consider uploading reports for the commit 463ff63 to get more accurate results.

@@               Coverage Diff               @@
##           branch-0.19    #7504      +/-   ##
===============================================
+ Coverage        81.86%   82.51%   +0.64%     
===============================================
  Files              101      101              
  Lines            16884    17446     +562     
===============================================
+ Hits             13822    14395     +573     
+ Misses            3062     3051      -11

Impacted Files	Coverage Δ
python/cudf/cudf/core/buffer.py	`84.21% <ø> (+4.96%)`	⬆️
python/cudf/cudf/core/column/categorical.py	`91.97% <ø> (+0.58%)`	⬆️
python/cudf/cudf/core/column/column.py	`87.61% <ø> (-0.15%)`	⬇️
python/cudf/cudf/core/column/datetime.py	`89.73% <ø> (+0.63%)`	⬆️
python/cudf/cudf/core/column/decimal.py	`92.75% <ø> (-2.12%)`	⬇️
python/cudf/cudf/core/column/lists.py	`90.00% <ø> (-1.40%)`	⬇️
python/cudf/cudf/core/column/numerical.py	`94.83% <ø> (-0.20%)`	⬇️
python/cudf/cudf/core/column/string.py	`86.79% <ø> (+0.30%)`	⬆️
python/cudf/cudf/core/column/timedelta.py	`88.66% <ø> (+0.42%)`	⬆️
python/cudf/cudf/core/column_accessor.py	`96.13% <ø> (+0.82%)`	⬆️
... and 56 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 561f68a...463ff63.

brandon-b-miller · 2021-03-17T19:45:17Z

@galipremsagar @rgsl888prabhu a few minor changes have gone in since the last review, if you want to give it a quick second look.

brandon-b-miller · 2021-03-17T19:45:22Z

rerun tests

rgsl888prabhu

One small question; the rest looks good.

rgsl888prabhu · 2021-03-18T07:57:50Z

python/cudf/cudf/tests/test_binops.py

+    if isinstance(null_scalar, np.datetime64):
+        if np.dtype(dtype).kind not in "mM":
+            pytest.skip()
+        else:


I think we don't need the else; we can just put null_scalar = null_scalar.astype(dtype) after pytest.skip().
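The suggestion works because `pytest.skip()` raises an exception, so nothing after it runs and the `else` branch is redundant. The hypothetical helper below sketches the flattened control flow in plain Python. It raises the standard-library `unittest.SkipTest` (which pytest also reports as a skip) in place of calling `pytest.skip()`, and the `FakeNaT` class and `dtype_kind` parser are illustrative stand-ins for `np.datetime64('NaT')` and `np.dtype(dtype).kind`, not real cudf or NumPy code.

```python
import unittest


class FakeNaT:
    """Hypothetical stand-in for np.datetime64('NaT') with an astype()."""

    def __init__(self, unit="generic"):
        self.unit = unit

    def astype(self, dtype):
        # Mimic casting NaT to a concrete datetime/timedelta unit.
        return FakeNaT(dtype)


def dtype_kind(dtype):
    """Rough stand-in for np.dtype(dtype).kind on a few dtype strings."""
    for prefix, kind in (
        ("datetime64", "M"),
        ("timedelta64", "m"),
        ("uint", "u"),
        ("int", "i"),
        ("float", "f"),
    ):
        if dtype.startswith(prefix):
            return kind
    raise ValueError(f"dtype not handled by this sketch: {dtype!r}")


def prepare_null_scalar(null_scalar, dtype):
    """Hypothetical helper mirroring the reviewed test logic."""
    if isinstance(null_scalar, FakeNaT):
        if dtype_kind(dtype) not in "mM":
            # Raising ends the test here, just as pytest.skip() does ...
            raise unittest.SkipTest("NaT only applies to datetime/timedelta dtypes")
        # ... so the cast needs no `else:` branch.
        null_scalar = null_scalar.astype(dtype)
    return null_scalar


print(prepare_null_scalar(FakeNaT(), "timedelta64[s]").unit)  # timedelta64[s]
print(prepare_null_scalar(None, "int64"))  # None
```

Keeping the cast at the outer indentation level reads as "skip or cast", which is the shape the reviewer is asking for.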

kkraus14 · 2021-03-24T13:14:27Z

rerun tests

galipremsagar · 2021-03-24T20:59:47Z

rerun tests

kkraus14 · 2021-03-24T21:22:04Z

@gpucibot merge

fix and add test

d6b8f4e

brandon-b-miller added the bug, 3 - Ready for Review, Python, and non-breaking labels Mar 3, 2021

brandon-b-miller requested a review from a team as a code owner March 3, 2021 16:36

brandon-b-miller requested review from galipremsagar and rgsl888prabhu March 3, 2021 16:36

brandon-b-miller changed the title from "fix and add test" to "Fix comparison between Datetime/Timedelta columns and NULL scalars" Mar 3, 2021

galipremsagar approved these changes Mar 4, 2021


galipremsagar added the 5 - Ready to Merge label and removed the 3 - Ready for Review label Mar 4, 2021

rgsl888prabhu reviewed Mar 5, 2021


brandon-b-miller added 4 commits March 15, 2021 18:32

merge 0.19

d0f034b

adjust for PR 7490

b80adae

style

46118fb

test for NaT

ce8c78f

rgsl888prabhu approved these changes Mar 18, 2021


adjust logic

463ff63

galipremsagar approved these changes Mar 24, 2021


rapids-bot bot merged commit 3136124 into rapidsai:branch-0.19 Mar 24, 2021


brandon-b-miller commented Mar 3, 2021

galipremsagar commented Mar 4, 2021

galipremsagar commented Mar 4, 2021

rgsl888prabhu Mar 5, 2021

brandon-b-miller Mar 5, 2021

galipremsagar Mar 5, 2021

brandon-b-miller Mar 5, 2021

galipremsagar Mar 5, 2021

codecov bot commented Mar 16, 2021 (edited)

brandon-b-miller commented Mar 17, 2021

brandon-b-miller commented Mar 17, 2021

rgsl888prabhu left a comment

rgsl888prabhu Mar 18, 2021

brandon-b-miller Mar 23, 2021

kkraus14 commented Mar 24, 2021

galipremsagar commented Mar 24, 2021

kkraus14 commented Mar 24, 2021
