-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix DTI comparison with None, datetime.date #19301
Conversation
datetime(2016, 1, 1).date()]) | ||
def test_dti_cmp_invalid(self, tz, other): | ||
# GH#19301 | ||
dti = pd.date_range('2016-01-01', periods=2, tz=tz) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So is this saying that
DatetimeIndex([None, '2016-01-01']) == [None, datetime.date(2016, 1, 1)])
is [False, False
]? I thought that in #18188 we decided that DatetimeIndex
compared with datetime.date
would coerce the date
to a datetime
at midnight?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomAugspurger I think part of the confusion is over Timestamp
comparison vs DatetimeIndex
comparison vs Series[datetime64]
comparison (e.g. the whatsnew note in #18188 talks about Series
(but puts the tests in index tests...)). This PR and a bunch of other recent ones have been focused on making the Index
/Series
behavior more consistent.
Following this, #19288 can be de-kludged to make Series[datetime64]
comparison dispatch to DatetimeIndex
comparison, ensuring internal consistency. But you're right that this would mean a change in the behavior of Series[datetime64]
comparisons.
For the moment I'm taking Timestamp
behavior as canonical and making DatetimeIndex
match that.
ts = pd.Timestamp('2016-01-01')
>>> ts == ts.to_pydatetime().date()
False
>>> ts < ts.to_pydatetime().date()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pandas/_libs/tslib.pyx", line 1165, in pandas._libs.tslib._Timestamp.__richcmp__
TypeError: Cannot compare type 'Timestamp' with type 'date'
I recall a discussion of whether date
should be treated as datetime-at-midnight for Timestamp
comparison purposes, my thought being it should be treated as Period(..., freq='D')
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In [1]: ts = pd.Timestamp('2016-01-01')
In [2]: ts
Out[2]: Timestamp('2016-01-01 00:00:00')
In [3]: ts.date()
Out[3]: datetime.date(2016, 1, 1)
In [4]: ts.to_pydatetime()
Out[4]: datetime.datetime(2016, 1, 1, 0, 0)
In [5]: ts.to_pydatetime() == ts.date()
Out[5]: False
In [6]: ts.to_pydatetime().date() == ts.date()
Out[6]: True
I find [5] a bit odd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find [5] a bit odd.
The behavior is analogous to comparing Timestamp
vs Period
. eq and ne return False and True, respectively, and all others raise TypeError
. Its odd if you interpret date as "datetime implicitly at midnight", but pretty intuitive if you interpret it as "less specific than a timestamp"
@@ -41,6 +41,25 @@ def addend(request): | |||
return request.param | |||
|
|||
|
|||
class TestDatetimeIndexComparisons(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you lump the None with the NaT comparisons. Do we have the same testing of these comparisons for scalars?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT the comparison tests are scattered. I had planned to consolidate them here in a follow-up to keep narrow focus here.
can you lump the None with the NaT comparisons
They would need to be separate tests since the behavior is different, but I can put the tests adjacent to each other in this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Centralizing the DTI comparison tests is now a part of #19317. After that I'll make a pass to make sure all the relevant cases are covered.
Looks like Travis cancellation. |
Codecov Report
@@ Coverage Diff @@
## master #19301 +/- ##
=======================================
Coverage 91.67% 91.67%
=======================================
Files 148 148
Lines 48553 48553
=======================================
Hits 44513 44513
Misses 4040 4040
Continue to review full report at Codecov.
|
rebase |
Index, ABCSeries)): | ||
# Following Timestamp convention, __eq__ is all-False | ||
# and __ne__ is all True, others raise TypeError. | ||
if opname == '__eq__': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I prefer to return in the if/elif, then just raise instead of an else
dti > other | ||
with pytest.raises(TypeError): | ||
dti >= other | ||
|
||
# TODO: De-duplicate with test_comparisons_nat below |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason NOT to de-dupe with the other comparisons in this file (e.g. the TODO comment), in this PR? This is adding code then then will be inserted into the parameterization, I would rather just do it right once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment was added in #19317, which was largely intended to be cut/paste with edits done in a separate PR. I can go and add this de-duplication to this PR, but that would be expanding the scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's do it. right now these are a bit orphaned and should be combined with existing tests. in general this is a good thing to do anyhow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Turned out they weren't duplicated so much as poorly-named.
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -434,6 +434,7 @@ Conversion | |||
- Bug in ``.astype()`` to non-ns timedelta units would hold the incorrect dtype (:issue:`19176`, :issue:`19223`, :issue:`12425`) | |||
- Bug in subtracting :class:`Series` from ``NaT`` incorrectly returning ``NaT`` (:issue:`19158`) | |||
- Bug in comparison of timezone-aware :class:`DatetimeIndex` against ``NaT`` incorrectly raising ``TypeError`` (:issue:`19276`) | |||
- Bug in comparison of :class:`DatetimeIndex` against ``None`` or ``datetime.date`` objects raising ``TypeError`` for ``==`` and ``!=`` comparisons instead of all-``False`` and all-``True``, respectively (:issue:`19301`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rebase again, I just updated
with pytest.raises(TypeError): | ||
dti < other | ||
with pytest.raises(TypeError): | ||
dti <= other |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should also test pd.NaT
here (and I think we have a tests for np.nan
that should test raising).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just added nan to the params for this test. pd.NaT test is immediately below this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the datetime.date part of this needs to move to where we test comparisions with Timestamp, datetime.datetime and np.datetime64
@pytest.mark.parametrize('other', [None, | ||
datetime(2016, 1, 1).date(), | ||
np.nan]) | ||
def test_dti_cmp_invalid(self, tz, other): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, I think leave None and np.nan here is ok (rename this from invalid -> null or something more descriptive).
put the date with a Timestamp / datetime test.
11 hours and the OSX build hasn't started. Possible problem with Travis? |
see that red X on the status (on the travis page). They give messages. |
with pytest.raises(TypeError): | ||
dti < other | ||
with pytest.raises(TypeError): | ||
dti <= other |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the datetime.date part of this needs to move to where we test comparisions with Timestamp, datetime.datetime and np.datetime64
OK, but it'll be a test identical the the one here, just in a different place. |
my point is there already are tests for this - need to collocate them |
OK. There are no such tests in this file; possible in scalar. My first choice is to not move anything in this PR, but if we have to either move DTI-with-timestamp-comparison test to tests.scalar or timestamp-with-DTI-comparison test to tests.indexes, I'd prefer the latter. |
Not seeing it in scalar either. I guess I'll add some extras here. |
dti >= other | ||
|
||
@pytest.mark.parametrize('other', [None, | ||
np.nan]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add pd.NaT
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, pd.NaT behaves differently from None or np.nan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh? that would be troubling, why is that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The __eq__
and __ne__
behave the same, but None and np.nan raise TypeError
for the inequalities, whereas pd.NaT returns False.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right. ok then split the tests to reflect this, testing all nulls at once for eq/ne make the test easy to grok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make the test easy to grok.
I'll do this, but object to the notions that fixtures and easy-to-grok can go in the same file. If I can't import the test and run it in the interpreter, then I'm in a Strange Land.
@@ -44,7 +45,74 @@ def addend(request): | |||
|
|||
|
|||
class TestDatetimeIndexComparisons(object): | |||
# TODO: De-duplicate with test_comparisons_nat below | |||
@pytest.mark.parametrize('other', [datetime(2016, 1, 1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move all of these comparisons (that you are moving anyway) to a new test_compare.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure? The other test_arithmetic.py files we've refactored out recently mostly include a TestComparison class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to separate them, a followup is ok.
|
||
@pytest.mark.parametrize('other', [None, | ||
np.nan]) | ||
def test_dti_cmp_non_datetime(self, tz, other): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dti_cmp_null_scalar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks ok. pls rebase
@@ -44,7 +45,74 @@ def addend(request): | |||
|
|||
|
|||
class TestDatetimeIndexComparisons(object): | |||
# TODO: De-duplicate with test_comparisons_nat below | |||
@pytest.mark.parametrize('other', [datetime(2016, 1, 1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to separate them, a followup is ok.
Thoughts here? This is now a blocker for the next steps in core.ops. |
this looks good. rebase and ping on green. |
ping |
thanks! @jbrockmendel just wanted to say. very much appreciate your changes. As they get more complicated and/or hit edge cases, I am necessarily being more of a stickler on things. don't take it personally (and you are very responsive!) pandas is quite complicated and enforcing consistency across user AND developer experiences is hard, but very important. |
Discussed in #19288
git diff upstream/master -u -- "*.py" | flake8 --diff