datetime.date no longer coerced to datetime64 for comparison operations #21152

innominate227 · 2018-05-21T15:47:30Z

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np
import datetime

pydate = datetime.date(2018,1,1)
df = pd.DataFrame({'date':[np.datetime64('2018-01-01')]})

sel = df[df['date'] == pydate]
print(len(sel))

Problem description

In pandas 0.22 the code above printed "1", in 0.23 it prints "0".

Expected Output

1

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 45 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.0
pytest: 2.9.2
pip: 8.1.2
setuptools: 39.0.1
Cython: 0.24.1
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.3.2
xlrd: 1.1.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: 0.9999999
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: 0.1.2
fastparquet: None
pandas_gbq: None
pandas_datareader: None

The text was updated successfully, but these errors were encountered:

TomAugspurger · 2018-05-21T15:55:48Z

cc @jbrockmendel intentional, or regression?

TomAugspurger · 2018-05-21T15:58:09Z

Interesting, Python considers them different:

In [25]: pydate == datetime.datetime(2018, 1, 1)
Out[25]: False

TomAugspurger · 2018-05-21T16:06:32Z

Hmm, #18188 contained a test for this, but it was removed in 87fefe2#diff-790bb1655c69fe970376e710435edb80R352

relevant bit is https://github.com/pandas-dev/pandas/pull/19800/files#r169681238

So, I'm saying this was an intentional change to make Series[datetime64] behave the same as DatetimeIndex here. But we forgot to update the release notes in #19800

Does that sound right?

jbrockmendel · 2018-05-21T16:49:51Z

intentional, or regression?

Intentional. The Series/DataFrame behavior was changed to match the DatetimeIndex behavior, which is based on the Timestamp/datetime behavior. If a release note was missed that was likely my mistake.

changhsinlee · 2018-05-22T14:42:09Z

I had the same problem. Please add this change to the release note, as it breaks a lot of code :( ... Is it possible to add this comparison back in the next release?

TomAugspurger · 2018-05-22T14:57:06Z

@changhsinlee a PR adding it to the changelog would be welcome. We can rebuild and re-upload the docs.

Not sure about reverting the change. The fact that we're now consistent

with NumPy and Python
between DatetimeIndex and Series

is attractive

In [1]: import datetime

In [2]: import pandas as pd
im
In [3]: import numpy as np

In [4]: datetime.date(2017, 1, 1) == datetime.datetime(2017, 1, 1)
Out[4]: False

In [5]: np.array(['2017-01-01'], dtype='M8[s]') == datetime.date(2017, 1, 1)
Out[5]: array([False])

In [6]: pd.Series(pd.to_datetime(['2017-01-01'])) == datetime.date(2017, 1, 1)
Out[6]:
0    False
dtype: bool

In [7]: pd.to_datetime(['2017-01-01']) == datetime.date(2017, 1, 1)
Out[7]: array([False])

Previously Out[6] was True and In[7] raised. I'm not sure how much effort it would be to implement a proper deprecation here, though I suspect it will be tricky.

as it breaks a lot of code

What sorts of workloads does this break? Are you able to wrap your datetime.dates in a pd.Timestamp before comparing?

innominate227 · 2018-05-22T15:08:03Z

Most frustrating to me is how [6] silently became false. Its going to be hard to find all the places in our codebase where comparisons like this were done. Does it make since for [6] to write some warning?

TomAugspurger · 2018-05-22T15:13:17Z

Ideally yes that would have warned for 0.23. We would need to decide whether to change it *back* to True for a 0.23.1 and emit a warning. I'm not sure whether that's a good idea or not. I'm also not difficult it will be to implement that warning, though some of the linked issues should give you an idea.

…

On Tue, May 22, 2018 at 10:08 AM, innominate227 ***@***.***> wrote: Most frustrating to me is how [6] silently became false, going to be hard to find all the places where comparisons like this were done. Does it make since for [6] to write some warning? — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#21152 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABQHIk7VsIwpsBSWuCBp2BZ3J4JhPMAQks5t1CnZgaJpZM4UHM-o> .

jorisvandenbossche · 2018-05-23T07:52:37Z

So to recap. Behaviour on 0.22:

In [1]: dtidx = pd.date_range("2012-01-01", periods=3)

In [3]: import datetime

In [4]: pydate = datetime.date(2012, 1, 1)

In [5]: dtidx == pydate
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
...
TypeError: <class 'datetime.date'> type object 2012-01-01

In [6]: pd.Series(dtidx) == pydate
Out[6]: 
0     True
1    False
2    False
dtype: bool

Behaviour on 0.23:

In [21]: dtidx == pydate
Out[21]: array([False, False, False], dtype=bool)

In [22]: pd.Series(dtidx) == pydate
Out[22]: 
0    False
1    False
2    False
dtype: bool

I personally think silently changing True to False was a bad idea of us (if it started raising an error as in 0.22 dtidx, it would at least not give silent changes in behaviour). This is a serious regression IMO.

So personally I would revert this change. And then we can decide to keep it (and eg change DatetimeIndex to be consistent with Series behaviour), or raise a deprecation warning we will change this in the future.

jorisvandenbossche · 2018-05-23T09:21:21Z

On which behaviour we would ideally like to have (coerce or not):

Not sure about reverting the change. The fact that we're now consistent

with NumPy and Python

between DatetimeIndex and Series

is attractive

That is true, but on the other hand, in pandas we generally do much more coercing to the type it is compared with. For example, we also coerce strings (using the same example from above):

In [23]: pd.Series(dtidx) == "2012-01-01"
Out[23]: 
0     True
1    False
2    False
dtype: bool

(which also works for DatetimeIndex)

jreback · 2018-05-23T10:25:44Z

This is a good deliberate change. datetime.date is not a first class pandas type. Folks have been using at risk for years. We fully match numpy / python semantics. Changing to broken semantics is unreasonable. -1 on any further change here.

jreback · 2018-05-23T10:26:53Z

@jorisvandenbossche your examples [23] is flawed. String are coerced to partial string indexing which gives much flexibility. datetime.data / datetime.date / Timestamps are never coerced in comparisons and are exact matches.

TomAugspurger · 2018-05-23T11:14:16Z

Folks have been using at risk for years.

"At risk" implies that people know they were doing something "wrong" :)

w.r.t. Out[23], I don't see how partial string indexing comes in. With partial string indexing, I would expect the following to be [True, True, False].

In [3]: pd.Series(pd.to_datetime(['2017-01-01', '2017-01-02', '2018'])) == '2017'
Out[3]:
0     True
1    False
2    False
dtype: bool

But anyway, I'm not sure coercing the string makes sense here, so I'm not going to argue for that :)

jorisvandenbossche · 2018-05-23T11:39:07Z

Yes, IMO you could also say that folks have been using this because it was well established behaviour of pandas, no matter how python or numpy did it. And we silently changed the result of their calculation.

And wanted to give the same example as Tom, that I don't think this the example I gave was wrong or flawed. The string comparison is also an exact match.
That is not necessarily an argument for allowing matches with datetime.date, but it is at least a precedent that shows we coerce "datetime-like values".

jbrockmendel · 2018-05-23T18:22:23Z

It looks like there are 3 issues here:

A missing release note, almost certainly my mistake
A change in how datetime.date is treated -- the new behavior is correct and should not be reverted
A discussion of how string comparisons are treated, which can be a minefield of its own (e.g. DatetimeIndex lookups tz inconsistency #18435)

Unless I'm mistaken about 3) being tangential to the OP, I think it confuses more than it clarifies.

TomAugspurger · 2018-05-23T18:25:43Z

I'm not sure Joris agrees with 2. I think I'm OK with the change. I would add a 2a): *How* the change was made. Should we have done a deprecation cycle? (yes) And is it worth reinstating with a deprecation? (not sure).

…

On Wed, May 23, 2018 at 1:22 PM, jbrockmendel ***@***.***> wrote: It looks like there are 3 issues here: 1. A missing release note, almost certainly my mistake 2. A change in how datetime.date is treated -- the new behavior is correct and should *not* be reverted 3. A discussion of how string comparisons are treated, which can be a minefield of its own (e.g. #18435 <#18435>) Unless I'm mistaken about 3) being tangential to the OP, I think it confuses more than it clarifies. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#21152 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABQHIpERLkthqo4LdxBZbH-8pngoj11bks5t1ajngaJpZM4UHM-o> .

jorisvandenbossche · 2018-05-23T21:54:15Z

Indeed, I don't agree with how you state 2, but I don't necessarily disagree with the change itself.

Coercing a datetime.date to an exact timestamp to do comparison ops is not "wrong" or "correct", it is a design choice. We can discuss which we think is most consistent, or will lead to the least confusion, is the clearest, ... But in the end: in the past we made the choice to explicitly allow this for Series, and given the inconsistency with DatetimeIndex, for 0.23 we made the choice to change this behaviour.

I am not fully sure this was the best choice (we could also have opted for changing the DatetimeIndex behaviour), but I don't really object the choice (since there is not really a good reason people would want to do this, if they can also use datetime).

I would add a 2a): How the change was made. Should we have done a deprecation cycle? (yes) And is it worth reinstating with a deprecation? (not sure).

Yep, I agree we should ideally have done this with a deprecation cycle. And I personally think it is still worth to do it for 0.23.1

jbrockmendel · 2018-05-24T02:35:27Z

Indeed, I don't agree with how you state 2, but I don't necessarily disagree with the change itself.

I evidently communicated poorly. The bit before the "--" was intended as the issue description and the bit after the "--" my opinion on it.

we could also have opted for changing the DatetimeIndex behaviour

There was some discussion about this. The general thrust of it is that DatetimeIndex ops should behave like vectorized versions of Timestamp ops. Timestamp follows the convention set by datetime when it comes to comparisons with date. The analogy I think of is datetime:date :: Timestamp:Period[D]

jorisvandenbossche · 2018-05-24T14:53:45Z

The general thrust of it is that DatetimeIndex ops should behave like vectorized versions of Timestamp ops

OK, but we don't really follow that if it comes to strings anyway ... (as that does not work on Timestamp, but does work for DatetimeIndex or Series).
Which doesn't mean this discrepancy is a problem. For me it is fine to not support datetime.date, but support strings (and maybe we should add it to Timestamp for consistency?). I just think users could not have guessed this that it was "wrong".

jbrockmendel · 2018-05-24T18:31:59Z

OK, but we don't really follow that if it comes to strings anyway

This came up in #18435 and the conclusion IIRC is that the string special-casing was very specifically a convenience feature for indexing purposes.

TomAugspurger · 2018-05-25T13:10:47Z

I think we'll keep the new behavior and issue a warning that users probably don't want to compare to a datetime.date, since it's always false.

h-vetinari · 2018-06-05T13:54:43Z

It's not just about equality - arguably, datetime >= date makes sense in all scenarios, and this change broke my code.

But even if consistency is ultimately decided to be the higher good, at least the error warning should be better in one of the following cases -- i.e. not TypeError: <class 'datetime.date'> type object 2013-01-01 but consistent with TypeError: can't compare datetime.datetime to datetime.date.

import pandas as pd
from datetime import date, datetime

v.0.22:

pd.Series([date(2014, 1, 1), date(2013, 1, 1)]) >= date(2013, 1, 1)
# 0    True
# 1    True
# dtype: bool
pd.Series([date(2014, 1, 1), datetime(2013, 1, 1)]) >= date(2013, 1, 1)
# 0    True
# 1    True
# dtype: bool
pd.Series([date(2014, 1, 1), date(2013, 1, 1)]) >= datetime(2013, 1, 1)
# TypeError: can't compare datetime.datetime to datetime.date
pd.Series([date(2014, 1, 1), datetime(2013, 1, 1)]) >= datetime(2013, 1, 1)
# 0    True
# 1    True
# dtype: bool

v.0.23:

pd.Series([date(2014, 1, 1), date(2013, 1, 1)]) >= date(2013, 1, 1)
# 0    True
# 1    True
# dtype: bool
pd.Series([date(2014, 1, 1), datetime(2013, 1, 1)]) >= date(2013, 1, 1)
# TypeError: <class 'datetime.date'> type object 2013-01-01
pd.Series([date(2014, 1, 1), date(2013, 1, 1)]) >= datetime(2013, 1, 1)
# TypeError: can't compare datetime.datetime to datetime.date
pd.Series([date(2014, 1, 1), datetime(2013, 1, 1)]) >= datetime(2013, 1, 1)
# 0    True
# 1    True
# dtype: bool

TomAugspurger · 2018-06-07T13:10:49Z

I think we agreed this was a blocker for 0.23.1.

Will try to get to it today.

TomAugspurger changed the title ~~date comparison change in 0.23~~ datetime.date no longer coerced to datetime64 for comparison operations May 21, 2018

TomAugspurger added Datetime Datetime data dtype Numeric Operations Arithmetic, Comparison, and Logical operations labels May 21, 2018

jorisvandenbossche added this to the 0.23.1 milestone May 23, 2018

jorisvandenbossche added the Regression Functionality that used to work in a prior pandas version label May 23, 2018

jreback modified the milestones: 0.23.1, 0.23.2 Jun 7, 2018

TomAugspurger removed this from the 0.23.2 milestone Jun 7, 2018

TomAugspurger added this to the 0.23.1 milestone Jun 7, 2018

TomAugspurger added the Blocker Blocking issue or pull request for an upcoming release label Jun 7, 2018

This was referenced Jun 7, 2018

Revert changes to comparison between Series[datetime64ns] and datetime.date #21359

Closed

Revert change to comparison op with datetime.date objects #21361

Merged

jorisvandenbossche closed this as completed in #21361 Jun 8, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

datetime.date no longer coerced to datetime64 for comparison operations #21152

datetime.date no longer coerced to datetime64 for comparison operations #21152

innominate227 commented May 21, 2018

INSTALLED VERSIONS

TomAugspurger commented May 21, 2018

TomAugspurger commented May 21, 2018

TomAugspurger commented May 21, 2018 •

edited

Loading

jbrockmendel commented May 21, 2018

changhsinlee commented May 22, 2018

TomAugspurger commented May 22, 2018

innominate227 commented May 22, 2018 •

edited

Loading

TomAugspurger commented May 22, 2018 via email

jorisvandenbossche commented May 23, 2018

jorisvandenbossche commented May 23, 2018

jreback commented May 23, 2018

jreback commented May 23, 2018

TomAugspurger commented May 23, 2018

jorisvandenbossche commented May 23, 2018

jbrockmendel commented May 23, 2018

TomAugspurger commented May 23, 2018 via email

jorisvandenbossche commented May 23, 2018

jbrockmendel commented May 24, 2018

jorisvandenbossche commented May 24, 2018 •

edited

Loading

jbrockmendel commented May 24, 2018

TomAugspurger commented May 25, 2018

h-vetinari commented Jun 5, 2018

TomAugspurger commented Jun 7, 2018

datetime.date no longer coerced to datetime64 for comparison operations #21152

datetime.date no longer coerced to datetime64 for comparison operations #21152

Comments

innominate227 commented May 21, 2018

Code Sample, a copy-pastable example if possible

Problem description

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

TomAugspurger commented May 21, 2018

TomAugspurger commented May 21, 2018

TomAugspurger commented May 21, 2018 • edited Loading

jbrockmendel commented May 21, 2018

changhsinlee commented May 22, 2018

TomAugspurger commented May 22, 2018

innominate227 commented May 22, 2018 • edited Loading

TomAugspurger commented May 22, 2018 via email

jorisvandenbossche commented May 23, 2018

jorisvandenbossche commented May 23, 2018

jreback commented May 23, 2018

jreback commented May 23, 2018

TomAugspurger commented May 23, 2018

jorisvandenbossche commented May 23, 2018

jbrockmendel commented May 23, 2018

TomAugspurger commented May 23, 2018 via email

jorisvandenbossche commented May 23, 2018

jbrockmendel commented May 24, 2018

jorisvandenbossche commented May 24, 2018 • edited Loading

jbrockmendel commented May 24, 2018

TomAugspurger commented May 25, 2018

h-vetinari commented Jun 5, 2018

TomAugspurger commented Jun 7, 2018

Output of `pd.show_versions()`

TomAugspurger commented May 21, 2018 •

edited

Loading

innominate227 commented May 22, 2018 •

edited

Loading

jorisvandenbossche commented May 24, 2018 •

edited

Loading