datetime.date no longer coerced to datetime64 for comparison operations #21152
Comments
cc @jbrockmendel intentional, or regression? |
Interesting, Python considers them different:
In [25]: pydate == datetime.datetime(2018, 1, 1)
Out[25]: False |
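A minimal standard-library sketch of the point above, for anyone who wants to reproduce it without pandas. The variable names are illustrative only; the two conversions shown are ordinary ways to put both operands on the same type before comparing.

```python
import datetime

pydate = datetime.date(2018, 1, 1)
pydatetime = datetime.datetime(2018, 1, 1)

# A date and a datetime do not compare equal, even when the datetime is midnight.
mixed_equal = pydate == pydatetime

# Option 1: promote the date to a datetime at midnight, then compare.
promoted = datetime.datetime.combine(pydate, datetime.time.min)
promoted_equal = promoted == pydatetime

# Option 2: truncate the datetime to its calendar date, then compare.
truncated_equal = pydatetime.date() == pydate

print(mixed_equal, promoted_equal, truncated_equal)
```

Which option is appropriate depends on whether the time of day should count in the comparison.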
Hmm, #18188 contained a test for this, but it was removed in 87fefe2#diff-790bb1655c69fe970376e710435edb80R352. The relevant bit is https://github.com/pandas-dev/pandas/pull/19800/files#r169681238. So I'm saying this was an intentional change to make Series[datetime64] behave the same as DatetimeIndex here, but we forgot to update the release notes in #19800. Does that sound right? |
Intentional. The Series/DataFrame behavior was changed to match the DatetimeIndex behavior, which is based on the Timestamp/datetime behavior. If a release note was missed that was likely my mistake. |
I had the same problem. Please add this change to the release note, as it breaks a lot of code :( ... Is it possible to add this comparison back in the next release? |
@changhsinlee a PR adding it to the changelog would be welcome. We can rebuild and re-upload the docs. Not sure about reverting the change. The fact that we're now consistent is attractive:
In [1]: import datetime
In [2]: import pandas as pd
In [3]: import numpy as np
In [4]: datetime.date(2017, 1, 1) == datetime.datetime(2017, 1, 1)
Out[4]: False
In [5]: np.array(['2017-01-01'], dtype='M8[s]') == datetime.date(2017, 1, 1)
Out[5]: array([False])
In [6]: pd.Series(pd.to_datetime(['2017-01-01'])) == datetime.date(2017, 1, 1)
Out[6]:
0 False
dtype: bool
In [7]: pd.to_datetime(['2017-01-01']) == datetime.date(2017, 1, 1)
Out[7]: array([False])
Previously
What sorts of workloads does this break? Are you able to wrap your |
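One possible workaround for affected code, sketched under the assumption that the stored values sit at midnight: convert the `datetime.date` to a `pd.Timestamp` before comparing, so that both sides are datetime-like. The names `ser` and `day` are illustrative and do not come from the thread.

```python
import datetime

import pandas as pd

ser = pd.Series(pd.to_datetime(['2017-01-01', '2017-01-02']))
day = datetime.date(2017, 1, 1)

# pd.Timestamp accepts a datetime.date and yields midnight of that day,
# so the comparison no longer depends on how a bare date is treated.
mask = ser == pd.Timestamp(day)

print(mask)
```

Because the conversion is explicit, the result should not depend on which coercion rule a given pandas release applies to bare dates.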
Most frustrating to me is how [6] silently became false. It's going to be hard to find all the places in our codebase where comparisons like this were done. Does it make sense for [6] to emit some warning? |
Ideally yes that would have warned for 0.23.
We would need to decide whether to change it *back* to True for a 0.23.1
and emit a warning. I'm not sure whether
that's a good idea or not. I'm also not sure how difficult it will be to implement
that warning, though some of the linked issues should give you an idea.
|
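For codebases that need to locate such comparisons themselves, a user-side wrapper is one option. The sketch below is hypothetical: `eq_with_date_check` is not a pandas function, and the warning text and category are assumptions chosen for illustration. It warns whenever the right-hand operand is a plain `datetime.date` and then compares against a midnight `Timestamp`.

```python
import datetime
import warnings

import pandas as pd


def eq_with_date_check(values, other):
    """Hypothetical helper: `values == other`, warning on plain dates."""
    # datetime.datetime subclasses datetime.date, so check the exact type
    # rather than using isinstance().
    if type(other) is datetime.date:
        warnings.warn(
            "comparing datetime64 values with datetime.date; "
            "converting the date to a midnight Timestamp",
            UserWarning,
            stacklevel=2,
        )
        other = pd.Timestamp(other)
    return values == other


ser = pd.Series(pd.to_datetime(['2017-01-01']))
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = eq_with_date_check(ser, datetime.date(2017, 1, 1))

print(result.tolist(), len(caught))
```

Routing comparisons through a helper like this makes the affected call sites show up in warning logs instead of silently changing results.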
So to recap. Behaviour on 0.22:
Behaviour on 0.23:
I personally think silently changing True to False was a bad idea on our part (if it had started raising an error, as DatetimeIndex did in 0.22, it would at least not have changed behaviour silently). This is a serious regression IMO. So personally I would revert this change. Then we can decide either to keep the old behaviour (and e.g. change DatetimeIndex to be consistent with the Series behaviour), or to raise a deprecation warning that we will change this in the future. |
On which behaviour we would ideally like to have (coerce or not):
That is true, but on the other hand, in pandas we generally do much more coercing to the type it is compared with. For example, we also coerce strings (using the same example from above):
(which also works for DatetimeIndex) |
This is a good deliberate change. |
@jorisvandenbossche your example [23] is flawed. Strings are coerced for partial string indexing, which gives much flexibility. datetime.datetime / datetime.date / Timestamps are never coerced in comparisons and are exact matches. |
"At risk" implies that people know they were doing something "wrong" :) w.r.t.
In [3]: pd.Series(pd.to_datetime(['2017-01-01', '2017-01-02', '2018'])) == '2017'
Out[3]:
0 True
1 False
2 False
dtype: bool
But anyway, I'm not sure coercing the string makes sense here, so I'm not going to argue for that :) |
Yes, IMO you could also say that folks have been using this because it was well-established behaviour of pandas, no matter how Python or NumPy did it. And we silently changed the result of their calculation. And I wanted to give the same example as Tom: I don't think the example I gave was wrong or flawed. The string comparison is also an exact match. |
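It looks like there are 3 issues here:
1. A missing release note, almost certainly my mistake
2. A change in how datetime.date is treated -- the new behavior is correct and should *not* be reverted
3. A discussion of how string comparisons are treated, which can be a minefield of its own (e.g. #18435)
Unless I'm mistaken about 3) being tangential to the OP, I think it confuses more than it clarifies. |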
I'm not sure Joris agrees with 2. I think I'm OK with the change.
I would add a 2a): *How* the change was made. Should we have done a
deprecation cycle? (yes) And is it worth reinstating with a deprecation?
(not sure).
…On Wed, May 23, 2018 at 1:22 PM, jbrockmendel ***@***.***> wrote:
It looks like there are 3 issues here:
1. A missing release note, almost certainly my mistake
2. A change in how datetime.date is treated -- the new behavior is
correct and should *not* be reverted
3. A discussion of how string comparisons are treated, which can be a
minefield of its own (e.g. #18435)
Unless I'm mistaken about 3) being tangential to the OP, I think it
confuses more than it clarifies.
|
Indeed, I don't agree with how you state 2, but I don't necessarily disagree with the change itself. Coercing a I am not fully sure this was the best choice (we could also have opted for changing the DatetimeIndex behaviour), but I don't really object to the choice (since there is not really a good reason people would want to do this, if they can also use datetime).
Yep, I agree we should ideally have done this with a deprecation cycle. And I personally think it is still worth doing for 0.23.1 |
I evidently communicated poorly. The bit before the "--" was intended as the issue description and the bit after the "--" my opinion on it.
There was some discussion about this. The general thrust of it is that |
OK, but we don't really follow that when it comes to strings anyway ... (as that does not work on Timestamp, but does work for DatetimeIndex or Series). |
This came up in #18435 and the conclusion IIRC is that the string special-casing was very specifically a convenience feature for indexing purposes. |
I think we'll keep the new behavior and issue a warning that users probably don't want to compare to a datetime.date, since it's always false. |
It's not just about equality - arguably, But even if consistency is ultimately decided to be the higher good, at least the error warning should be better in one of the following cases -- i.e. not
|
I think we agreed this was a blocker for 0.23.1. Will try to get to it today. |
Code Sample, a copy-pastable example if possible
Problem description
In pandas 0.22 the code above printed "1", in 0.23 it prints "0".
Expected Output
1
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 45 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.0
pytest: 2.9.2
pip: 8.1.2
setuptools: 39.0.1
Cython: 0.24.1
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.3.2
xlrd: 1.1.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: 0.9999999
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: 0.1.2
fastparquet: None
pandas_gbq: None
pandas_datareader: None