Unexpected results in Numpy datetime64 comparison within DataFrame #16831
Comments
Same issue here.
Can you make a complete example? Your …
I cannot give you the actual DataFrame I'm working with since it's private data, but here's some simple executable code that illustrates the issue:

import pandas as pd
import numpy as np
from datetime import datetime, timedelta

today = datetime.today()
x = [today] * 10000
df = pd.DataFrame({'date': x})
# threshold lies three weeks in the future, so no row should be later than it
threshold = np.datetime64(datetime.today() + timedelta(weeks=3))
# and then the comparisons:
df[threshold < df['date']]  # expected: empty DataFrame
df[df['date'] < threshold]  # expected: all 10000 rows

Thanks.
Seems to be related to the NumPy timestamp having microsecond precision:

In [111]: np.datetime64(today + timedelta(weeks=3)) < pd.Series([today])
Out[111]:
0    True
dtype: bool

In [112]: np.datetime64(today + timedelta(weeks=3)).astype("<M8[ns]") < pd.Series([today])
Out[112]:
0    False
dtype: bool

I'm not sure what the desired outcome is here. pandas only deals with nanosecond-precision timestamps, so do we silently change the precision of the input, or raise an error? Either way, Series and DataFrame need to behave consistently here.
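The two options raised above (silently converting the precision versus raising an error) can be sketched as a small helper. `to_ns` and its `strict` flag are hypothetical names for illustration, not a pandas API; the sketch only relies on NumPy's `np.datetime_data` (to read the unit of a `datetime64` value) and `np.isnat`.

```python
import numpy as np


def to_ns(value, strict=False):
    """Hypothetical helper: return a datetime64 scalar with nanosecond unit.

    strict=False converts other units to ns (the "silently change" option);
    strict=True raises instead of changing the precision.
    """
    unit, _ = np.datetime_data(value.dtype)  # e.g. ('us', 1) for microseconds
    if unit == 'ns':
        return value
    if strict:
        raise ValueError("expected datetime64[ns], got datetime64[%s]" % unit)
    if np.isnat(value):
        return value.astype('M8[ns]')  # NaT stays NaT in any unit
    converted = value.astype('M8[ns]')
    # If casting back does not reproduce the input, the conversion was lossy
    # (e.g. a finer unit than ns, or a date outside the ns range of about 1677-2262).
    if converted.astype(value.dtype) != value:
        raise ValueError("%r cannot be represented exactly as datetime64[ns]" % (value,))
    return converted


us_value = np.datetime64('2017-07-05T13:54:56.667864')  # microsecond unit
print(to_ns(us_value).view('i8'))  # 1499262896667864000
```

The non-strict path mirrors what In [112] does by hand with `.astype("<M8[ns]")`; the round-trip check is one way to keep "silent" conversion from also being silently wrong.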
I'm sorry, I don't get it. Even after adding a three-week delta, why do I still get True in … Thank you anyway for addressing the issue so fast!
Sorry if I wasn't clear; it's definitely a bug. It should be False (or maybe an exception). NumPy stores datetimes as int64s, and which instant a given integer represents depends on the resolution (unit):

In [119]: np.datetime64(today).view('i8')
Out[119]: 1499262896667864

In [120]: np.datetime64(today).astype('<M8[ns]').view('i8')
Out[120]: 1499262896667864000

It's possible (haven't confirmed yet) that when you do …
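To make the int64 point concrete: the integer from Out[119] names a July 2017 instant when read as microseconds, but a mid-January 1970 instant when read as nanoseconds. This NumPy-only sketch shows one plausible mechanism for the symptom; it does not show what pandas 0.19.2 actually does internally, which the comment above leaves unconfirmed.

```python
import numpy as np

raw = 1499262896667864  # the int64 from Out[119] above

as_us = np.datetime64(raw, 'us')  # read with the unit it was stored in
as_ns = np.datetime64(raw, 'ns')  # the same integer, read with the wrong unit

print(as_us)  # 2017-07-05T13:54:56.667864
print(as_ns)  # 1970-01-18T08:27:42.896667864

today_ns = np.datetime64(raw * 1000, 'ns')  # the July 2017 instant, stored in ns

# NumPy converts units before comparing datetime64 values, so this is correct:
print(as_us < today_ns)  # False: it is the same instant
# If the microsecond integer were treated as nanoseconds, a "future" threshold
# would land in 1970 and compare as earlier than any 2017 date:
print(as_ns < today_ns)  # True
```

Under that hypothesis, `threshold < df['date']` would come out True for every row, which matches what the report describes.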
Ah, indeed this seems to be a duplicate of #7996. For now, you can work around it by converting …
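The exact conversion suggested above is cut off, so the following is only one way to sidestep the problem, not necessarily the one the comment proposes: give the threshold nanosecond precision before comparing, either explicitly (as in In [112]) or by building a `pd.Timestamp` from it. A fixed date replaces `datetime.today()` so the result is reproducible.

```python
import numpy as np
import pandas as pd
from datetime import datetime, timedelta

# Fixed timestamp instead of datetime.today() so the output is deterministic.
today = datetime(2017, 7, 5, 13, 54, 56, 667864)
df = pd.DataFrame({'date': [today] * 5})

threshold_us = np.datetime64(today + timedelta(weeks=3))  # microsecond unit

# Option 1: make the unit explicit, as in In [112] above.
threshold_ns = threshold_us.astype('<M8[ns]')
# Option 2: let pandas build its own Timestamp from the NumPy value.
threshold_ts = pd.Timestamp(threshold_us)

for threshold in (threshold_ns, threshold_ts):
    later = df[threshold < df['date']]    # rows after the threshold
    earlier = df[df['date'] < threshold]  # rows before the threshold
    print(len(later), len(earlier))       # 0 5
```

Either form hands pandas a value whose unit it handles natively, so both operand orders should give the expected, opposite results.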
Oh, now I get it, thanks again!!
Code Sample, a copy-pastable example if possible
Problem description
The two comparisons above should give opposite results. Instead, both return the same result, as if df['date'] were always the first operand of the comparison.
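The expectation above can be phrased as an invariant: writing the comparison with the scalar on the left (`threshold < s`) must give the same answer as writing it with the Series on the left (`s > threshold`). A regression-style check might look like the sketch below; `comparisons_consistent` is a hypothetical name, and the fixed date stands in for `datetime.today()`.

```python
import numpy as np
import pandas as pd
from datetime import datetime, timedelta


def comparisons_consistent(scalar, series):
    """Hypothetical check: both spellings of the same comparison agree."""
    reflected = scalar < series  # scalar on the left, as in df[threshold < df['date']]
    direct = series > scalar     # the same question with the Series on the left
    return bool((reflected == direct).all())


today = datetime(2017, 7, 5, 13, 54, 56, 667864)  # fixed instead of datetime.today()
s = pd.Series([today] * 3)
threshold = np.datetime64(today + timedelta(weeks=3))  # microsecond unit, as in the report

print(comparisons_consistent(threshold, s))
print((threshold < s).any())
```

On a pandas version without this bug, the first line should print True and the second False; on an affected version such as the 0.19.2 reported here, the report suggests the check would fail.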
Expected Output
The picture below illustrates the issue. It was expected that the line
df[threshold < df['date']]
would result in an empty DataFrame.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-83-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 34.4.1
Cython: None
numpy: 1.12.1
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 2.0.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.3
html5lib: None
httplib2: 0.10.3
apiclient: 1.6.2
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: 2.38.0
pandas_datareader: None