Median returns odd value when applied to timedeltas #10040

Closed
AndreaBravi opened this issue May 1, 2015 · 2 comments · Fixed by #10072
Labels: Bug; Numeric Operations (Arithmetic, Comparison, and Logical operations); Timedelta (Timedelta data type)

@AndreaBravi

from pandas import DataFrame
from numpy import datetime64
data = ['2015-02-03', '2015-02-07']
data = DataFrame(data, dtype=datetime64)
data.diff().median()

This code returns

0   -53374 days +00:06:21.572612
dtype: timedelta64[ns]

while I would expect it to return the same as data.diff().mean()

0   4 days
dtype: timedelta64[ns]

Here is my setup

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Darwin
OS-release: 14.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8

pandas: 0.16.0
nose: 1.3.4
Cython: None
numpy: 1.9.2
scipy: 0.14.0
statsmodels: None
IPython: 2.3.1
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.2
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.2
openpyxl: 2.1.3
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: None
lxml: None
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: 2.5.4 (dt dec pq3 ext)
@jreback
Contributor

jreback commented May 1, 2015

xref #9442. The algorithm doing this converts to float64, so it's losing precision (I don't think the underlying op is defined for int64).

This needs special handling for int64 in core/nanops/nanmedian. PRs are welcome.
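As a quick way to gauge the precision hypothesis, here is a minimal sketch in plain Python. It assumes only that a timedelta64[ns] value is stored as an int64 count of nanoseconds; the names `NS_PER_DAY`, `four_days_ns`, and `big` are illustrative and not taken from pandas.

```python
# float64 has a 53-bit significand, so every integer with magnitude
# up to 2**53 survives an int -> float -> int round trip unchanged.
NS_PER_DAY = 86_400 * 10**9          # nanoseconds in one day
four_days_ns = 4 * NS_PER_DAY        # the only valid value in data.diff()

# The 4-day difference is far below 2**53, so the cast is exact.
fits_exactly = four_days_ns < 2**53
round_trips = int(float(four_days_ns)) == four_days_ns

# For contrast, an integer just above 2**53 does lose precision.
big = 2**53 + 1
lossy = int(float(big)) != big

print(fits_exactly, round_trips, lossy)
```

Under that assumption, the 4-day value in this report is represented exactly in float64, so rounding on its own would not account for a result as far off as -53374 days.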

jreback added the Bug, Numeric Operations, and Timedelta labels on May 1, 2015
jreback added this to the Next Major Release milestone on May 1, 2015
@mortada
Contributor

mortada commented May 7, 2015

I looked into this, and I think it's actually not a precision issue: the function is not handling pd.NaT correctly. Please see #10072.

Basically, the values are being cast to float, so pd.NaT becomes a non-null floating-point number. median() then computes the median with that invalid value included (it happens to be a large negative number).
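The effect described above can be sketched in plain Python without pandas. This is an illustration under one stated assumption, namely that NaT is stored as the minimum int64 value; it is not the code in nanops or in #10072, and `NAT_SENTINEL`, `raw`, `naive`, and `fixed` are hypothetical names.

```python
import statistics
from datetime import timedelta

NAT_SENTINEL = -2**63                # assumption: the int64 value stored for NaT
NS_PER_DAY = 86_400 * 10**9          # nanoseconds in one day

# data.diff() for the two dates, viewed as int64 nanoseconds: [NaT, 4 days]
raw = [NAT_SENTINEL, 4 * NS_PER_DAY]

# Unmasked path: cast everything to float, so the sentinel is treated
# as an ordinary (very large negative) number.
naive_ns = int(statistics.median(float(v) for v in raw))
days, rem_ns = divmod(naive_ns, NS_PER_DAY)
naive = timedelta(days=days, microseconds=rem_ns // 1000)

# Masked path: drop the sentinel before taking the median.
valid = [float(v) for v in raw if v != NAT_SENTINEL]
fixed = timedelta(microseconds=int(statistics.median(valid)) // 1000)

print(naive)
print(fixed)
```

With the sentinel left in, the median is the midpoint between -2**63 ns and 4 days, which works out to -53374 days plus 6 minutes 21.572612 seconds, the same value as in the original report. With the sentinel masked out, the result is 4 days.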
