Stripped timedelta[s] series represented as timedelta[ns] #12425

toobaz · 2016-02-23T12:10:46Z

In [2]: s = pd.Series(range(61)).astype('timedelta64[s]')

In [3]: s.tail(1)
Out[3]: 
60   00:01:00
dtype: timedelta64[s]

In [4]: str(s).splitlines()[-2]
Out[4]: '60   00:00:00.000000'

... because the snipped representation interprets the content of the series as nanoseconds rather than seconds. The same happens with timedelta64[ms] and probably with any other resolution (if there are any).
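To make the distinction concrete, here is a minimal NumPy-only sketch (not the pandas internals; the variable names are illustrative). It contrasts reinterpreting the raw integers of a timedelta64[s] array as nanoseconds with actually converting them. The first yields 60 ns, which rounds to zero at second precision; the second yields one minute expressed in nanoseconds.

```python
import numpy as np

# Sixty seconds, stored with second resolution.
secs = np.array([60], dtype="timedelta64[s]")

# Reinterpretation: keep the raw int64 payload and relabel the unit as ns.
reinterpreted = secs.view("int64").view("timedelta64[ns]")

# Conversion: NumPy rescales the integers when the unit changes.
converted = secs.astype("timedelta64[ns]")

print(reinterpreted[0])
print(converted[0])
```

The wrong repr reported here matches the first case: the integers survive, but their unit is lost.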

In [5]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: 404819358e90da57c8025a259ab58cd75426069f
python: 3.4.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.3.0-1-amd64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: it_IT.utf8

pandas: 0.18.0rc1+35.g4048193
nose: 1.3.6
pip: 1.5.6
setuptools: 20.1.1
Cython: 0.23.2
numpy: 1.10.0.post2
scipy: 0.16.0
statsmodels: 0.8.0.dev0+755fa81
xarray: None
IPython: 4.1.1
sphinx: 1.3.1
patsy: 0.3.0-dev
dateutil: 2.4.2
pytz: 2015.6
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.3
matplotlib: 1.5.dev1
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: 0.7.3
lxml: None
bs4: 4.4.0
html5lib: 0.999
httplib2: 0.9.1
apiclient: None
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
jinja2: 2.8


jorisvandenbossche · 2016-02-23T13:56:27Z

The same example, but shown in another way:

In [27]: s = pd.Series(range(5)).astype('timedelta64[s]')

In [28]: s
Out[28]:
0   00:00:00
1   00:00:01
2   00:00:02
3   00:00:03
4   00:00:04
dtype: timedelta64[s]

In [29]: pd.options.display.max_rows = 4

In [30]: s
Out[30]:
0          00:00:00
1   00:00:00.000000
          ...
3   00:00:00.000000
4   00:00:00.000000
dtype: timedelta64[ns]

jorisvandenbossche · 2016-02-23T13:58:26Z

Possibly related to #11594

jorisvandenbossche · 2016-02-23T14:05:29Z

The problem actually lies in (the usage of) concat:

In [38]: pd.concat([s[0:2], s[-2:]])
Out[38]:
0          00:00:00
1   00:00:00.000000
3   00:00:00.000000
4   00:00:00.000000
dtype: timedelta64[ns]

jreback · 2016-02-23T14:19:58Z

Well, this is a different issue.

.astype('timedelta64[s]') is interpreting the timedelta64[s] units as ns, so a conversion needs to be done in `core.common._possibly_convert_to_datetimelike` or maybe in the astype itself. So I'll mark this bug as one like that. Note that this feature is not technically supported.

For example, we reject things like the following (so we can either reject, or better yet accept both). For datetime64 we already do the latter: in other words, we accept multiple dtypes and then convert.

In [1]: Series(np.arange(61),dtype='m8[s]')
TypeError: cannot convert timedeltalike to dtype [timedelta64[s]]

Separately, there is the bug @jorisvandenbossche notes, #11594, which is different (and not a problem with .concat); rather, it is about preserving the dtype on slicing. I'll put an example there.

toobaz · 2016-02-23T14:32:17Z

@jreback : ... but there is no problem in the .tail() (or .iloc[-1], actually) version of the same Series. How can it be the fault of .astype('timedelta64[s]')?!

jreback · 2016-02-23T14:48:32Z

These are not stored correctly. We only store timedelta data as timedelta64[ns]:

In [1]: s = pd.Series(range(61)).astype('timedelta64[s]')

In [2]: s.values
Out[2]: 
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60], dtype='timedelta64[s]')

In [3]: Series(pd.to_timedelta(range(61),unit='s')).values
Out[3]: 
array([          0,  1000000000,  2000000000,  3000000000,  4000000000,
        5000000000,  6000000000,  7000000000,  8000000000,  9000000000,
       10000000000, 11000000000, 12000000000, 13000000000, 14000000000,
       15000000000, 16000000000, 17000000000, 18000000000, 19000000000,
       20000000000, 21000000000, 22000000000, 23000000000, 24000000000,
       25000000000, 26000000000, 27000000000, 28000000000, 29000000000,
       30000000000, 31000000000, 32000000000, 33000000000, 34000000000,
       35000000000, 36000000000, 37000000000, 38000000000, 39000000000,
       40000000000, 41000000000, 42000000000, 43000000000, 44000000000,
       45000000000, 46000000000, 47000000000, 48000000000, 49000000000,
       50000000000, 51000000000, 52000000000, 53000000000, 54000000000,
       55000000000, 56000000000, 57000000000, 58000000000, 59000000000,
       60000000000], dtype='timedelta64[ns]')
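For reference, a minimal sketch of the construction path that does scale the values, assuming only `pd.to_timedelta` with `unit='s'` as used in the session above. The check goes through `.dt.total_seconds()` rather than the dtype, so it does not depend on the storage resolution a particular pandas version picks.

```python
import pandas as pd

# Build the Series through to_timedelta so the integers are scaled as seconds.
s = pd.Series(pd.to_timedelta(list(range(61)), unit="s"))

# The last element is one minute, whatever resolution is used for storage.
last = s.iloc[-1]
print(last)
print(s.dt.total_seconds().iloc[-1])
```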

toobaz · 2016-02-23T14:56:36Z

OK... but the ordinary __repr__ is aware of this, so it behaves fine. If the snipped version also behaved well, and we just stated "all timedelta64[*] are stored as nanoseconds", wouldn't everybody be happy?

(that is: am I missing an official definition of timedelta64[s] which we should comply with?)

jreback · 2016-02-23T14:59:13Z

No, the repr is wrong, which indicates that the internal representation is wrong. This needs to be fixed in the astyping. Putting something in the docs is the very last thing to do. It is a bug and should be fixed (though technically this is not supported, but probably should be).

toobaz · 2016-02-24T10:46:09Z

Does the following (on datetime rather than timedelta) also reflect a bug?

In [2]: str(pd.Series(range(100)).astype('datetime64[ms]')).splitlines()[-1]
Out[2]: 'dtype: datetime64[ns]'

jreback · 2016-02-24T13:29:25Z

Why are you doing this stringifying thing? That doesn't make any sense.

Yes, this is properly converted to datetime64[ns]:

In [10]: pd.Series(range(100)).astype('datetime64[ms]').dt.microsecond.head()
Out[10]: 
0       0
1    1000
2    2000
3    3000
4    4000
dtype: int64

In [13]: pd.to_datetime(pd.Series(range(100)), unit='ms').dt.microsecond.head()
Out[13]: 
0       0
1    1000
2    2000
3    3000
4    4000
dtype: int64

Note that there is no millisecond attribute because it is not compatible with datetime.

jorisvandenbossche · 2016-02-24T13:35:39Z

@jreback The stringifying is another way to show what the repr looks like. And this does make sense in, e.g., the case of timedelta, as the dtype in the repr did not correspond to the actual dtype of the series.

But in case of datetime64, these values are stored correctly (so @toobaz what you showed in your last comment is not a bug, but the correct behaviour). When doing astype('datetime64[s]'), the values are correctly interpreted as seconds, and subsequently converted for datetime64[ns] to store it in the series:


In [22]: pd.Series(range(5)).astype('datetime64[s]')
Out[22]:
0   1970-01-01 00:00:00
1   1970-01-01 00:00:01
2   1970-01-01 00:00:02
3   1970-01-01 00:00:03
4   1970-01-01 00:00:04
dtype: datetime64[ns]

In [23]: pd.Series(range(5)).astype('datetime64[s]').values
Out[23]:
array(['1970-01-01T01:00:00.000000000+0100',
       '1970-01-01T01:00:01.000000000+0100',
       '1970-01-01T01:00:02.000000000+0100',
       '1970-01-01T01:00:03.000000000+0100',
       '1970-01-01T01:00:04.000000000+0100'], dtype='datetime64[ns]')

Probably the same should happen for astype('timedelta64[s]')?

In [29]: pd.Series(range(5)).astype('timedelta64[s]')
Out[29]:
0   00:00:00
1   00:00:01
2   00:00:02
3   00:00:03
4   00:00:04
dtype: timedelta64[s]

In [30]: pd.Series(range(5)).astype('timedelta64[s]').values
Out[30]: array([0, 1, 2, 3, 4], dtype='timedelta64[s]')

jreback · 2016-02-24T13:45:44Z

@jorisvandenbossche exactly, that's what I noted above. datetime64 values are all coerced pretty well, so they can be used as a model for how/where to do a similar coercion (which is already done for the constructor and most other places, but obviously not in astyping).
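As a rough illustration of the kind of coercion being discussed, here is a hedged sketch in plain NumPy. The helper `coerce_timedelta_to_ns` is hypothetical: it is not a pandas function and does not reflect how the eventual fix was implemented. It only shows the idea of accepting any timedelta64 unit and rescaling to nanoseconds before storing.

```python
import numpy as np


def coerce_timedelta_to_ns(values):
    """Hypothetical helper: rescale any timedelta64 array to nanoseconds."""
    arr = np.asarray(values)
    if arr.dtype.kind != "m":
        raise TypeError(f"expected a timedelta64 array, got {arr.dtype}")
    # astype rescales the underlying integers; it does not merely relabel them.
    return arr.astype("timedelta64[ns]")


# Integers cast to second resolution, then coerced to the storage resolution.
secs = np.arange(5).astype("timedelta64[s]")
as_ns = coerce_timedelta_to_ns(secs)
print(as_ns.dtype)
print(as_ns.view("int64"))
```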

jorisvandenbossche added Bug Output-Formatting __repr__ of pandas objects, to_string labels Feb 23, 2016

jreback added Dtype Conversions Unexpected or buggy dtype conversions Timedelta Timedelta data type Difficulty Intermediate and removed Output-Formatting __repr__ of pandas objects, to_string labels Feb 23, 2016

jreback added this to the Next Major Release milestone Feb 23, 2016

jorisvandenbossche mentioned this issue Jan 13, 2018

API/DEPR: re-instate timedelta[non-unit] conversions as a no-op #19225

Closed

jreback modified the milestones: Next Major Release, 0.23.0 Jan 13, 2018

jreback mentioned this issue Jan 13, 2018

BUG/TST: assure conversions of datetimelikes for object, numeric dtypes #19224

Merged

jreback added a commit to jreback/pandas that referenced this issue Jan 13, 2018

BUG/TST: assure conversions of datetimelikes for object, numeric dtypes

05859f6

closes pandas-dev#19223 closes pandas-dev#12425

jreback added a commit to jreback/pandas that referenced this issue Jan 13, 2018

BUG/TST: assure conversions of datetimelikes for object, numeric dtypes

b953695

closes pandas-dev#19223 closes pandas-dev#12425

jreback closed this as completed in #19224 Jan 13, 2018

jreback added a commit that referenced this issue Jan 13, 2018

BUG/TST: assure conversions of datetimelikes for object, numeric dtyp…

0477880

…es (#19224) closes #19223 closes #12425
