Series with dtype=object does unexpected type conversion #21881

AllenDowney · 2018-07-12T21:09:09Z

Code Sample, a copy-pastable example if possible

import pandas as pd

# Example 1

timestamp = pd.Timestamp(1412526600000000000)
series = pd.Series([], dtype=object)
series['timestamp'] = timestamp
type(series.timestamp)

# Example 2

series = pd.Series([], dtype=object)
series['anything'] = 300.0
series['timestamp'] = timestamp
type(series.timestamp)

Problem description

In the first example, the timestamp is still a Timestamp.

In the second example, the timestamp gets converted to int.

Expected Output

I expected the timestamp to continue to be a Timestamp, especially because the dtype of the Series is object. Why are the types of the values getting converted?

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-128-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.utf8
LOCALE: en_US.UTF-8

pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None


WillAyd · 2018-07-12T21:16:12Z

This is probably the same issue as #13910

AllenDowney · 2018-07-12T22:08:47Z

Thanks for the quick reply! Yes, looks like it is the same issue. Any suggestions on a workaround? I thought with `dtype=object`, Series might stop trying to infer types. Is there any way to just turn off type inference?

makbigc · 2018-07-22T15:35:07Z

When a new label is added, the function _setitem_with_indexer uses np.concatenate to join the array holding the new value to the existing values.

np.concatenate converts an array with dtype='datetime64[ns]' to int when the result has dtype=object, whereas an array that already has dtype=object is left unchanged.

Explicitly,


In [132]: array([pd.Timestamp('2016-01-01')])
Out[132]: array([Timestamp('2016-01-01 00:00:00')], dtype=object)

In [133]: np.concatenate([array([None]), array([pd.Timestamp('2016-01-01')])])
Out[133]: array([None, Timestamp('2016-01-01 00:00:00')], dtype=object)

In [134]: a = Series(pd.Timestamp('2016-01-01'))._values

In [135]: a
Out[135]: array(['2016-01-01T00:00:00.000000000'], dtype='datetime64[ns]')

In [136]: np.concatenate([array([None]), a])
Out[136]: array([None, 1451606400000000000], dtype=object)

Should a type check be added before np.concatenate? That would add some redundancy to the code, though.
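As a minimal sketch of what such a check could look like (this is an illustration only, not the change that pandas adopted): the helper name `concat_preserving_timestamps` is hypothetical, and it assumes plain NumPy arrays as input. It boxes datetime64 values into `pd.Timestamp` objects before concatenating them with an object array.

```python
import numpy as np
import pandas as pd


def concat_preserving_timestamps(arrays):
    """Hypothetical helper: box datetime64 arrays before mixing them with object arrays."""
    # Only box when at least one input is already object dtype; otherwise
    # plain concatenation keeps the datetime64 dtype and nothing is lost.
    mixed = any(arr.dtype == object for arr in arrays)
    prepared = []
    for arr in arrays:
        if mixed and np.issubdtype(arr.dtype, np.datetime64):
            # Convert each datetime64 scalar to a Timestamp held in an object array.
            arr = np.array([pd.Timestamp(v) for v in arr], dtype=object)
        prepared.append(arr)
    return np.concatenate(prepared)


old = np.array([None], dtype=object)
new = np.array(["2016-01-01"], dtype="datetime64[ns]")

naive = np.concatenate([old, new])                   # datetime value is not kept as a Timestamp
checked = concat_preserving_timestamps([old, new])   # datetime value stays a Timestamp
print(type(naive[1]), type(checked[1]))
```

The cost is the extra dtype inspection on every call, which is the redundancy mentioned above.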

mroeschke · 2020-04-01T03:27:06Z

This appears to work on master now. Could use a test.

In [81]: timestamp = pd.Timestamp(1412526600000000000)
    ...: series = pd.Series([], dtype=object)
    ...: series['timestamp'] = timestamp
    ...: type(series.timestamp)
Out[81]: pandas._libs.tslibs.timestamps.Timestamp

In [82]: series = pd.Series([], dtype=object)
    ...: series['anything'] = 300.0
    ...: series['timestamp'] = timestamp
    ...: type(series.timestamp)
Out[82]: pandas._libs.tslibs.timestamps.Timestamp
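A regression test along these lines might look like the sketch below. It is not the test that was eventually merged; the function name is made up for illustration, and it assumes a pandas version in which the behaviour shown above holds.

```python
import pandas as pd


def test_setitem_enlargement_keeps_timestamp():
    # Hypothetical test name; mirrors Example 2 from the original report.
    timestamp = pd.Timestamp(1412526600000000000)

    series = pd.Series([], dtype=object)
    series["anything"] = 300.0
    series["timestamp"] = timestamp

    # The stored value should still be a Timestamp, not an int.
    assert isinstance(series["timestamp"], pd.Timestamp)
    assert series["timestamp"] == timestamp
    assert series["anything"] == 300.0
```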

arongergely · 2020-11-14T17:17:25Z

take

WillAyd added the Dtype Conversions, Bug, and Datetime labels Jul 12, 2018

KalyanGokhale mentioned this issue Jul 31, 2018

BUG: Fixes unwanted casting in .isin (GH21804) #21893

Closed

4 tasks

mroeschke added the good first issue and Needs Tests labels and removed the Bug, Dtype Conversions, and Datetime labels Apr 1, 2020

github-actions bot assigned arongergely Nov 14, 2020

jreback mentioned this issue Jan 20, 2021

Series with dtype=object does unexpected type conversion #39285

Merged

3 tasks

jreback added this to the 1.3 milestone Jan 20, 2021

jreback added the Dtype Conversions and Indexing labels Jan 20, 2021

jreback closed this as completed in #39285 Jan 28, 2021
