Series with dtype=object does unexpected type conversion #21881

Closed
AllenDowney opened this issue Jul 12, 2018 · 5 comments · Fixed by #39285
Labels
Dtype Conversions Unexpected or buggy dtype conversions good first issue Indexing Related to indexing on series/frames, not to indexes themselves Needs Tests Unit test(s) needed to prevent regressions

@AllenDowney
Contributor

Code Sample, a copy-pastable example if possible

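import pandas as pd

timestamp = pd.Timestamp(1412526600000000000)

# Example 1: the timestamp is the only value assigned

series = pd.Series([], dtype=object)
series['timestamp'] = timestamp
type(series.timestamp)  # Timestamp, as expected

# Example 2: a float is assigned before the timestamp

series = pd.Series([], dtype=object)
series['anything'] = 300.0
series['timestamp'] = timestamp
type(series.timestamp)  # int on pandas 0.23.0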

Problem description

In the first example, the timestamp is still a Timestamp.

In the second example, the timestamp gets converted to int.

Expected Output

I expected the timestamp to continue to be a Timestamp, especially because the dtype of the Series is object. Why are the types of the values getting converted?

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-128-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.utf8
LOCALE: en_US.UTF-8

pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@WillAyd
Member

WillAyd commented Jul 12, 2018

This is probably the same issue as #13910

@WillAyd WillAyd added Dtype Conversions Unexpected or buggy dtype conversions Bug Datetime Datetime data dtype labels Jul 12, 2018
@AllenDowney
Contributor Author

AllenDowney commented Jul 12, 2018 via email

@makbigc
Contributor

makbigc commented Jul 22, 2018

When a new index label is added, the function _setitem_with_indexer uses np.concatenate to join the array holding the new value to the existing one.

np.concatenate converts the values of an array with dtype='datetime64[ns]' to int when it joins that array with an object array, whereas it leaves the values of an array with dtype=object untouched.

Explicitly,


In [132]: array([pd.Timestamp('2016-01-01')])
Out[132]: array([Timestamp('2016-01-01 00:00:00')], dtype=object)

In [133]: np.concatenate([array([None]), array([pd.Timestamp('2016-01-01')])])
Out[133]: array([None, Timestamp('2016-01-01 00:00:00')], dtype=object)

In [134]: a = Series(pd.Timestamp('2016-01-01'))._values

In [135]: a
Out[135]: array(['2016-01-01T00:00:00.000000000'], dtype='datetime64[ns]')

In [136]: np.concatenate([array([None]), a])
Out[136]: array([None, 1451606400000000000], dtype=object)

Should a type check be added before the np.concatenate call? The drawback is that it would make the code redundant.
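For illustration, here is a minimal sketch of what such a type check could look like. It is not the change that was made in pandas: `box_datetimes` is a hypothetical helper invented for this example, and the array is built directly with NumPy instead of through `Series._values`. The idea is to box datetime64 values as `pd.Timestamp` objects before they reach `np.concatenate`.

```python
import numpy as np
import pandas as pd

# Same kind of array as `a` in the session above: raw datetime64[ns] values.
raw = np.array(["2016-01-01"], dtype="datetime64[ns]")
placeholder = np.array([None], dtype=object)

# Naive join: NumPy casts the datetime64[ns] values to generic Python
# objects. In the session above this produced a plain integer.
naive = np.concatenate([placeholder, raw])


def box_datetimes(values):
    """Hypothetical guard: turn datetime64 values into pd.Timestamp objects."""
    if values.dtype.kind == "M":  # "M" is NumPy's kind code for datetime64
        return np.array([pd.Timestamp(v) for v in values], dtype=object)
    return values


# Guarded join: both inputs are object arrays, so nothing is reinterpreted.
guarded = np.concatenate([placeholder, box_datetimes(raw)])

print(type(naive[1]), type(guarded[1]))
```

The guard only touches arrays whose dtype kind is datetime64, so object and numeric arrays pass through unchanged.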

@mroeschke
Member

This looks to work on master now. Could use a test

In [81]: timestamp = pd.Timestamp(1412526600000000000)
    ...: series = pd.Series([], dtype=object)
    ...: series['timestamp'] = timestamp
    ...: type(series.timestamp)
Out[81]: pandas._libs.tslibs.timestamps.Timestamp

In [82]: series = pd.Series([], dtype=object)
    ...: series['anything'] = 300.0
    ...: series['timestamp'] = timestamp
    ...: type(series.timestamp)
Out[82]: pandas._libs.tslibs.timestamps.Timestamp
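A regression test for this could be sketched as follows. This is only an assumption about what such a test might look like, not the test that was eventually merged; the function name is hypothetical. It replays Example 2 from the report and checks that the stored value is still a Timestamp.

```python
import pandas as pd


def test_object_series_setitem_keeps_timestamp():
    # Hypothetical test name; replays Example 2 from the report.
    timestamp = pd.Timestamp(1412526600000000000)

    series = pd.Series([], dtype=object)
    series["anything"] = 300.0
    series["timestamp"] = timestamp

    # The value must come back as a Timestamp, not as an integer.
    assert isinstance(series["timestamp"], pd.Timestamp)
    assert series["timestamp"] == timestamp
    # The previously stored float must be unaffected.
    assert series["anything"] == 300.0


test_object_series_setitem_keeps_timestamp()
```

Under pytest the final call is unnecessary; it is included so the snippet also runs as a plain script.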

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug Dtype Conversions Unexpected or buggy dtype conversions Datetime Datetime data dtype labels Apr 1, 2020
@arongergely

take

@jreback jreback added this to the 1.3 milestone Jan 20, 2021
@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions Indexing Related to indexing on series/frames, not to indexes themselves labels Jan 20, 2021