Type inferencing makes assignment of datetime to an object Series non-idempotent #13910

Closed
craigcitro opened this issue Aug 4, 2016 · 7 comments
Labels: Bug; Datetime (Datetime data dtype); Dtype Conversions (Unexpected or buggy dtype conversions); Indexing (Related to indexing on series/frames, not to indexes themselves)

Comments

@craigcitro

craigcitro commented Aug 4, 2016

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> s = pd.Series()
>>> ts = pd.Timestamp('2016-01-01')
>>> s['a'] = None
>>> s['b'] = ts
>>> s
a                   None
b    1451606400000000000
dtype: object

OK, no worries, we got coerced to integer. Now let's just redo the same assignment:

>>> s['b'] = ts
>>> s
a                   None
b    2016-01-01 00:00:00
dtype: object

That's ... surprising. This is probably just an unfortunate feature of a type inference algorithm, but it's awfully shocking.
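
For context, the integer in the first output is the timestamp's value in nanoseconds since the Unix epoch, so nothing is lost numerically; only the type changes. A minimal sketch of that relationship, using only `Timestamp.value` and the `Timestamp` constructor:

```python
import pandas as pd

ts = pd.Timestamp("2016-01-01")

# Timestamp.value is the number of nanoseconds since the Unix epoch.
print(ts.value)  # 1451606400000000000

# Passing that integer back to the constructor recovers the same timestamp.
print(pd.Timestamp(ts.value) == ts)  # True
```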

Related examples for testing

#18410
#21143

Expected Output

The two outputs above would be identical; I'd prefer that they were both the second form (with timestamp information preserved), but anything consistent would be better than the current state.

output of pd.show_versions()

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 25.1.4
Cython: None
numpy: 1.11.1
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None
@jreback
Contributor

jreback commented Aug 4, 2016

I recall an issue exactly like this, but can't seem to find it (@sinhrks, do you remember?).
Will mark it. These should be the same (and not coerce to int). It should remain object dtype.
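
As a point of reference for the end state described above (object dtype, with the `Timestamp` left intact), here is a sketch that builds the Series directly instead of by item assignment. Constructing it up front with `dtype=object` is an assumption made for illustration, not the code path from the report:

```python
import pandas as pd

ts = pd.Timestamp("2016-01-01")

# Build the Series in one step with an explicit object dtype
# (illustrative only; the report assigns the items one at a time).
s = pd.Series([None, ts], index=["a", "b"], dtype=object)

print(s.dtype)                # object
print(type(s["b"]).__name__)  # Timestamp
```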

jreback added the Bug, Datetime, Indexing, Dtype Conversions, and Difficulty Intermediate labels on Aug 4, 2016
jreback added this to the Next Major Release milestone on Aug 4, 2016
@jreback
Contributor

jreback commented Aug 4, 2016

Pull requests are of course welcome!

@sinhrks
Member

sinhrks commented Aug 5, 2016

I suppose so, but the only one I could find is related to replace (#12747).

@shoyer
Member

shoyer commented Aug 5, 2016

I wouldn't be surprised if this is related to numpy/numpy#7619

jreback modified the milestones: Interesting Issues, Next Major Release on May 25, 2017
jreback modified the milestones: Interesting Issues, Next Major Release on Nov 26, 2017
mroeschke changed the title from "Type inferencing makes assignment non-idempotent" to "Type inferencing makes assignment of datetime to an object Series non-idempotent" on May 20, 2018
@samuelsinayoko
Contributor

I'm not able to reproduce this.

>>> ser = pd.Series()
>>> ser['a'] = None
>>> ser['b'] = pd.Timestamp('2016-01-01')
>>> ser
a          NaT
b   2016-01-01
dtype: datetime64[ns]

Is this still an issue?

>>> pd.show_versions()
Duplicate key in file '/Users/sinayoks/.matplotlib/matplotlibrc' line #381.

INSTALLED VERSIONS
------------------
commit           : 9d56cfc7a164afec77b2701b20101c600f6982b6
python           : 3.7.5.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 17.7.0
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : en_GB.UTF-8
LANG             : en_GB.UTF-8
LOCALE           : en_GB.UTF-8

pandas           : 0.25.0.dev0+1732.g9d56cfc
numpy            : 1.17.2
pytz             : 2019.3
dateutil         : 2.8.0
pip              : 19.3.1
setuptools       : 41.6.0.post20191030
Cython           : 0.29.13
pytest           : 5.2.2
hypothesis       : 4.36.2
sphinx           : 2.2.1
blosc            : None
feather          : None
xlsxwriter       : 1.2.2
lxml.etree       : 4.4.1
html5lib         : 1.0.1
pymysql          : None
psycopg2         : None
jinja2           : 2.10.3
IPython          : 7.9.0
pandas_datareader: None
bs4              : 4.8.1
bottleneck       : 1.2.1
fastparquet      : 0.3.2
gcsfs            : None
lxml.etree       : 4.4.1
matplotlib       : 3.1.1
numexpr          : 2.7.0
odfpy            : None
openpyxl         : 3.0.0
pandas_gbq       : None
pyarrow          : 0.15.0
pytables         : None
s3fs             : 0.3.4
scipy            : 1.3.1
sqlalchemy       : 1.3.10
tables           : 3.5.1
xarray           : 0.13.0
xlrd             : 1.2.0
xlwt             : 1.3.0
xlsxwriter       : 1.2.2

@craigcitro
Author

Yep, this seems to be fixed.

It looks like the fix landed somewhere between 0.25.0rc0 and 0.25.0, but I didn't try to bisect any further:
[image]

@jreback
Contributor

jreback commented Nov 2, 2019

Would take a test PR for this.
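
A rough sketch of what such a regression test might look like. The helper and test names are hypothetical, and `dtype=object` on the empty Series is an assumption chosen to match the dtype shown in the report and to avoid depending on the default dtype of an empty Series. The test deliberately checks only that repeating the assignment changes nothing and that the stored value is still a `Timestamp`; it does not pin the resulting dtype, since the thread shows both object and `datetime64[ns]` outcomes:

```python
import pandas as pd


def assign_twice():
    """Repeat the assignment from the report and capture both results."""
    ts = pd.Timestamp("2016-01-01")
    # Assumption: start from an explicitly object-dtype empty Series.
    ser = pd.Series(dtype=object)
    ser["a"] = None
    ser["b"] = ts
    first = ser.copy()
    ser["b"] = ts  # same assignment again; it should change nothing
    return ts, first, ser


def test_setitem_timestamp_is_idempotent():
    ts, first, second = assign_twice()
    # The dtype must not change between the two assignments.
    assert first.dtype == second.dtype
    # The value must come back as a Timestamp both times, not an integer.
    assert isinstance(first["b"], pd.Timestamp)
    assert isinstance(second["b"], pd.Timestamp)
    assert first["b"] == second["b"] == ts


test_setitem_timestamp_is_idempotent()
```

On the 0.18.1 behaviour described at the top of the thread, the `isinstance` check on `first["b"]` would be expected to fail, since the first assignment stored an integer.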
