Potential regression in str dtype handling in 0.23? #21270

Closed
clembou opened this issue May 31, 2018 · 5 comments

@clembou

clembou commented May 31, 2018

Code Sample, a copy-pastable example if possible

using pandas 0.22:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: object_data = ['silly walks', np.nan, 'dead parrot']

In [4]: pd.Series(object_data, dtype=object)
Out[4]:
0    silly walks
1            NaN
2    dead parrot
dtype: object

In [5]: pd.Series(object_data, dtype=str)
Out[5]:
0    silly walks
1            NaN
2    dead parrot
dtype: object

using pandas 0.23:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: object_data = ['silly walks', np.nan, 'dead parrot']

In [4]: pd.Series(object_data, dtype=object)
Out[4]:
0    silly walks
1            NaN
2    dead parrot
dtype: object

In [5]: pd.Series(object_data, dtype=str)
Out[5]:
0    silly walks
1            nan
2    dead parrot
dtype: object

Problem description

As of v0.23, when the str dtype is passed, a np.nan in a list of values gets converted to the string 'nan'.

The docs don't list str as a valid dtype in the first place, so perhaps the problem here is that pandas should be stricter and check that the provided dtype is supported? Or the docs could mention that a custom conversion function can be passed? Either way, this caused test failures on clembou/behave-pandas#7, so I thought I'd raise an issue, as I might not be the only one affected.

Feel free to close if you think this is simply an unsupported use of the dtype parameter.
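
For anyone hitting the same thing, here is a minimal workaround sketch that does not rely on how a given pandas version treats dtype=str. It stringifies only the non-missing values and builds an object Series, so np.nan stays a real missing value. The helper name to_str_preserving_missing is hypothetical (not a pandas API), and it assumes the input holds scalar values only.

```python
import numpy as np
import pandas as pd


def to_str_preserving_missing(values):
    """Hypothetical helper: stringify values but keep missing ones missing.

    Assumes `values` is an iterable of scalars; pd.isna is applied per element.
    """
    # Convert each non-missing element to str, leave NaN/None untouched.
    converted = [v if pd.isna(v) else str(v) for v in values]
    # Build an object Series explicitly so no string coercion is applied to NaN.
    return pd.Series(converted, dtype=object)


object_data = ['silly walks', np.nan, 'dead parrot']
result = to_str_preserving_missing(object_data)

# The missing entry is still detected as missing rather than the string 'nan'.
print(result.isna().tolist())
```

Passing dtype=object directly, as in In [4] above, is enough when the data is already strings; the helper only matters when non-string values also need converting.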

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.23.0
pytest: None
pip: 10.0.1
setuptools: 39.2.0
Cython: None
numpy: 1.14.3
scipy: None
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@jorisvandenbossche
Member

Hmm, I seem to recall other issue(s) recently related to this, but can't find it directly. cc @toobaz

jorisvandenbossche added this to the 0.23.1 milestone May 31, 2018
@jreback
Contributor

jreback commented May 31, 2018

#20401

@TomAugspurger
Contributor

@jorisvandenbossche thinking of #21083?

@jorisvandenbossche
Member

@jorisvandenbossche thinking of #21083?

Ah, yes! I was searching for "dtype str nan", and not "none" .. :-)

@clembou
Author

clembou commented May 31, 2018

Yes, this is pretty much a duplicate. I did not see that issue this morning! Feel free to close in favour of #21083!
