Potential regression in str dtype handling in 0.23? #21270

clembou · 2018-05-31T08:06:03Z

Code Sample, a copy-pastable example if possible

using pandas 0.22:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: object_data = ['silly walks', np.nan, 'dead parrot']

In [4]: pd.Series(object_data, dtype=object)
Out[4]:
0    silly walks
1            NaN
2    dead parrot
dtype: object

In [5]: pd.Series(object_data, dtype=str)
Out[5]:
0    silly walks
1            NaN
2    dead parrot
dtype: object

using pandas 0.23:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: object_data = ['silly walks', np.nan, 'dead parrot']

In [4]: pd.Series(object_data, dtype=object)
Out[4]:
0    silly walks
1            NaN
2    dead parrot
dtype: object

In [5]: pd.Series(object_data, dtype=str)
Out[5]:
0    silly walks
1            nan
2    dead parrot
dtype: object

Problem description

as of v0.23, when provided with the str dtype, a np.nan in a list of values gets converter to the string nan.

the docs don't specify str as a valid dtype in the first place , so perhaps the problem here is that pandas should be stricter and check that the provided dtype is supported? Or the docs could mention that a custom conversion function can be passed? either way this caused test failures on clembou/behave-pandas#7, so I thought I'd raise an issue as I might not be the only one affected.

Feel free to close if you think this is simply an unsupported use of the dtype parameter.

Output of `pd.show_versions()`

INSTALLED VERSIONS ------------------ commit: None python: 3.6.5.final.0 python-bits: 64 OS: Darwin OS-release: 17.5.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: en_GB.UTF-8

pandas: 0.23.0
pytest: None
pip: 10.0.1
setuptools: 39.2.0
Cython: None
numpy: 1.14.3
scipy: None
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

The text was updated successfully, but these errors were encountered:

jorisvandenbossche · 2018-05-31T08:41:39Z

Hmm, I seem to recall other issue(s) recently related to this, but can't find it directly. cc @toobaz

jreback · 2018-05-31T09:15:15Z

#20401

TomAugspurger · 2018-05-31T15:49:02Z

@jorisvandenbossche thinking of #21083?

jorisvandenbossche · 2018-05-31T16:35:15Z

@jorisvandenbossche thinking of #21083?

Ah, yes! I was searching for "dtype str nan", and not "none" .. :-)

clembou · 2018-05-31T16:38:16Z

yes this is pretty much a duplicate - I did not see this this morning! feel free to close in favour of #21083!

jorisvandenbossche added this to the 0.23.1 milestone May 31, 2018

TomAugspurger closed this as completed Jun 1, 2018

jorisvandenbossche mentioned this issue Jun 7, 2018

REGR: NA-values in ctors with string dtype #21366

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potential regression in str dtype handling in 0.23? #21270

Potential regression in str dtype handling in 0.23? #21270

clembou commented May 31, 2018

jorisvandenbossche commented May 31, 2018

jreback commented May 31, 2018

TomAugspurger commented May 31, 2018

jorisvandenbossche commented May 31, 2018

clembou commented May 31, 2018

Potential regression in str dtype handling in 0.23? #21270

Potential regression in str dtype handling in 0.23? #21270

Comments

clembou commented May 31, 2018

Code Sample, a copy-pastable example if possible

Problem description

Output of pd.show_versions()

jorisvandenbossche commented May 31, 2018

jreback commented May 31, 2018

TomAugspurger commented May 31, 2018

jorisvandenbossche commented May 31, 2018

clembou commented May 31, 2018

Output of `pd.show_versions()`