Pandas replace with string and integers - incorrect behavior? #14550

Closed
ozhogin opened this issue Nov 1, 2016 · 2 comments
Labels: Bug, Duplicate Report


ozhogin commented Nov 1, 2016

I ran into what looks like incorrect behavior in pandas replace when a DataFrame mixes strings and integers. If the DataFrame contains both 0 (integer) and '0' (string), replacing '0' affects both the strings and the integers. Here is a reproduction:

In [1]: df = pd.DataFrame({'numbers' : [0, 1, 2, 0], 'strings' : ['0', 1, 2, '0']})

To check that it's indeed the correct setup:

In [2]: df.dtypes
Out [2]:
numbers     int64
strings    object
dtype: object

And check individual values:

In [3]: type(df['numbers'][0])
Out[3]: numpy.int64
In [4]: type(df['strings'][0])
Out[4]: str
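Checking only element 0 does not show that the object column mixes `str` and `int` values. A small sketch, using the same frame as above, that lists the Python type of every value in the `strings` column (the variable name `kinds` is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'numbers': [0, 1, 2, 0], 'strings': ['0', 1, 2, '0']})

# Map each value in the object column to the name of its Python type,
# so the mix of str and int entries is visible at a glance.
kinds = df['strings'].map(lambda v: type(v).__name__).tolist()
print(kinds)
```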

Now, do replace:

In [5]: df.replace(to_replace='0', value=np.NaN, inplace=True)
In [6]: df.head()
Out[6]: 
   numbers  strings
0      NaN      NaN
1        1        1
2        2        2
3      NaN      NaN

As you can see, it replaced both the strings and the integers, although it should have affected only the strings. Doing the same with the integer 0 works correctly:

In [7]: df = pd.DataFrame({'numbers' : [0, 1, 2, 0], 'strings' : ['0', 1, 2, '0']})
...: df.replace(to_replace=0, value=np.NaN, inplace=True)
...: print df.head()
Out [7]:
   numbers strings
0      NaN       0
1        1       1
2        2       2
3      NaN       0
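Until the behavior is fixed, one possible workaround is to do the comparison explicitly and type-aware instead of relying on replace. The sketch below is not part of pandas: `replace_exact_str` is a hypothetical helper that swaps a value for NaN only when it is a `str` equal to the target, so the integer 0 is never matched.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'numbers': [0, 1, 2, 0], 'strings': ['0', 1, 2, '0']})


def replace_exact_str(value, target='0'):
    # Hypothetical helper: replace only values that are str AND equal to target.
    if isinstance(value, str) and value == target:
        return np.nan
    return value


# Apply the helper element-wise, one column at a time.
cleaned = df.apply(lambda col: col.map(replace_exact_str))
print(cleaned)
```

The explicit `isinstance` check is what keeps the integer column out of the match; the cost is that this runs in Python per element, so it is slower than a vectorized replace on large frames.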

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.8.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.0
nose: None
pip: 8.1.2
setuptools: 3.6
Cython: None
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 4.0.3
sphinx: None
patsy: 0.4.1
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: 1.0.0
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: 1.0b8
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

jorisvandenbossche (Member) commented

@ozhogin Thanks for the report. That does indeed look like a bug. I think replace is not well tested for non-string values, and the docs also mainly talk about strings.

Always welcome to look into it!
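For anyone picking this up, the expected behavior from the report could be captured in a small check like the one below. This is only a sketch of what a regression test might assert, not an actual test from the pandas test suite; the function name is made up. On a version affected by the bug (such as the 0.19.0 reported above), the first assertion would be expected to fail.

```python
import numpy as np
import pandas as pd


def check_replace_str_leaves_ints():
    # Hypothetical check mirroring the reproduction in the report.
    df = pd.DataFrame({'numbers': [0, 1, 2, 0], 'strings': ['0', 1, 2, '0']})
    result = df.replace(to_replace='0', value=np.nan)

    # Replacing the string '0' must not touch the integer 0 values.
    assert result['numbers'].tolist() == [0, 1, 2, 0]
    # Only the '0' strings in the object column become NaN.
    assert result['strings'].isna().tolist() == [True, False, False, True]


check_replace_str_leaves_ints()
```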

jorisvandenbossche (Member) commented

Related issue: #12747. I am going to close this issue and add it as an additional case in #12747.
