DataFrame.set_value() fails with numpy types #17256

Closed
dov opened this issue Aug 15, 2017 · 3 comments
Labels
Indexing Related to indexing on series/frames, not to indexes themselves

Comments

dov commented Aug 15, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

df = pd.DataFrame([[0,0,None],
                   [0,0,None],
                   [1,1,None]],
                  columns = ['A','B','C']).set_index(['A','B'])

# This succeeds: the key is a tuple of plain Python ints
idx = (0,0)
df.set_value(idx,'C',5)
print(df)

# This fails: the key is a tuple of numpy.int64 scalars
idx = (np.int64(0),np.int64(0))
df.set_value(idx,'C',8)
print(df)

Problem description

The code above assigns to multiple rows through DataFrame.set_value() using two different index keys. In the first case the key is built from plain Python integers, and the call succeeds. In the second case, which looks equivalent, the key is built from numpy.int64 integers (which I actually got back from a groupby call), and the call fails.

Is this behavior expected or is this a bug?
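
For readers hitting the same problem, here is a minimal sketch of one possible workaround: converting numpy scalars to plain Python scalars before building the key. The helper name to_python_key is hypothetical and not part of pandas or numpy. The sketch only shows that the two keys are equal in value and hash while differing in element type; that this type difference is what trips up set_value() is an assumption, not something confirmed in this thread.

```python
import numpy as np


def to_python_key(key):
    """Hypothetical helper: convert numpy scalars in a key to Python scalars."""
    # np.generic is the base class of all numpy scalar types;
    # .item() returns the equivalent built-in Python object.
    return tuple(k.item() if isinstance(k, np.generic) else k for k in key)


np_key = (np.int64(0), np.int64(0))
py_key = to_python_key(np_key)

# The two keys compare equal and hash to the same value...
same_value = np_key == py_key
same_hash = hash(np_key) == hash(py_key)

# ...but after conversion the elements are built-in ints, not numpy.int64.
element_types = [type(k) for k in py_key]
print(same_value, same_hash, element_types)
```

The converted key can then be passed wherever a tuple of plain Python values is known to work.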

Expected Output


        C
A B      
0 0     5
  0     5
1 1  None
        C
A B      
0 0     8
  0     8
1 1  None

Output of pd.show_versions()


INSTALLED VERSIONS

commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Linux
OS-release: 4.11.9-300.fc26.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: en_US.UTF-8

pandas: 0.20.1
pytest: 3.1.1
pip: 9.0.1
setuptools: 36.2.7
Cython: None
numpy: 1.12.0
scipy: 0.19.1
xarray: None
IPython: 5.4.1
sphinx: 1.6.3
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.1
feather: None
matplotlib: 2.0.0
openpyxl: None
xlrd: 0.9.4
xlwt: 1.1.2
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.11
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

@chris-b1
Contributor

This probably should work, but it is more idiomatic to use loc, which already handles this case:

In [27]: df.loc[(np.int64(0),np.int64(0)), 'C'] = 5

In [28]: df
Out[28]: 
        C
A B      
0 0     5
  0     5
1 1  None

@chris-b1
Contributor

xref #15269 - proposal to deprecate set_value

gfyoung added the Indexing label on Aug 15, 2017
jorisvandenbossche added this to the No action milestone on Oct 5, 2017
@jorisvandenbossche
Member

Closing this, as we have deprecated set_value/get_value.

(at has the same behaviour, though. I would say that if you want to use numpy scalars, you need to use loc; that keeps at as the more advanced but faster accessor.)
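
As a self-contained illustration of the recommendation to use loc with numpy scalars, the following sketch rebuilds the frame from the original report and assigns through loc with a key made of numpy.int64 values. It is written for Python 3 and a recent pandas, which is an assumption (the report itself used Python 2.7 and pandas 0.20.1); the variable names key and updated are illustrative only.

```python
import numpy as np
import pandas as pd

# Same frame as in the original report: a two-level index with a
# duplicated (0, 0) entry and an all-None column C.
df = pd.DataFrame(
    [[0, 0, None], [0, 0, None], [1, 1, None]],
    columns=["A", "B", "C"],
).set_index(["A", "B"])

# Key built from numpy scalars, as might come back from a groupby call.
key = (np.int64(0), np.int64(0))

# loc accepts the numpy-scalar tuple and assigns to every matching row.
df.loc[key, "C"] = 8

updated = df["C"].tolist()
print(df)
```

Both rows indexed by (0, 0) receive the new value, while the (1, 1) row is left untouched.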
