import pandas as pd

bug = pd.Series([0, 1, 2, 3, 4], index=[7606741985629028552, 17876870360202815256, 13106359306506049338, 8991270399732411471, 8991270399732411471])

# Scalar lookups succeed
assert bug.loc[7606741985629028552] == 0
assert bug.loc[17876870360202815256] == 1

# throws KeyError: 'None of [[7606741985629028552, 17876870360202815256]] are in the [index]'
bug.loc[[7606741985629028552, 17876870360202815256]]
Problem description
This code throws a KeyError even though both keys are clearly present in the index, as the asserts demonstrate. The bug only appears when the index contains a repeated element (not one of the keys being extracted). It disappears if the values are smaller; perhaps some kind of type cast makes the values not found?
Expected Output
7.606742e+18 0
1.787687e+19 1
It fails because the list of keys is converted to a float array of labels. That is the default behaviour of np.array(..), whereas the pd.Index(..) constructor has a special path that checks for large ints and automatically infers a uint64 dtype.
So to fix this bug, we need to use this correct inference in the indexing code as well. Closer investigation / PR always welcome!
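To make the float conversion concrete, here is a minimal sketch using the two keys from the report. The float64 dtype is forced explicitly to mimic the conversion described above, so the result does not depend on how a given NumPy version discovers dtypes. The variable names are only illustrative and this is not the actual indexing code path.

```python
import numpy as np
import pandas as pd

keys = [7606741985629028552, 17876870360202815256]

# Forcing float64 mimics the lossy conversion described above.
as_float = np.array(keys, dtype="float64")
# uint64 can hold both keys exactly.
as_uint = np.array(keys, dtype="uint64")

# float64 has a 53-bit significand, so integers this large get rounded
# and no longer compare equal to the original keys.
round_trip = [int(x) for x in as_float]
lossy = [rt != k for rt, k in zip(round_trip, keys)]

# The Index constructor infers uint64 for the same list of keys.
idx = pd.Index(keys)

print(lossy)
print(as_uint.tolist() == keys)
print(idx.dtype)
```

Once the keys have been rounded to floats they no longer match the uint64 labels stored in the index, which is consistent with the KeyError in the report.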
jorisvandenbossche changed the title from "Series.loc[list] incorrectly throws KeyError" to "BUG: Series.loc[list] with uint64 keys raises KeyError (converted to floats)" on Aug 20, 2019
I've looked into this a bit. Thanks to the useful pointer from @jorisvandenbossche, I have a solution ready.
In addition, the fact that this "works" when the index has no repeated element actually points to another bug: in that case, the returned Series has a Float64Index.
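A quick way to check the second symptom is to run the same list lookup against an index without duplicates and inspect the dtype of the result's index. This is a rough sketch of such a check, not the test from the eventual fix. The names `unique` and `result` are made up for illustration, and the dtype printed depends on the pandas version in use.

```python
import pandas as pd

k1, k2, k3 = 7606741985629028552, 17876870360202815256, 13106359306506049338

# Same large keys as in the report, but without the repeated label.
unique = pd.Series([0, 1, 2], index=[k1, k2, k3])

result = unique.loc[[k1, k2]]

# On an affected version the comment above reports a Float64Index here;
# with correct inference the index should stay uint64.
print(result.index.dtype)
print(result.tolist())
```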
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.72-68.55.amzn1.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.4
pytest: 3.7.3
pip: 18.0
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.15.0
scipy: 1.1.0
pyarrow: None
xarray: 0.10.8
IPython: 6.5.0
sphinx: 1.7.5
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.6
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.5
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.7
lxml: 4.2.4
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None