BUG: Series.loc[list] with uint64 keys raises KeyError (converted to floats) #28023

Closed
jasonmahr opened this issue Aug 19, 2019 · 2 comments · Fixed by #29420
Labels: Bug, Indexing (related to indexing on series/frames, not to indexes themselves)


@jasonmahr

import pandas as pd

# uint64-sized labels; note the repeated label 8991270399732411471 at the end
bug = pd.Series([0, 1, 2, 3, 4], index=[7606741985629028552, 17876870360202815256, 13106359306506049338, 8991270399732411471, 8991270399732411471])
# scalar lookups work
assert bug.loc[7606741985629028552] == 0
assert bug.loc[17876870360202815256] == 1
# list lookup raises KeyError: 'None of [[7606741985629028552, 17876870360202815256]] are in the [index]'
bug.loc[[7606741985629028552, 17876870360202815256]]

Problem description

This code raises a KeyError even though both keys are clearly present in the index, as the asserts demonstrate. The bug only occurs when the index contains a repeated element (not one of the labels being extracted). It also disappears if the values are smaller; perhaps some type cast prevents the values from being found?
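One way to test the type-cast guess is to push the two requested labels through float64 and back. This is a minimal sketch assuming only NumPy; it shows that float64 (53-bit significand) cannot hold these labels exactly, but it does not by itself show where pandas performs such a cast.

```python
import numpy as np

# The two labels requested in the failing .loc call above.
keys = [7606741985629028552, 17876870360202815256]

# Force a float64 conversion, as suspected for the list lookup.
as_float = np.array(keys, dtype=np.float64)

# Convert back to exact Python ints to see how far each label moved.
round_trip = [int(x) for x in as_float]
offsets = [r - k for r, k in zip(round_trip, keys)]

print(round_trip == keys)  # False
print(offsets)
```

Both labels are larger than 2**53, so the nearest float64 is a neighbouring integer rather than the label itself.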

Expected Output

7.606742e+18 0
1.787687e+19 1

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.72-68.55.amzn1.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.4
pytest: 3.7.3
pip: 18.0
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.15.0
scipy: 1.1.0
pyarrow: None
xarray: 0.10.8
IPython: 6.5.0
sphinx: 1.7.5
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.6
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.5
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.7
lxml: 4.2.4
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@jorisvandenbossche
Member

Thanks for the report!

As a workaround, you can pass the list as an index:

In [36]: bug.loc[pd.Index([7606741985629028552, 17876870360202815256])]
Out[36]: 
7606741985629028552     0
17876870360202815256    1
dtype: int64

It is failing because the list of values is converted to a float array of labels. This is the default behaviour of np.array(..), but in the pd.Index(..) constructor we have a special path that checks for large ints and automatically infers a uint64 dtype.
So to fix this bug, we need to use the same inference in the indexing code as well. Closer investigation / PR always welcome!
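The inference difference can be checked directly. This sketch assumes a recent pandas and NumPy; `loc_with_inferred_keys` is a hypothetical helper that just packages the `pd.Index` workaround shown above. It is not pandas API and not the change made in the eventual fix.

```python
import numpy as np
import pandas as pd

keys = [7606741985629028552, 17876870360202815256]

# NumPy promotes a mix of int64- and uint64-sized values to float64,
# consistent with a plain array of these labels ending up as floats.
print(np.promote_types(np.int64, np.uint64))  # float64

# pd.Index has a special path for large ints and infers uint64 instead.
print(pd.Index(keys).dtype)  # uint64


def loc_with_inferred_keys(series: pd.Series, labels: list) -> pd.Series:
    """Hypothetical helper: select labels via .loc after letting pd.Index infer their dtype."""
    return series.loc[pd.Index(labels)]


# The reporter's Series, including the repeated label that triggers the KeyError.
bug = pd.Series(
    [0, 1, 2, 3, 4],
    index=[7606741985629028552, 17876870360202815256, 13106359306506049338,
           8991270399732411471, 8991270399732411471],
)
result = loc_with_inferred_keys(bug, keys)
print(result)
```

Wrapping the keys this way keeps them uint64 end to end, so the lookup compares exact integers instead of rounded floats.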

@jorisvandenbossche added the Bug and Indexing labels on Aug 20, 2019
@jorisvandenbossche added this to the Contributions Welcome milestone on Aug 20, 2019
@jorisvandenbossche changed the title from "Series.loc[list] incorrectly throws KeyError" to "BUG: Series.loc[list] with uint64 keys raises KeyError (converted to floats)" on Aug 20, 2019
@oguzhanogreden
Contributor

oguzhanogreden commented Aug 31, 2019

I've looked into this a bit. Thanks to the useful pointer from @jorisvandenbossche, I have a solution ready.

In addition, the fact that this "works" when there is no repeated label in the index actually points to another bug: in that case, the returned Series has a Float64Index.

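import pandas as pd

# same labels, but the last one is now ...472, so the index has no duplicates
bug = pd.Series([0, 1, 2, 3, 4], index=[7606741985629028552, 17876870360202815256, 13106359306506049338, 8991270399732411471, 8991270399732411472])
assert bug.loc[7606741985629028552] == 0
assert bug.loc[17876870360202815256] == 1

# no KeyError here, but the resulting index is a Float64Index instead of uint64
bug.loc[[7606741985629028552, 17876870360202815256]].index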
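To tell a correct result from the Float64Index symptom without eyeballing the repr, a small check like the one below could be used. `labels_preserved` is a hypothetical helper, not pandas API; it accepts a result only if its index is uint64 and equals the requested labels exactly, which a float64 index cannot satisfy because these values get rounded.

```python
from typing import List

import numpy as np
import pandas as pd


def labels_preserved(result: pd.Series, keys: List[int]) -> bool:
    """Hypothetical check: True only if the result index is uint64 and matches keys exactly."""
    return result.index.dtype == np.uint64 and result.index.tolist() == keys


keys = [7606741985629028552, 17876870360202815256]

# What a correct lookup should return versus the symptom described above.
expected = pd.Series([0, 1], index=pd.Index(keys, dtype=np.uint64))
symptom = pd.Series([0, 1], index=pd.Index(keys, dtype=np.float64))

print(labels_preserved(expected, keys))  # True
print(labels_preserved(symptom, keys))   # False
```

On an affected pandas version, applying the same check to `bug.loc[[7606741985629028552, 17876870360202815256]]` should return False.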

I have an idea for a solution, but I'm not sure of its possible implications. I'll look into it a bit and make a PR.
