Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct type inference for UInt64Index during access #29420

Merged
merged 13 commits into from
Nov 27, 2019
17 changes: 10 additions & 7 deletions pandas/core/indexes/numeric.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@

import numpy as np

from pandas._libs import index as libindex
from pandas._libs import index as libindex, lib
from pandas.util._decorators import Appender, cache_readonly

from pandas.core.dtypes.cast import astype_nansafe
Expand Down Expand Up @@ -302,12 +302,15 @@ def _convert_scalar_indexer(self, key, kind=None):

@Appender(_index_shared_docs["_convert_arr_indexer"])
def _convert_arr_indexer(self, keyarr):
# Cast the indexer to uint64 if possible so
# that the values returned from indexing are
# also uint64.
keyarr = com.asarray_tuplesafe(keyarr)
if is_integer_dtype(keyarr):
return com.asarray_tuplesafe(keyarr, dtype=np.uint64)
# Cast the indexer to uint64 if possible so that the values returned
# from indexing are also uint64.
if is_integer_dtype(keyarr) or (
lib.infer_dtype(keyarr, skipna=False) == "integer"
):
keyarr = com.asarray_tuplesafe(keyarr, dtype=np.uint64)
else:
keyarr = com.asarray_tuplesafe(keyarr)

return keyarr

@Appender(_index_shared_docs["_convert_index_indexer"])
Expand Down
36 changes: 36 additions & 0 deletions pandas/tests/indexes/test_numeric.py
Original file line number Diff line number Diff line change
Expand Up @@ -1187,3 +1187,39 @@ def test_range_float_union_dtype():

result = other.union(index)
tm.assert_index_equal(result, expected)


def test_uint64_keys_in_list():
# https://github.com/pandas-dev/pandas/issues/28023
bug = pd.Series(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could both these test functions be combined into 1 func? Seem very similar

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are indeed similar. I'll wait for a few more comments to decide on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea pls do we prefer parametized tests as much as possible

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revisiting the issues, I realized one of the tests would be redundant and removed that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revisiting the issues, I realized one of the tests would be redundant and removed that.

How come? - it seems that test case was exactly covering issue #28023

Copy link
Member

@alimcmaster1 alimcmaster1 Nov 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@oguzhanogreden oguzhanogreden Nov 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the test function... Now that I'm rooting for a single test, I'll leave it as isinstance since I think the intention of test is clearer this way.

[0, 1, 2, 3, 4],
index=[
7606741985629028552,
17876870360202815256,
13106359306506049338,
8991270399732411471,
8991270399732411471,
],
)

tm.assert_equal(
bug.loc[[7606741985629028552, 17876870360202815256]], bug.iloc[[0, 1]]
)


def test_uint_index_not_converted_to_float64():
# https://github.com/pandas-dev/pandas/issues/28279
bug = pd.Series(
[0, 1, 2, 3, 4],
index=[
7606741985629028552,
17876870360202815256,
13106359306506049338,
8991270399732411471,
8991270399732411472,
],
)

assert isinstance(
bug.loc[[7606741985629028552, 17876870360202815256]].index, UInt64Index
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explicitly construct the expected index and use tm.assert_index_equal to verify they're the same:

result = s.loc[[7606741985629028552, 17876870360202815256]].index
expected = UInt64Index([7606741985629028552, 17876870360202815256])
tm.assert_index_equal(result, expected)

I'd rather not do a simple isinstance check here because it doesn't guard against potential precision loss with the values in the index, e.g. if someone makes a change where there's an intermediate coercion to Float64Index:

In [2]: idx = pd.UInt64Index([2**53, 2**53 + 1])

In [3]: idx
Out[3]: UInt64Index([9007199254740992, 9007199254740993], dtype='uint64')

In [4]: pd.UInt64Index(pd.Float64Index(idx))
Out[4]: UInt64Index([9007199254740992, 9007199254740992], dtype='uint64')