Correct type inference for UInt64Index during access #29420

Merged
merged 13 commits into from
Nov 27, 2019
doc/source/whatsnew/v1.0.0.rst (3 changes: 2 additions, 1 deletion)
@@ -345,7 +345,8 @@ Numeric
 - Improved error message when using `frac` > 1 and `replace` = False (:issue:`27451`)
 - Bug in numeric indexes resulted in it being possible to instantiate an :class:`Int64Index`, :class:`UInt64Index`, or :class:`Float64Index` with an invalid dtype (e.g. datetime-like) (:issue:`29539`)
 - Bug in :class:`UInt64Index` precision loss while constructing from a list with values in the ``np.uint64`` range (:issue:`29526`)
--
+- Bug in :class:`NumericIndex` construction that caused indexing to fail when integers in the ``np.uint64`` range were used (:issue:`28023`)
+- Bug in :class:`NumericIndex` construction that caused :class:`UInt64Index` to be casted to :class:`Float64Index` when integers in the ``np.uint64`` range were used to index a :class:`DataFrame` (:issue:`28279`)

 Conversion
 ^^^^^^^^^^
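
The cast described in issue 28279 is harmful because float64 has a 53-bit significand: above 2**53 it cannot represent every integer, so coercing a `UInt64Index` to `Float64Index` silently rounds its labels. A minimal NumPy sketch, reusing labels from the test added in this PR (the variable names are illustrative only):

```python
import numpy as np

# Two distinct uint64 labels taken from the test added in this PR.
labels = np.array([8991270399732411471, 8991270399732411472], dtype=np.uint64)

# Casting to float64 (what a Float64Index would hold) rounds both labels;
# near 9e18, consecutive float64 values are 1024 apart.
as_float = labels.astype(np.float64)
print(as_float[0] == as_float[1])  # True: the two labels collapse

# A label above the int64 maximum does not survive a float64 round trip.
big = np.array([17876870360202815256], dtype=np.uint64)
print(big.astype(np.float64).astype(np.uint64)[0] == big[0])  # False
```

Since the first pair are distinct labels in the test's index, an index that took such a float64 detour could no longer tell them apart.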
pandas/core/indexes/numeric.py (18 changes: 10 additions, 8 deletions)
@@ -2,7 +2,7 @@

 import numpy as np

-from pandas._libs import index as libindex
+from pandas._libs import index as libindex, lib
 from pandas.util._decorators import Appender, cache_readonly

 from pandas.core.dtypes.cast import astype_nansafe
@@ -331,13 +331,15 @@ def _convert_scalar_indexer(self, key, kind=None):

     @Appender(_index_shared_docs["_convert_arr_indexer"])
     def _convert_arr_indexer(self, keyarr):
-        # Cast the indexer to uint64 if possible so
-        # that the values returned from indexing are
-        # also uint64.
-        keyarr = com.asarray_tuplesafe(keyarr)
-        if is_integer_dtype(keyarr):
-            return com.asarray_tuplesafe(keyarr, dtype=np.uint64)
-        return keyarr
+        # Cast the indexer to uint64 if possible so that the values returned
+        # from indexing are also uint64.
+        dtype = None
+        if is_integer_dtype(keyarr) or (
+            lib.infer_dtype(keyarr, skipna=False) == "integer"
+        ):
+            dtype = np.uint64
+
+        return com.asarray_tuplesafe(keyarr, dtype=dtype)

     @Appender(_index_shared_docs["_convert_index_indexer"])
     def _convert_index_indexer(self, keyarr):
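
The patched method selects `np.uint64` when the key already has an integer dtype or when `lib.infer_dtype(..., skipna=False)` reports `"integer"` for its values, and then converts once with that dtype. Passing the dtype explicitly matters: without it, NumPy may infer float64 for a list that mixes values below and above the int64 maximum, because int64 and uint64 promote to float64. The following is a standalone sketch of that decision, not pandas' code. It assumes the public `pandas.api.types.is_integer_dtype` and `pandas.api.types.infer_dtype` (the public name for `lib.infer_dtype`), uses `np.asarray` in place of the private `com.asarray_tuplesafe`, and the name `convert_arr_indexer_sketch` is hypothetical:

```python
import numpy as np
from pandas.api.types import infer_dtype, is_integer_dtype


def convert_arr_indexer_sketch(keyarr):
    """Hypothetical standalone mirror of the patched _convert_arr_indexer.

    np.asarray stands in for pandas' private com.asarray_tuplesafe.
    Negative integer keys are not handled: whether casting them to uint64
    raises or wraps depends on the NumPy version.
    """
    dtype = None
    # is_integer_dtype only looks at a dtype, so it is False for an
    # object-dtype array even when every element is an int; infer_dtype
    # inspects the values themselves.
    if is_integer_dtype(keyarr) or (
        infer_dtype(keyarr, skipna=False) == "integer"
    ):
        dtype = np.uint64
    return np.asarray(keyarr, dtype=dtype)


# The key list from the test: one value below and one above the int64 maximum.
keys = [7606741985629028552, 17876870360202815256]
converted = convert_arr_indexer_sketch(keys)
print(converted.dtype)             # uint64
print(converted.tolist() == keys)  # True

# The same keys as an object-dtype array: only the infer_dtype branch fires.
obj_keys = np.array(keys, dtype=object)
print(is_integer_dtype(obj_keys))                  # False
print(convert_arr_indexer_sketch(obj_keys).dtype)  # uint64

# Non-integer keys keep whatever dtype NumPy infers.
print(convert_arr_indexer_sketch([1.5, 2.5]).dtype)  # float64
```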
pandas/tests/indexes/test_numeric.py (26 changes: 26 additions, 0 deletions)
@@ -1209,3 +1209,29 @@ def test_range_float_union_dtype():

     result = other.union(index)
     tm.assert_index_equal(result, expected)
+
+
+def test_uint_index_does_not_convert_to_float64():
+    # https://github.com/pandas-dev/pandas/issues/28279
+    # https://github.com/pandas-dev/pandas/issues/28023
+    series = pd.Series(
+        [0, 1, 2, 3, 4, 5],
+        index=[
+            7606741985629028552,
+            17876870360202815256,
+            17876870360202815256,
+            13106359306506049338,
+            8991270399732411471,
+            8991270399732411472,
+        ],
+    )
+
+    result = series.loc[[7606741985629028552, 17876870360202815256]]
+
+    expected = UInt64Index(
+        [7606741985629028552, 17876870360202815256, 17876870360202815256],
+        dtype="uint64",
+    )
Review comment from a project member:

Can you explicitly construct the expected index and use tm.assert_index_equal to verify they're the same:

result = s.loc[[7606741985629028552, 17876870360202815256]].index
expected = UInt64Index([7606741985629028552, 17876870360202815256])
tm.assert_index_equal(result, expected)

I'd rather not do a simple isinstance check here because it doesn't guard against potential precision loss with the values in the index, e.g. if someone makes a change where there's an intermediate coercion to Float64Index:

In [2]: idx = pd.UInt64Index([2**53, 2**53 + 1])

In [3]: idx
Out[3]: UInt64Index([9007199254740992, 9007199254740993], dtype='uint64')

In [4]: pd.UInt64Index(pd.Float64Index(idx))
Out[4]: UInt64Index([9007199254740992, 9007199254740992], dtype='uint64')

+    tm.assert_index_equal(result.index, expected)
+
+    tm.assert_equal(result, series[:3])
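
The reviewer's objection to a bare `isinstance` check can be reproduced outside the test suite: after a round trip through float64 the index still has dtype uint64, so a type or dtype check passes, yet `2**53 + 1` has become `2**53`, which only a value comparison catches. A sketch using the public `pandas.testing.assert_index_equal`; it builds the index with `pd.Index(..., dtype="uint64")` because `UInt64Index` was removed in pandas 2.0, and the float64 detour stands in for a hypothetical regression:

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_index_equal

expected = pd.Index([2**53, 2**53 + 1], dtype="uint64")

# Hypothetical regression: an intermediate float64 coercion, then back to uint64.
roundtripped = pd.Index(expected.to_numpy().astype(np.float64).astype(np.uint64))

# A dtype-only check, like the isinstance check the reviewer objects to, passes...
print(roundtripped.dtype == expected.dtype)  # True

# ...but the values differ, because 2**53 + 1 is not representable in float64.
try:
    assert_index_equal(roundtripped, expected)
    caught = False
except AssertionError:
    caught = True
print(caught)  # True
```

This mirrors the `Out[4]` result in the reviewer's IPython session.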