Non-unique integers coerced to float during UInt64Index creation with explicit #29526

Closed
oguzhanogreden opened this issue Nov 10, 2019 · 1 comment · Fixed by #29529
Labels
Bug · Dtype Conversions (Unexpected or buggy dtype conversions) · Index (Related to the Index class or subclasses)

@oguzhanogreden
Contributor

Noticed this thanks to @jschendel's comment over here. I'll use this to learn more about data types and try to suggest a reasonable solution.

Somewhat relevant: #15832, #18400

Code Sample, a copy-pastable example if possible

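# UInt64Index is importable from pandas in the reported version (0.25.3)
from pandas import UInt64Index

index1 = 7606741985629028552
index2 = 17876870360202815256

UInt64Index([index1, index2])[0]
# Returns: 7606741985629028352 (off by 200 from index1)

# These will return the input value:
UInt64Index([index1])[0]
UInt64Index([index2])[0]
UInt64Index([index1, index1])[0]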

Problem description

The numpy array creation here coerces the values to float, even though specifying a dtype would prevent this behavior.
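
The off-by-200 result in the code sample is consistent with a round trip through a 64-bit float. A minimal sketch using only built-in Python, assuming CPython's `float` is an IEEE 754 double (the same format as numpy's float64); the variable names are illustrative:

```python
# First value from the code sample in this issue.
index1 = 7606741985629028552

# Python's float is an IEEE 754 double, the same format as numpy's float64.
round_tripped = int(float(index1))

print(round_tripped)           # 7606741985629028352, the value reported above
print(index1 - round_tripped)  # 200
```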

Expected Output

UInt64Index contains precisely the input values.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : None.None

pandas : 0.25.3
numpy : 1.16.4
pytz : 2019.1
dateutil : 2.8.0
pip : 19.1.1
setuptools : 41.0.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.3.4
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.5.0
pandas_datareader: None
bs4 : 4.7.1
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.3.4
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.0
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None

Replicates on current master as well.

@jschendel
Member

Yeah, the root cause of this looks to be numpy's inference rules.

From what I can tell numpy looks to be inferring float64 instead of uint64 when both of the following conditions are met:

  • input data contains a number supported by both int64 and uint64 (0 to 2**63 - 1)
  • input data contains a number supported by only uint64 (2**63 and above)

In [1]: import numpy as np; np.__version__
Out[1]: '1.17.3'

In [2]: np.array([1, 2**63])
Out[2]: array([1.00000000e+00, 9.22337204e+18])

This works fine if uint64 is explicitly specified:

In [3]: np.array([1, 2**63], dtype="uint64")
Out[3]: array([                  1, 9223372036854775808], dtype=uint64)
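
Applied to the values from the original report, the same explicit dtype keeps both integers intact. This is only a sketch of the idea, not necessarily how the fix in pandas was implemented:

```python
import numpy as np

# Values from the original report.
index1 = 7606741985629028552
index2 = 17876870360202815256

# Passing dtype explicitly means numpy does not have to infer one.
values = np.array([index1, index2], dtype="uint64")

print(values.dtype)               # uint64
print(int(values[0]) == index1)   # True
```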

If both values are in the int64 range the dtype is correctly inferred as int64:

In [4]: np.array([1, 2**63 - 1])
Out[4]: array([                  1, 9223372036854775807])

In [5]: _.dtype
Out[5]: dtype('int64')

If both values are in the uint64-only range the dtype is correctly inferred as uint64:

In [6]: np.array([2**63, 2**63 + 1])
Out[6]: array([9223372036854775808, 9223372036854775809], dtype=uint64)
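
The two conditions and the three cases above could be checked up front with a small pure-Python helper. This is a hypothetical sketch: `would_mix_ranges` is an illustrative name, and it encodes the conditions as described in this comment, not numpy's actual inference code.

```python
INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1


def would_mix_ranges(values):
    """Return True if values mix the shared int64/uint64 range
    with the uint64-only range (hypothetical helper)."""
    has_shared = any(0 <= v <= INT64_MAX for v in values)
    has_uint64_only = any(INT64_MAX < v <= UINT64_MAX for v in values)
    return has_shared and has_uint64_only


print(would_mix_ranges([1, 2**63]))          # True: the float64 case in Out[2]
print(would_mix_ranges([1, 2**63 - 1]))      # False: the int64 case
print(would_mix_ranges([2**63, 2**63 + 1]))  # False: the uint64 case

# The two values from the original report fall into the mixed case.
print(would_mix_ranges([7606741985629028552, 17876870360202815256]))  # True
```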

The intermediate conversion to float can cause precision loss starting at 2**53 + 1, which is the first integer that can't be represented exactly by float64:

In [7]: np.float64(2**53), np.float64(2**53 + 1)
Out[7]: (9007199254740992.0, 9007199254740992.0)
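
The gap between neighbouring float64 values grows with magnitude, which is why the error in the original report is much larger than 1. A small sketch using `math.ulp` from the standard library (Python 3.9 or later, so newer than the 3.7.3 in the report), again assuming Python's `float` matches float64:

```python
import math

# Gap between neighbouring floats at 2**53: 2**53 + 1 falls between two floats.
gap_at_2_53 = math.ulp(float(2**53))
print(gap_at_2_53)  # 2.0

# At the magnitude of the first value from the original report the gap is far larger.
index1 = 7606741985629028552
gap_at_index1 = math.ulp(float(index1))
print(gap_at_index1)  # 1024.0
```

With a gap of 1024, rounding to the nearest float can move an integer of that size by up to 512, which covers the difference of 200 seen in the report.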

I'll look into this on the numpy side to see if this is the expected inference behavior or a bug.

@jschendel added the Bug, Dtype Conversions, and Index labels Nov 11, 2019
@jschendel added this to the 1.0 milestone Nov 11, 2019