BUG: pandas series creation fails with OverflowError when given large integers #36291

Closed
2 of 3 tasks
mdering opened this issue Sep 11, 2020 · 1 comment · Fixed by #36316
Labels
Bug, Constructors, Series
mdering commented Sep 11, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

>>> import pandas as pd
>>> pd.Series(1000000000000000000000)
0    1000000000000000000000
dtype: object
>>> pd.Series(1000000000000000000000, index = pd.date_range(pd.Timestamp.now().floor("1D"), pd.Timestamp.now(), freq='T'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/matt/opt/anaconda3/lib/python3.7/site-packages/pandas/core/series.py", line 327, in __init__
    data = sanitize_array(data, index, dtype, copy, raise_cast_failure=True)
  File "/Users/matt/opt/anaconda3/lib/python3.7/site-packages/pandas/core/construction.py", line 475, in sanitize_array
    subarr = construct_1d_arraylike_from_scalar(value, len(index), dtype)
  File "/Users/matt/opt/anaconda3/lib/python3.7/site-packages/pandas/core/dtypes/cast.py", line 1555, in construct_1d_arraylike_from_scalar
    subarr.fill(value)
OverflowError: int too big to convert
>>> pd.Series(1000000000000000000000.0, index = pd.date_range(pd.Timestamp.now().floor("1D"), pd.Timestamp.now(), freq='T'))
2020-09-11 00:00:00    1.000000e+21
2020-09-11 00:01:00    1.000000e+21
2020-09-11 00:02:00    1.000000e+21
2020-09-11 00:03:00    1.000000e+21
2020-09-11 00:04:00    1.000000e+21
                           ...
2020-09-11 11:24:00    1.000000e+21
2020-09-11 11:25:00    1.000000e+21
2020-09-11 11:26:00    1.000000e+21
2020-09-11 11:27:00    1.000000e+21
2020-09-11 11:28:00    1.000000e+21
Freq: T, Length: 689, dtype: float64

Problem description

Hi pandas, creating a Series from a very large integer together with an index fails with an OverflowError. It works if I pass a float instead, or if I pass the single value with no index. The traceback points into pandas code, so I'm submitting a bug here.
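The traceback above ends in `subarr.fill(value)`. Here is a minimal sketch of what is probably happening at the NumPy level, assuming (not confirmed in this thread) that pandas allocates an int64 array for a Python int scalar before filling it. The helper names `fits_int64` and `fill_int64` are hypothetical and are not pandas functions.

```python
import numpy as np

BIG = 1000000000000000000000  # 10**21, the value from the report


def fits_int64(value: int) -> bool:
    """Return True if a Python int is representable as a signed 64-bit integer."""
    return -(2**63) <= value <= 2**63 - 1


def fill_int64(value: int, length: int) -> np.ndarray:
    """Allocate an int64 array and fill it with one scalar.

    This mirrors the allocate-then-fill pattern visible in the traceback;
    the int64 dtype is an assumption made for illustration.
    """
    arr = np.empty(length, dtype=np.int64)
    arr.fill(value)
    return arr


try:
    fill_int64(BIG, 3)
    overflowed = False
except OverflowError:
    # NumPy cannot store 10**21 in an int64 slot
    overflowed = True

print(fits_int64(BIG), overflowed)
```

If that assumption holds, the float case succeeds simply because 1e21 is well within float64 range.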

Expected Output

I would expect this to either fail more gracefully or return a Series of object or float dtype. When NumPy builds an array from such values, it transparently falls back to object dtype:

>>> import numpy as np
>>> np.array([1000000000000000000000]*1000).dtype
dtype('O')
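Until a fix is available, one possible workaround is to avoid the scalar-broadcast path altogether. This is only a sketch: the index below is a short fixed stand-in for the `date_range` in the report, and passing an explicit list with `dtype=object` is my own suggestion rather than something proposed in this thread.

```python
import pandas as pd

BIG = 1000000000000000000000  # 10**21, too large for int64

# A small fixed index standing in for the minute-frequency range in the report
index = pd.date_range("2020-09-11", periods=5, freq="min")

# Repeat the value explicitly and request object dtype, so the
# arbitrary-precision Python ints are stored as-is
as_object = pd.Series([BIG] * len(index), index=index, dtype=object)

# Alternative taken from the report: a float scalar broadcasts without error,
# at the cost of exact integer precision
as_float = pd.Series(float(BIG), index=index)

print(as_object.dtype, as_float.dtype)
```

The object-dtype version keeps the exact integer values but gives up vectorised integer arithmetic, so the float version may be preferable when approximate magnitudes are enough.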

Output of pd.show_versions()

INSTALLED VERSIONS

commit : f2ca0a2
python : 3.7.6.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Thu Jun 18 20:49:00 PDT 2020; root:xnu-6153.141.1~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.1
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.2
setuptools : 49.6.0.post20200814
Cython : 0.29.21
pytest : 6.0.1
hypothesis : None
sphinx : 3.2.1
blosc : 1.7.0
feather : None
xlsxwriter : 1.3.3
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.8.0
fastparquet : None
gcsfs : None
matplotlib : 3.3.1
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 1.0.1
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : 1.3.19
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.50.1

mdering added the Bug and Needs Triage labels Sep 11, 2020
Member

arw2019 commented Sep 11, 2020

Confirming that this happens on 1.2 master

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 03c7040
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-47-generic
Version : #51-Ubuntu SMP Fri Sep 4 19:50:52 UTC 2020
machine : x86_64
processor :
byteorder : little
LC_ALL : C.UTF-8
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.0.dev0+318.g03c704087
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.1.0.post20200704
Cython : 0.29.21
pytest : 5.4.3
hypothesis : 5.19.0
sphinx : 3.1.1
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : 0.4.0
gcsfs : 0.6.2
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.0
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.50.1

MarcoGorelli removed the Needs Triage label Sep 12, 2020
dsaxton added the Constructors and Series labels Sep 12, 2020
jreback added this to the 1.1.3 milestone Sep 13, 2020
5 participants