BUG: pandas.Index takes multidimensional array as input #20285

Open
csfarkas opened this issue Mar 11, 2018 · 2 comments
Labels
Bug Constructors Series/DataFrame/Index/pd.array Constructors Error Reporting Incorrect or improved errors from pandas Index Related to the Index class or subclasses

Comments

@csfarkas
Contributor

csfarkas commented Mar 11, 2018

Code Sample, a copy-pastable example if possible

import pandas as pd

idx = pd.Index(data=[[1, 2], [1, 2], [2, 3]])

# some cases where this causes an error:
idx.get_duplicates()
idx.drop_duplicates()

Problem description

According to the documentation, pandas.Index takes 1-dimensional array-like data as input, which the example clearly violates.

Expected Output

Option 1: pandas.Index should throw an error in this case.
Option 2: the documentation of pandas.Index should be updated. In this case, the methods of the Index class should be reviewed, since neither get_duplicates nor drop_duplicates is prepared for this kind of input.
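
To make Option 1 concrete, here is a minimal sketch of the kind of check a constructor could run before accepting its input. It is plain Python, not pandas code: the helper name `ensure_one_dimensional` and the error message are hypothetical, and the choice to reject only nested lists (while letting tuples through) is an assumption made for illustration.

```python
def ensure_one_dimensional(data):
    """Return data as a list, raising ValueError if any element is a list.

    Hypothetical helper for illustration only; it is not part of pandas.
    """
    items = list(data)
    for position, item in enumerate(items):
        # Assumption: tuples are allowed through, since they are hashable
        # and are a common way to write multi-level labels.
        if isinstance(item, list):
            raise ValueError(
                f"data must be 1-dimensional; element {position} "
                f"is a nested list: {item!r}"
            )
    return items


flat = ensure_one_dimensional([1, 2, 3])

try:
    ensure_one_dimensional([[1, 2], [1, 2], [2, 3]])
    nested_rejected = False
except ValueError:
    nested_rejected = True
```

With a check like this, the input from the code sample above would fail at construction time instead of later inside a method call.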

Output of pd.show_versions()

INSTALLED VERSIONS

commit: c818a22
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-36-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0.dev0+484.gc818a22
pytest: 3.4.2
pip: 9.0.1
setuptools: 38.5.1
Cython: 0.27.3
numpy: 1.14.1
scipy: 1.0.0
pyarrow: 0.8.0
xarray: 0.10.1
IPython: 6.2.1
sphinx: 1.7.1
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.2.0
openpyxl: 2.5.0
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.4
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: 0.1.3
fastparquet: 0.1.4
pandas_gbq: None
pandas_datareader: None

@mroeschke mroeschke added Indexing Related to indexing on series/frames, not to indexes themselves Error Reporting Incorrect or improved errors from pandas labels Jan 13, 2019
@toobaz
Member

toobaz commented Jun 29, 2019

Definitely option 1, as we also want to avoid in-place mutation of an Index through its nested lists:

In [2]: idx = pd.Index(data=[[1, 2], [1, 2], [2, 3]])

In [3]: id(idx)
Out[3]: 139839285680560

In [4]: idx[0][0] = 1000

In [5]: idx
Out[5]: Index([[1000, 2], [1, 2], [2, 3]], dtype='object')

In [6]: id(idx)
Out[6]: 139839285680560
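
The underlying difference can be shown in plain Python, without pandas. The sketch below contrasts lists (mutable and unhashable) with tuples (immutable and hashable). It assumes, without checking the pandas internals, that duplicate detection relies on hashing the labels, which would explain why list elements are a poor fit; all variable names are illustrative.

```python
nested_lists = [[1, 2], [1, 2], [2, 3]]

# Lists are mutable, so an element can be changed in place
# while the outer container keeps the same identity.
nested_lists[0][0] = 1000

# Lists are also unhashable, so hash-based duplicate detection cannot use them.
try:
    hash(nested_lists[0])
    lists_hashable = True
except TypeError:
    lists_hashable = False

# Tuples are immutable: item assignment raises TypeError.
labels = [tuple(row) for row in [[1, 2], [1, 2], [2, 3]]]
try:
    labels[0][0] = 1000
    tuples_mutable = True
except TypeError:
    tuples_mutable = False

# Tuples are hashable, so an order-preserving de-duplication works.
unique_labels = list(dict.fromkeys(labels))
```

Here `unique_labels` ends up as `[(1, 2), (2, 3)]`, while the list-based labels can be neither hashed nor protected from in-place changes.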

@toobaz toobaz added Index Related to the Index class or subclasses and removed Indexing Related to indexing on series/frames, not to indexes themselves labels Jun 29, 2019
@toobaz
Member

toobaz commented Jun 29, 2019

... unless we consider option 3: create a MultiIndex, as in

In [2]: pd.Index(data=[(1, 2), (1, 2), (2, 3)])
Out[2]:
MultiIndex([(1, 2),
            (1, 2),
            (2, 3)],
           )

In any case, xref: #17246
