BUG: pandas.Index takes multidimensional array as input #20285

csfarkas · 2018-03-11T16:40:38Z

Code Sample, a copy-pastable example if possible

import pandas as pd

idx = pd.Index(data=[[1, 2], [1, 2], [2, 3]])

# some cases where this causes an error:
idx.get_duplicates()
idx.drop_duplicates()

Problem description

According to the documentation, pandas.Index takes one-dimensional array-like data as input. The example clearly violates this, yet no error is raised.

Expected Output

Option 1: pandas.Index should throw an error in this case.
Option 2: the documentation of pandas.Index should be updated. In that case, the methods of the Index class should be checked as well, since neither get_duplicates nor drop_duplicates handles this kind of input.
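A minimal sketch of what Option 1 could look like as a pre-check on the caller's side. The helper `check_1d` is hypothetical (not a pandas API) and only looks for `list` elements; a real validation inside pandas would presumably also need to cover other nested containers such as NumPy arrays. Tuples are deliberately let through, since pandas treats a sequence of tuples as MultiIndex input.

```python
def check_1d(data):
    """Hypothetical pre-check (not part of pandas): raise if any element of
    `data` is itself a list, i.e. if the input looks multidimensional.

    Tuples are allowed on purpose, because pandas interprets a sequence of
    tuples as MultiIndex input rather than as nested data.
    """
    items = list(data)
    for i, item in enumerate(items):
        if isinstance(item, list):
            raise ValueError(
                f"Index data must be 1-dimensional; element {i} is a list: {item!r}"
            )
    return items


# A flat list passes through unchanged.
print(check_1d([1, 2, 3]))  # [1, 2, 3]

# The input from the report is rejected before it ever reaches pd.Index.
try:
    check_1d([[1, 2], [1, 2], [2, 3]])
except ValueError as exc:
    print(exc)
```

With pandas installed, a caller would run `pd.Index(check_1d(data))` instead of `pd.Index(data)`.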

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: c818a22
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-36-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0.dev0+484.gc818a22
pytest: 3.4.2
pip: 9.0.1
setuptools: 38.5.1
Cython: 0.27.3
numpy: 1.14.1
scipy: 1.0.0
pyarrow: 0.8.0
xarray: 0.10.1
IPython: 6.2.1
sphinx: 1.7.1
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.2.0
openpyxl: 2.5.0
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.4
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: 0.1.3
fastparquet: 0.1.4
pandas_gbq: None
pandas_datareader: None


toobaz · 2019-06-29T15:31:12Z

Definitely option 1, as we also want to avoid this:

In [2]: idx = pd.Index(data=[[1, 2], [1, 2], [2, 3]])

In [3]: id(idx)
Out[3]: 139839285680560

In [4]: idx[0][0] = 1000

In [5]: idx
Out[5]: Index([[1000, 2], [1, 2], [2, 3]], dtype='object')

In [6]: id(idx)
Out[6]: 139839285680560
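The transcript shows the Index object keeping its identity (same `id`) while its contents change: an object-dtype Index holds references to the inner list objects, and those lists remain mutable. The standard-library sketch below reproduces the same aliasing with a plain tuple (an immutable container with mutable elements) and also shows why hash-based operations such as duplicate detection cannot work on list elements. That this is the failure behind get_duplicates/drop_duplicates in the report is an assumption here, not something the thread states.

```python
# An immutable container (tuple) holding mutable elements (lists),
# analogous to an object-dtype Index built from a list of lists.
rows = ([1, 2], [1, 2], [2, 3])
container_id = id(rows)

# Replacing an element of the container itself is forbidden ...
try:
    rows[0] = [1000, 2]
except TypeError:
    print("container is immutable")

# ... but the list it references can still be mutated in place.
rows[0][0] = 1000
print(rows)                      # ([1000, 2], [1, 2], [2, 3])
print(id(rows) == container_id)  # True: still the same container object

# Hash-based operations (sets, dict keys, duplicate detection) need
# hashable elements, and lists are not hashable.
try:
    set(rows)
except TypeError as exc:
    print("cannot hash rows:", exc)
```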

toobaz · 2019-06-29T15:42:36Z

... unless we consider option 3: create a MultiIndex, as in

In [2]: pd.Index(data=[(1, 2), (1, 2), (2, 3)])
Out[2]:
MultiIndex([(1, 2),
            (1, 2),
            (2, 3)],
           )

In any case, xref: #17246
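For option 3, the output above indicates that pandas already builds a MultiIndex when every element is a tuple. A minimal sketch, assuming one wanted to opt into that behaviour explicitly: convert each inner list to a tuple first. The helper `rows_to_tuples` is hypothetical. Because tuples are hashable, duplicate handling then works; the standard-library `dict.fromkeys` idiom below keeps first occurrences in order, similar in spirit to `drop_duplicates`.

```python
def rows_to_tuples(data):
    """Hypothetical helper: turn each inner list into a tuple so that rows
    become immutable and hashable (and, per the output above, eligible for
    pd.Index to build a MultiIndex)."""
    return [tuple(row) for row in data]


data = [[1, 2], [1, 2], [2, 3]]
tuples = rows_to_tuples(data)
print(tuples)  # [(1, 2), (1, 2), (2, 3)]

# Order-preserving de-duplication; only possible because rows are hashable.
unique_rows = list(dict.fromkeys(tuples))
print(unique_rows)  # [(1, 2), (2, 3)]

# Rows that occur more than once, each reported a single time.
seen, dupes = set(), []
for row in tuples:
    if row in seen and row not in dupes:
        dupes.append(row)
    seen.add(row)
print(dupes)  # [(1, 2)]
```

With pandas installed, `pd.Index(rows_to_tuples(data))` would be expected to produce the MultiIndex shown above.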

mroeschke added the Indexing and Error Reporting labels Jan 13, 2019

toobaz added the Index label and removed the Indexing label Jun 29, 2019

jbrockmendel added the Constructors label Dec 31, 2019

jbrockmendel mentioned this issue Dec 31, 2019

BUG: validate Index data is 1D + deprecate multi-dim indexing #30588 (merged)

mroeschke added the Bug label May 5, 2020

asdf8601 mentioned this issue Nov 15, 2020

BUG: use hstack and workarounds pandas issues asdf8601/IneqPy#17 (merged)
