[FEA] Multiindex to Multicolumn #1740

ayushdg · 2019-05-14T21:58:19Z

There should be a way to go from a DataFrame with a MultiIndex to a multicolumn DataFrame. In pandas, calling reset_index on a DataFrame with a MultiIndex converts it to a multicolumn DataFrame.


thomcom · 2019-06-14T13:20:34Z

Fixed by #1542

beckernick · 2019-10-28T15:33:22Z

Re-opening this issue and widening the request to handle MultiIndex Series. This issue becomes relevant again because we can now call stack on DataFrames, which results in a Series.

In a 0.11 nightly runtime container:

import cudf
gdf = cudf.DataFrame({'a':range(3), 'b':range(5,8), 'c':range(10,13)})
print(gdf.stack().reset_index())
    0  1
0   0  a
1   5  b
2  10  c
3   1  a
4   6  b
5  11  c
6   2  a
7   7  b
8  12  c

The MultiIndex is reset, but the index-level information is lost. The result should instead contain one column per index level, named after that level, plus a column holding the Series values:

import pandas as pd
df = pd.DataFrame({'a':range(3), 'b':range(5,8), 'c':range(10,13)})
print(df.stack().reset_index())
   level_0 level_1   0
0        0       a   0
1        0       b   5
2        0       c  10
3        1       a   1
4        1       b   6
5        1       c  11
6        2       a   2
7        2       b   7
8        2       c  12
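
For reference, the pandas naming behavior described above can be sketched as follows. This is a minimal pandas-only illustration, not a statement about cuDF internals; the labels `row`, `column`, and `value` are made up for the example. Unnamed levels get the default `level_<i>` labels and the unnamed Series becomes column `0`, while `rename_axis` and the `name` argument of `Series.reset_index` let the caller pick explicit labels.

```python
import pandas as pd

df = pd.DataFrame({'a': range(3), 'b': range(5, 8), 'c': range(10, 13)})

# stack() returns a Series indexed by a two-level MultiIndex:
# (original row label, original column label)
stacked = df.stack()

# Default behavior: unnamed levels become level_0 / level_1,
# and the unnamed Series values land in a column labeled 0
default = stacked.reset_index()

# Explicit labels (hypothetical names chosen for this example)
named = stacked.rename_axis(["row", "column"]).reset_index(name="value")

print(list(default.columns))
print(list(named.columns))
```

Naming the levels up front keeps the resulting column labels independent of the defaults.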

galipremsagar · 2020-06-01T03:49:53Z

As of 0.15, this issue still exists:

      1 import cudf
      2 gdf = cudf.DataFrame({'a':range(3), 'b':range(5,8), 'c':range(10,13)})
----> 3 print(gdf.stack().reset_index())

/conda/envs/cudf/lib/python3.7/site-packages/cudf/core/series.py in reset_index(self, drop, inplace)
    568                     "to create a DataFrame"
    569                 )
--> 570             return self.to_frame().reset_index(drop=drop)
    571         else:
    572             if inplace is True:

/conda/envs/cudf/lib/python3.7/site-packages/cudf/core/dataframe.py in reset_index(self, level, drop, inplace, col_level, col_fill)
   2456                 reversed(names), reversed(index_columns)
   2457             ):
-> 2458                 result.insert(0, name, index_column)
   2459         result.index = RangeIndex(len(self))
   2460         if inplace:

/conda/envs/cudf/lib/python3.7/contextlib.py in inner(*args, **kwds)
     72         def inner(*args, **kwds):
     73             with self._recreate_cm():
---> 74                 return func(*args, **kwds)
     75         return inner
     76 

/conda/envs/cudf/lib/python3.7/site-packages/cudf/core/dataframe.py in insert(self, loc, name, value)
   2544         num_cols = len(self._data)
   2545         if name in self._data:
-> 2546             raise NameError("duplicated column name {!r}".format(name))
   2547 
   2548         if loc < 0:

NameError: duplicated column name 0
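
The traceback shows reset_index inserting each index level into the result by name, and insert rejecting a name that is already a column. A toy model of that collision is sketched below. It is plain Python, not cuDF code: `reset_index_model` and `default_level_names` are hypothetical helpers, and the idea that an unnamed level ended up labeled `0` (clashing with the value column `0`) is one plausible reading of the error that the thread does not confirm.

```python
def default_level_names(names):
    """Replace missing (None) level names with pandas-style level_<i> labels."""
    return [f"level_{i}" if n is None else n for i, n in enumerate(names)]


def reset_index_model(level_names, level_values, value_name, values):
    """Toy model: move index levels into columns of a dict-based frame."""
    result = {value_name: list(values)}
    for name, column in zip(level_names, level_values):
        if name in result:
            # Same kind of guard as the insert() frame in the traceback
            raise NameError("duplicated column name {!r}".format(name))
        result[name] = list(column)
    # Emit the level columns first, then the value column
    ordered = {name: result[name] for name in level_names}
    ordered[value_name] = result[value_name]
    return ordered


levels = [[0, 0, 1], ["a", "b", "a"]]
values = [0, 5, 1]

# Assumed failure mode: positional level labels (0, 1) clash with value column 0
try:
    reset_index_model([0, 1], levels, 0, values)
    collided = False
except NameError:
    collided = True

# pandas-style defaults avoid the clash
fixed = reset_index_model(default_level_names([None, None]), levels, 0, values)
```

In this model the failure disappears as soon as the level labels cannot coincide with the value column label, which matches the `level_0`, `level_1`, `0` layout pandas produces.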

In [2]: import pandas as pd 
   ...: df = pd.DataFrame({'a':range(3), 'b':range(5,8), 'c':range(10,13)}) 
   ...: print(df.stack().reset_index())                                                    
   level_0 level_1   0
0        0       a   0
1        0       b   5
2        0       c  10
3        1       a   1
4        1       b   6
5        1       c  11
6        2       a   2
7        2       b   7
8        2       c  12


beckernick · 2021-07-23T20:38:51Z

Closing. In the current 21.08 nightly, this is now fixed:

import cudf
gdf = cudf.DataFrame({'a':range(3), 'b':range(5,8), 'c':range(10,13)})
print(gdf.stack().reset_index())
   level_0 level_1   0
0        0       a   0
1        0       b   5
2        0       c  10
3        1       a   1
4        1       b   6
5        1       c  11
6        2       a   2
7        2       b   7
8        2       c  12

import pandas as pd
df = pd.DataFrame({'a':range(3), 'b':range(5,8), 'c':range(10,13)})
print(df.stack().reset_index())
   level_0 level_1   0
0        0       a   0
1        0       b   5
2        0       c  10
3        1       a   1
4        1       b   6
5        1       c  11
6        2       a   2
7        2       b   7
8        2       c  12

ayushdg added the Needs Triage and feature request labels May 14, 2019

kkraus14 added the Python label and removed the Needs Triage label May 15, 2019

ayushdg mentioned this issue May 20, 2019

[BUG] reset_index on multi-column groupby results in an error #1801

Closed

thomcom mentioned this issue Jun 14, 2019

[REVIEW] Python method and bindings for to_csv #1542

Merged


kkraus14 closed this as completed Jul 3, 2019

beckernick reopened this Oct 28, 2019

beckernick closed this as completed Jul 23, 2021

beckernick added this to the Pandas API Alignment and Coverage milestone Jul 23, 2021

galipremsagar linked a pull request Jul 23, 2021 that will close this issue

[REVIEW] Fix issues with MultiIndex in dropna, stack & reset_index #8753

Merged

