[FEA] Multiindex to Multicolumn #1740

Closed
ayushdg opened this issue May 14, 2019 · 4 comments · Fixed by #8753
Labels
feature request New feature or request Python Affects Python cuDF API.

Comments

@ayushdg
Member

ayushdg commented May 14, 2019

There should be a way to go from a DataFrame with a MultiIndex to a multi-column DataFrame. In pandas, calling reset_index on a DataFrame with a MultiIndex converts the index levels into regular columns.
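
As a minimal sketch of the pandas behavior described above (the data and the level names key1 and key2 are made up for illustration), reset_index moves each index level into an ordinary column and leaves a default RangeIndex:

```python
import pandas as pd

# Hypothetical example data; the level names "key1"/"key2" are illustrative only.
index = pd.MultiIndex.from_tuples(
    [("x", 1), ("x", 2), ("y", 1)], names=["key1", "key2"]
)
df = pd.DataFrame({"val": [10, 20, 30]}, index=index)

# Each index level becomes a regular column, placed before the data columns.
flat = df.reset_index()
print(flat.columns.tolist())
print(type(flat.index).__name__)
```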

@ayushdg ayushdg added Needs Triage Need team to review and classify feature request New feature or request labels May 14, 2019
@kkraus14 kkraus14 added Python Affects Python cuDF API. and removed Needs Triage Need team to review and classify labels May 15, 2019
@thomcom
Contributor

thomcom commented Jun 14, 2019

Fixed by #1542

@beckernick
Member

beckernick commented Oct 28, 2019

Re-opening this issue and widening the request to handle MultiIndex Series. The issue is relevant again because we can now call stack on DataFrames, which returns a Series with a MultiIndex.

In a 0.11 nightly runtime container:

import cudf
gdf = cudf.DataFrame({'a':range(3), 'b':range(5,8), 'c':range(10,13)})
print(gdf.stack().reset_index())
    0  1
0   0  a
1   5  b
2  10  c
3   1  a
4   6  b
5  11  c
6   2  a
7   7  b
8  12  c

The MultiIndex is reset, but information is lost. The result should have one column per index level, named after its level, plus a column holding the Series values, as pandas produces:

import pandas as pd
df = pd.DataFrame({'a':range(3), 'b':range(5,8), 'c':range(10,13)})
print(df.stack().reset_index())
   level_0 level_1   0
0        0       a   0
1        0       b   5
2        0       c  10
3        1       a   1
4        1       b   6
5        1       c  11
6        2       a   2
7        2       b   7
8        2       c  12
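
If the default level_0/level_1 names are not wanted, one possible approach in pandas is to name the levels and the values explicitly. This is only a sketch: the names row, column and value are illustrative, and whether a given cuDF version accepts the same calls is not verified here.

```python
import pandas as pd

df = pd.DataFrame({'a': range(3), 'b': range(5, 8), 'c': range(10, 13)})

# stack() returns a Series indexed by a two-level MultiIndex (row label, column label).
stacked = df.stack()

# rename_axis names the index levels; reset_index(name=...) names the values column.
out = stacked.rename_axis(["row", "column"]).reset_index(name="value")
print(out)
```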

@beckernick beckernick reopened this Oct 28, 2019
@galipremsagar
Contributor

As of 0.15, this issue still exists; the same call now raises an error:

      1 import cudf
      2 gdf = cudf.DataFrame({'a':range(3), 'b':range(5,8), 'c':range(10,13)})
----> 3 print(gdf.stack().reset_index())

/conda/envs/cudf/lib/python3.7/site-packages/cudf/core/series.py in reset_index(self, drop, inplace)
    568                     "to create a DataFrame"
    569                 )
--> 570             return self.to_frame().reset_index(drop=drop)
    571         else:
    572             if inplace is True:

/conda/envs/cudf/lib/python3.7/site-packages/cudf/core/dataframe.py in reset_index(self, level, drop, inplace, col_level, col_fill)
   2456                 reversed(names), reversed(index_columns)
   2457             ):
-> 2458                 result.insert(0, name, index_column)
   2459         result.index = RangeIndex(len(self))
   2460         if inplace:

/conda/envs/cudf/lib/python3.7/contextlib.py in inner(*args, **kwds)
     72         def inner(*args, **kwds):
     73             with self._recreate_cm():
---> 74                 return func(*args, **kwds)
     75         return inner
     76 

/conda/envs/cudf/lib/python3.7/site-packages/cudf/core/dataframe.py in insert(self, loc, name, value)
   2544         num_cols = len(self._data)
   2545         if name in self._data:
-> 2546             raise NameError("duplicated column name {!r}".format(name))
   2547 
   2548         if loc < 0:

NameError: duplicated column name 0
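
The traceback ends with insert rejecting a column name that is already present in the frame. For comparison only, pandas has a similar guard in reset_index when an index level name collides with an existing column. The sketch below uses pandas with hypothetical data; it does not reproduce the cuDF code path above, and the replacement name a_index is an arbitrary choice.

```python
import pandas as pd

# Hypothetical frame whose index name collides with an existing column name.
df = pd.DataFrame({"a": [1, 2]}, index=pd.Index([10, 20], name="a"))

try:
    df.reset_index()
    collided = False
except ValueError as err:
    # pandas refuses to insert a second column with the same name.
    collided = True
    print(f"reset_index failed: {err}")

# Renaming the index level first avoids the collision.
fixed = df.rename_axis("a_index").reset_index()
print(fixed.columns.tolist())
```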

In [2]: import pandas as pd 
   ...: df = pd.DataFrame({'a':range(3), 'b':range(5,8), 'c':range(10,13)}) 
   ...: print(df.stack().reset_index())                                                    
   level_0 level_1   0
0        0       a   0
1        0       b   5
2        0       c  10
3        1       a   1
4        1       b   6
5        1       c  11
6        2       a   2
7        2       b   7
8        2       c  12

@beckernick
Member

Closing. In the current 21.08 nightly, this is now fixed:

import cudf
gdf = cudf.DataFrame({'a':range(3), 'b':range(5,8), 'c':range(10,13)})
print(gdf.stack().reset_index())
   level_0 level_1   0
0        0       a   0
1        0       b   5
2        0       c  10
3        1       a   1
4        1       b   6
5        1       c  11
6        2       a   2
7        2       b   7
8        2       c  12
import pandas as pd
df = pd.DataFrame({'a':range(3), 'b':range(5,8), 'c':range(10,13)})
print(df.stack().reset_index())
   level_0 level_1   0
0        0       a   0
1        0       b   5
2        0       c  10
3        1       a   1
4        1       b   6
5        1       c  11
6        2       a   2
7        2       b   7
8        2       c  12


5 participants