[FEA] Multiindex to Multicolumn #1740
Comments
Fixed by #1542 |
Re-opening this issue and widening the request to handle MultiIndex Series. This issue becomes relevant again because we can now call stack().
In a 0.11 nightly runtime container:
import cudf
gdf = cudf.DataFrame({'a':range(3), 'b':range(5,8), 'c':range(10,13)})
print(gdf.stack().reset_index())
0 1
0 0 a
1 5 b
2 10 c
3 1 a
4 6 b
5 11 c
6 2 a
7 7 b
8 12 c
The MultiIndex is reset but we lose information, when it should become multiple columns named by their levels, as in pandas:
import pandas as pd
df = pd.DataFrame({'a':range(3), 'b':range(5,8), 'c':range(10,13)})
print(df.stack().reset_index())
level_0 level_1 0
0 0 a 0
1 0 b 5
2 0 c 10
3 1 a 1
4 1 b 6
5 1 c 11
6 2 a 2
7 2 b 7
8 2 c 12 |
as of
      1 import cudf
      2 gdf = cudf.DataFrame({'a':range(3), 'b':range(5,8), 'c':range(10,13)})
----> 3 print(gdf.stack().reset_index())

/conda/envs/cudf/lib/python3.7/site-packages/cudf/core/series.py in reset_index(self, drop, inplace)
    568                     "to create a DataFrame"
    569                 )
--> 570             return self.to_frame().reset_index(drop=drop)
    571         else:
    572             if inplace is True:

/conda/envs/cudf/lib/python3.7/site-packages/cudf/core/dataframe.py in reset_index(self, level, drop, inplace, col_level, col_fill)
   2456                 reversed(names), reversed(index_columns)
   2457             ):
-> 2458                 result.insert(0, name, index_column)
   2459         result.index = RangeIndex(len(self))
   2460         if inplace:

/conda/envs/cudf/lib/python3.7/contextlib.py in inner(*args, **kwds)
     72         def inner(*args, **kwds):
     73             with self._recreate_cm():
---> 74                 return func(*args, **kwds)
     75         return inner
     76

/conda/envs/cudf/lib/python3.7/site-packages/cudf/core/dataframe.py in insert(self, loc, name, value)
   2544         num_cols = len(self._data)
   2545         if name in self._data:
-> 2546             raise NameError("duplicated column name {!r}".format(name))
   2547
   2548         if loc < 0:

NameError: duplicated column name 0
In [2]: import pandas as pd
...: df = pd.DataFrame({'a':range(3), 'b':range(5,8), 'c':range(10,13)})
...: print(df.stack().reset_index())
level_0 level_1 0
0 0 a 0
1 0 b 5
2 0 c 10
3 1 a 1
4 1 b 6
5 1 c 11
6 2 a 2
7 2 b 7
8 2 c 12
In [3]: |
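The traceback above suggests an index level was being inserted under a label (0) that the value column already used. Here is a toy, stdlib-only model of the naming rule that the pandas output implies, where unnamed MultiIndex levels become level_0, level_1 and so on. It is a sketch under that assumption, not cudf's or pandas' actual code, and both helper functions are hypothetical.

```python
def default_level_names(names):
    """Replace unnamed (None) levels with 'level_<i>', as in the pandas output above."""
    return [f"level_{i}" if name is None else name for i, name in enumerate(names)]


def reset_index_columns(level_names, value_columns):
    """Return the column labels after moving index levels in front of the value columns.

    Raises ValueError if a level label collides with an existing column label.
    """
    new_columns = default_level_names(level_names)
    for name in new_columns:
        if name in value_columns:
            raise ValueError(f"duplicated column name {name!r}")
    return new_columns + list(value_columns)


# Two unnamed levels plus a value column labelled 0: no collision.
columns = reset_index_columns([None, None], [0])
print(columns)

# Levels labelled by position (0, 1) collide with the value column 0.
try:
    reset_index_columns([0, 1], [0])
    collided = False
except ValueError:
    collided = True
print(collided)
```

Labelling unnamed levels with a prefix such as level_ keeps them distinct from the integer label 0 that the stacked values carry.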
Closing. In the current 21.08 nightly, this is now fixed:
import cudf
gdf = cudf.DataFrame({'a':range(3), 'b':range(5,8), 'c':range(10,13)})
print(gdf.stack().reset_index())
level_0 level_1 0
0 0 a 0
1 0 b 5
2 0 c 10
3 1 a 1
4 1 b 6
5 1 c 11
6 2 a 2
7 2 b 7
8 2 c 12 import pandas as pd
df = pd.DataFrame({'a':range(3), 'b':range(5,8), 'c':range(10,13)})
print(df.stack().reset_index())
level_0 level_1 0
0 0 a 0
1 0 b 5
2 0 c 10
3 1 a 1
4 1 b 6
5 1 c 11
6 2 a 2
7 2 b 7
8 2 c 12 |
There should be a way to go from a DataFrame with a MultiIndex to a multi-column DataFrame. In pandas, calling reset_index on a DataFrame with a MultiIndex converts it to a multi-column DataFrame.
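For reference, a small pandas sketch of the behavior being requested. The index tuples, the level names ('letter', 'number') and the column name 'value' are made up for illustration.

```python
import pandas as pd

# Build a DataFrame whose rows are indexed by a two-level MultiIndex.
index = pd.MultiIndex.from_tuples(
    [('x', 1), ('x', 2), ('y', 1)], names=['letter', 'number']
)
df = pd.DataFrame({'value': [10, 20, 30]}, index=index)

# reset_index() moves every index level into an ordinary column
# and replaces the index with a default integer range.
flat = df.reset_index()

print(flat)
```

Each index level ends up as its own column, ahead of the original data columns.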