Add unstack() support for non-multiindexed dataframes #7054
Changes from 2 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -902,6 +902,11 @@ def unstack(df, level, fill_value=None): | |
Pivots the specified levels of the index labels of df to the innermost | ||
levels of the columns labels of the result. | ||
|
||
* If the index of ``df`` has multiple levels, returns a ``DataFrame`` with | ||
the specified level of the index pivoted to the column levels. | ||
* If the index of ``df`` has a single level, returns a ``Series`` with all | ||
column levels pivoted to the index levels. | ||
|
||
Parameters | ||
---------- | ||
df : DataFrame | ||
|
@@ -913,7 +918,7 @@ def unstack(df, level, fill_value=None): | |
|
||
Returns | ||
------- | ||
DataFrame with specified index levels pivoted to column levels | ||
Series or DataFrame | ||
|
||
Examples | ||
-------- | ||
|
@@ -964,6 +969,21 @@ def unstack(df, level, fill_value=None): | |
a | ||
1 5 <NA> 6 <NA> 7 | ||
2 <NA> 8 <NA> 9 <NA> | ||
|
||
Unstacking single level index dataframe: | ||
|
||
>>> df.unstack(['b', 'd']).unstack() | ||
b d a | ||
c 1 a 1 5 | ||
2 <NA> | ||
d 1 <NA> | ||
2 8 | ||
2 b 1 6 | ||
2 <NA> | ||
e 1 <NA> | ||
2 9 | ||
3 a 1 7 | ||
2 <NA> | ||
""" | ||
if fill_value is not None: | ||
raise NotImplementedError("fill_value is not supported.") | ||
|
@@ -972,10 +992,16 @@ def unstack(df, level, fill_value=None): | |
return df | ||
df = df.copy(deep=False) | ||
if not isinstance(df.index, cudf.MultiIndex): | ||
raise NotImplementedError( | ||
"Calling unstack() on a DataFrame without a MultiIndex " | ||
"is not supported" | ||
) | ||
if isinstance(df, cudf.DataFrame): | ||
res = df.T.stack(dropna=False) | ||
Does this pass the typecasting behavior off to transpose? Should we check the dtypes and possibly error here?

It seems like both [...]

I would support checking here - imagining what happens here from the user perspective, if I get an error trying to [...] In general, I think we try and avoid letting libcudf itself serve an error to the user and favor a more surface level python error, usually when I've managed to actually manifest a libcudf error from the python API it means something is very wrong.
||
# Result's index is a multiindex | ||
res.index.names = tuple(df.columns.names) + df.index.names | ||
return res | ||
else: | ||
raise NotImplementedError( | ||
"Calling unstack() on a Series without a MultiIndex " | ||
"is not supported" | ||
) | ||
else: | ||
columns = df.index._poplevels(level) | ||
index = df.index | ||
|
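The dtype check raised in the review discussion on the `df.T.stack(dropna=False)` line could look roughly like the sketch below. It uses pandas as a CPU stand-in for cudf, and the helper name `check_homogeneous_dtypes`, the choice of `ValueError`, and the error message are all hypothetical; nothing here is taken from the PR itself.

```python
import pandas as pd


def check_homogeneous_dtypes(df: pd.DataFrame) -> None:
    """Hypothetical guard: fail early if the columns do not share one dtype.

    The idea is to surface a Python-level error before the transpose is
    attempted, rather than relying on whatever the lower layer reports.
    """
    unique_dtypes = sorted({str(dtype) for dtype in df.dtypes})
    if len(unique_dtypes) > 1:
        raise ValueError(
            "unstack() on a single-level index requires all columns to "
            f"share one dtype, got: {', '.join(unique_dtypes)}"
        )


# Columns with a single shared dtype pass silently.
check_homogeneous_dtypes(pd.DataFrame({"a": [1, 2], "b": [3, 4]}))
```

Whether to raise or to cast to a common dtype is a design decision for the maintainers; the sketch only illustrates the "check and error" option mentioned in the comments.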
I think this example is a little opaque - it's sometimes difficult to visualize exactly what the result of `unstack` should be for even a single level, and here I find it a little hard to connect the dots through the chained operation. I'd recommend an example that starts with a dataframe with a single index and shows the result of unstacking that dataframe into a series instead.
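An example along the lines suggested here might start from a small dataframe with a single-level index. The sketch below uses pandas, whose `DataFrame.unstack` returns a `Series` for a single-level index; the frame contents and the index name `idx` are made up for illustration, and the cudf output formatting may differ.

```python
import pandas as pd

# A dataframe whose index has a single level named "idx".
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6]},
    index=pd.Index(["x", "y", "z"], name="idx"),
)

# With a single-level index, unstack() moves the column labels into the
# index, producing a Series keyed by (column label, original index label).
result = df.unstack()
print(result)
```

Each value is then addressable by a `(column, index)` pair, for example `result[("b", "y")]`, which makes the relationship between the input frame and the output series easier to follow than in the chained example.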