Add `unstack()` support for non-multiindexed dataframes #7054

isVoid · 2020-12-29T23:04:44Z

When `unstack()` receives a dataframe with a single-level index, it now returns a series, matching pandas behavior.
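
For readers who want to see the behavior being matched, here is a minimal sketch in pandas (not cuDF). The frame and its labels are made up for illustration. Unstacking a frame whose index has only one level yields a series keyed by (column, index) pairs, which is the same result as transposing and then stacking.

```python
import pandas as pd

# Hypothetical frame with a single-level index.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

# With no spare index level to pivot into columns, unstack() returns
# a Series whose MultiIndex pairs each column label with each index label.
result = df.unstack()

# Transposing and then stacking produces the same pairs in the same order.
via_transpose = df.T.stack()

print(result)
```

Because every column ends up in one series, this reshape only has an unambiguous result dtype when the columns share a dtype.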

codecov · 2020-12-30T01:12:43Z

Codecov Report

Merging #7054 (57fdd4a) into branch-0.18 (4a1e465) will increase coverage by 0.02%.
The diff coverage is 81.81%.

@@               Coverage Diff               @@
##           branch-0.18    #7054      +/-   ##
===============================================
+ Coverage        82.09%   82.12%   +0.02%     
===============================================
  Files               97       97              
  Lines            16477    16487      +10     
===============================================
+ Hits             13527    13540      +13     
+ Misses            2950     2947       -3

| Impacted Files | Coverage Δ | |
|---|---|---|
| python/cudf/cudf/core/reshape.py | `91.00% <81.81%> (-0.04%)` | ⬇️ |
| python/cudf/cudf/io/csv.py | `93.33% <0.00%> (-0.42%)` | ⬇️ |
| python/cudf/cudf/_fuzz_testing/fuzzer.py | `0.00% <0.00%> (ø)` | |
| python/cudf/cudf/utils/hash_vocab_utils.py | `100.00% <0.00%> (ø)` | |
| python/cudf/cudf/core/dataframe.py | `90.76% <0.00%> (+0.05%)` | ⬆️ |
| python/cudf/cudf/core/abc.py | `91.48% <0.00%> (+4.25%)` | ⬆️ |
| python/cudf/cudf/utils/gpu_utils.py | `58.53% <0.00%> (+4.87%)` | ⬆️ |

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4a1e465...57fdd4a. Read the comment docs.

brandon-b-miller · 2020-12-30T15:34:09Z

python/cudf/cudf/core/reshape.py

+
+    Unstacking single level index dataframe:
+
+    >>> df.unstack(['b', 'd']).unstack()


I think this example is a little opaque. It's sometimes difficult to visualize exactly what the result of unstack should be even for a single level, and here I find it a little hard to connect the dots through the chained operation. I'd recommend an example that starts with a dataframe with a single-level index and shows the result of unstacking that dataframe into a series instead.

brandon-b-miller · 2020-12-30T15:35:22Z

python/cudf/cudf/core/reshape.py

-            "is not supported"
-        )
+        if isinstance(df, cudf.DataFrame):
+            res = df.T.stack(dropna=False)


Does this pass the typecasting behavior off to transpose? Should we check the dtypes and possibly error here?

It seems like both transpose.pyx and libcudf::transpose.cu check whether all columns have the same datatype, and a clear exception is raised if the columns are of different types. Should we check again here?

I would support checking here. From the user's perspective, if I get an error trying to unstack a cuDF dataframe, I might wonder why the transpose code is unhappy.

In general, I think we try to avoid letting libcudf itself serve an error to the user and favor a more surface-level Python error. Usually, when I've managed to actually manifest a libcudf error from the Python API, it means something is very wrong.
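
A rough sketch of the kind of surface-level check discussed above, written in plain Python so it runs without cuDF. The helper name `check_single_dtype`, the error wording, and the use of strings in place of real column dtypes are all assumptions for illustration, not the code in this PR.

```python
def check_single_dtype(dtypes):
    """Raise a Python-level error if the column dtypes differ.

    `dtypes` is any iterable of dtype-like objects. Plain strings stand
    in for real column dtypes here (hypothetical helper).
    """
    dtypes = list(dtypes)
    if len(set(dtypes)) > 1:
        raise ValueError(
            "Calling unstack() on a single-index dataframe with "
            f"columns of different dtypes is not supported: {dtypes}"
        )


# Homogeneous columns pass silently.
check_single_dtype(["int64", "int64"])

# Mixed columns fail with a message that names unstack(), not transpose.
try:
    check_single_dtype(["int64", "float64"])
except ValueError as err:
    message = str(err)
    print(message)
```

The point of checking at this level is that the message refers to the operation the user actually called.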

brandon-b-miller

Couple of questions, otherwise LGTM.

brandon-b-miller · 2021-01-04T18:37:18Z

python/cudf/cudf/core/reshape.py

-        )
+        if isinstance(df, cudf.DataFrame):
+            dtype = df._columns[0].dtype
+            if any(


It won't matter much in most cases, but I think it's slightly more efficient to use a single loop that raises if `df._columns[i].dtype != dtype`, as opposed to creating a list with all the booleans and then doing an `any` reduction.

Raised an issue, since this comes up in a bunch of places: #7067
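
To make the difference concrete, here is a small plain-Python sketch. `CountingDtype` and both helper functions are hypothetical stand-ins, not cuDF code, and the helpers return a boolean rather than raising so the two can be compared side by side. The list version evaluates every comparison before `any` looks at the results, while the loop stops at the first mismatch.

```python
class CountingDtype:
    """Stand-in for a dtype that counts how often it is compared."""

    comparisons = 0

    def __init__(self, name):
        self.name = name

    def __ne__(self, other):
        CountingDtype.comparisons += 1
        return self.name != other.name


def has_mixed_dtypes_list(dtypes):
    # Builds the full list of booleans, then reduces it with any().
    first = dtypes[0]
    return any([d != first for d in dtypes])


def has_mixed_dtypes_loop(dtypes):
    # Returns as soon as the first mismatch is found.
    first = dtypes[0]
    for d in dtypes:
        if d != first:
            return True
    return False


names = ["int64", "float64", "int64", "int64", "int64"]
dtypes = [CountingDtype(n) for n in names]

CountingDtype.comparisons = 0
list_result = has_mixed_dtypes_list(dtypes)
list_count = CountingDtype.comparisons

CountingDtype.comparisons = 0
loop_result = has_mixed_dtypes_loop(dtypes)
loop_count = CountingDtype.comparisons

print(list_count, loop_count)
```

Passing a generator expression to `any` (without the square brackets) also short-circuits, so it gives the same early exit as the explicit loop.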

isVoid · 2021-01-05T18:26:09Z

rerun tests

Initial

193ab17

isVoid requested a review from a team as a code owner December 29, 2020 23:04

isVoid requested review from cwharris and brandon-b-miller December 29, 2020 23:04

isVoid self-assigned this Dec 29, 2020

isVoid added the Python, bug, and non-breaking labels Dec 29, 2020

docstrings

d9326be

isVoid added the 3 - Ready for Review label Dec 30, 2020

brandon-b-miller reviewed Dec 30, 2020

brandon-b-miller requested changes Dec 30, 2020

clearer example

4545bc9

isVoid mentioned this pull request Dec 30, 2020

[BUG] Index.rename improperly exposed to MultiIndex #7057

Closed

rev: check column dtypes

7c60028

brandon-b-miller reviewed Jan 4, 2021

brandon-b-miller approved these changes Jan 4, 2021

harrism changed the title ~~Adds unstack() support for non-multiindexed dataframes~~ Add unstack() support for non-multiindexed dataframes Jan 5, 2021

isVoid added 2 commits January 4, 2021 16:50

Efficient datatype checking.

e0c2f2a

Remove stale dataframe check

57fdd4a

isVoid added the 6 - Okay to Auto-Merge label and removed the 3 - Ready for Review label Jan 6, 2021

rapids-bot bot merged commit 1930432 into rapidsai:branch-0.18 Jan 6, 2021
