BUG: Fix groupby over a CategoricalIndex in axis=1 #18525

ekisslinger · 2017-11-27T18:02:01Z

closes BUG: groupby over a CategoricalIndex in axis=1 #18432
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
add whatsnew entry

gfyoung · 2017-11-27T18:31:35Z

pandas/core/groupby.py

@@ -2859,7 +2859,7 @@ def is_in_obj(gpr):
        else:
            in_axis, name = False, None

-        if is_categorical_dtype(gpr) and len(gpr) != len(obj):
+        if is_categorical_dtype(gpr) and len(gpr) != obj.shape[axis]:
            raise ValueError("Categorical dtype grouper must "
                             "have len(grouper) == len(data)")


In light of the change in the check, perhaps this error message should change.

Good point. I'll change it to "Grouper and axis must be same length".

put in the actual len(grouper) and len(axis) as well

gfyoung · 2017-11-27T18:31:58Z

@ekisslinger : Don't forget to add a whatsnew for 0.22.0

ekisslinger · 2017-11-27T18:55:17Z

Any chance this fix can be added to 0.21.1?

codecov · 2017-11-27T19:19:19Z

Codecov Report

Merging #18525 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #18525      +/-   ##
==========================================
+ Coverage   91.32%   91.33%   +<.01%     
==========================================
  Files         163      164       +1     
  Lines       49798    49819      +21     
==========================================
+ Hits        45479    45503      +24     
+ Misses       4319     4316       -3

Flag	Coverage Δ
#multiple	`89.13% <100%> (+0.02%)`	⬆️
#single	`40.8% <0%> (-0.06%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/groupby.py	`92.03% <100%> (ø)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.81% <0%> (-0.1%)`	⬇️
pandas/util/_test_decorators.py	`95.23% <0%> (ø)`
pandas/core/indexes/datetimes.py	`95.71% <0%> (+0.09%)`	⬆️
pandas/plotting/_converter.py	`65.25% <0%> (+1.81%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 262e8ff...2a11e78. Read the comment docs.

codecov · 2017-11-27T19:19:32Z

Codecov Report

Merging #18525 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #18525      +/-   ##
==========================================
+ Coverage   91.32%   91.33%   +<.01%     
==========================================
  Files         163      164       +1     
  Lines       49798    49819      +21     
==========================================
+ Hits        45479    45503      +24     
+ Misses       4319     4316       -3

Flag	Coverage Δ
#multiple	`89.13% <100%> (+0.02%)`	⬆️
#single	`40.8% <0%> (-0.06%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/groupby.py	`92.03% <100%> (ø)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.81% <0%> (-0.1%)`	⬇️
pandas/util/_test_decorators.py	`95.23% <0%> (ø)`
pandas/core/indexes/datetimes.py	`95.71% <0%> (+0.09%)`	⬆️
pandas/plotting/_converter.py	`65.25% <0%> (+1.81%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 262e8ff...2a11e78. Read the comment docs.

codecov · 2017-11-27T19:19:49Z

Codecov Report

Merging #18525 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #18525      +/-   ##
==========================================
- Coverage   91.35%   91.33%   -0.02%     
==========================================
  Files         164      164              
  Lines       49802    49802              
==========================================
- Hits        45496    45487       -9     
- Misses       4306     4315       +9

Flag	Coverage Δ
#multiple	`89.13% <100%> (ø)`	⬆️
#single	`40.81% <0%> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/groupby.py	`92.04% <100%> (ø)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.81% <0%> (-0.1%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7627cca...19f6041. Read the comment docs.

gfyoung · 2017-11-27T19:22:29Z

Any chance this fix can be added 0.21.1?

@jreback @jorisvandenbossche : Thoughts?

jreback · 2017-11-27T19:26:05Z

will look in a bit

jreback · 2017-11-27T20:56:07Z

this would be ok for 0.21.1, pls add a whatsnew note

jreback · 2017-11-27T20:55:18Z

pandas/core/groupby.py

@@ -2859,7 +2859,7 @@ def is_in_obj(gpr):
        else:
            in_axis, name = False, None

-        if is_categorical_dtype(gpr) and len(gpr) != len(obj):
+        if is_categorical_dtype(gpr) and len(gpr) != obj.shape[axis]:
            raise ValueError("Categorical dtype grouper must "
                             "have len(grouper) == len(data)")


put in the actual len(grouper) and len(axis) as well

jreback · 2017-11-27T20:55:49Z

pandas/tests/groupby/test_categorical.py

@@ -191,6 +191,25 @@ def test_groupby_categorical_index(self):
                [0, 1, 2, 3], levels, ordered=True), name='cats')
        assert_frame_equal(result, expected)

+    def test_groupby_categorical_columns_index(self):


move this to pandas/tests/groupby/test_grouper.py, which is where things like this are tested

i can't find that file...

sorry, test_grouping.py (move both this version and the transposed)

jreback · 2017-11-27T20:58:09Z

also add the transposed version of this in your test (which works now)

ekisslinger · 2017-11-28T17:06:02Z

I think test_groupby_categorical_index() in pandas/tests/groupby/test_categorical.py tests the transposed version (i.e. group over axis 0). Here is the relevant code:

   def test_groupby_categorical_index(self):

        s = np.random.RandomState(12345)
        levels = ['foo', 'bar', 'baz', 'qux']
        codes = s.randint(0, 4, size=20)
        cats = Categorical.from_codes(codes, levels, ordered=True)
        df = DataFrame(
            np.repeat(
                np.arange(20), 4).reshape(-1, 4), columns=list('abcd'))
        df['cats'] = cats

        # with a cat index
        result = df.set_index('cats').groupby(level=0).sum()
        expected = df[list('abcd')].groupby(cats.codes).sum()
        expected.index = CategoricalIndex(
            Categorical.from_codes(
                [0, 1, 2, 3], levels, ordered=True), name='cats')
        assert_frame_equal(result, expected)

ekisslinger · 2017-11-28T17:57:33Z

I updated the bug fix branch. The changes are:

Add whatsnew entry to 0.21.1
Fix error message
Add multi-index version to associated unit test

jreback · 2017-11-29T00:11:21Z

doc/source/whatsnew/v0.21.1.txt

@@ -137,6 +137,7 @@ Categorical
 - Error messages in the testing module have been improved when items have different ``CategoricalDtype`` (:issue:`18069`)
 - ``CategoricalIndex`` can now correctly take a ``pd.api.types.CategoricalDtype`` as its dtype (:issue:`18116`)
 - Bug in ``Categorical.unique()`` returning read-only ``codes``  array when all categories were ``NaN`` (:issue:`18051`)
+- Bug when grouping over a ``CategoricalIndex`` in axis=1 (:issue:`18432`)


DataFrame.groupby(axis=1) with a CategoricalIndex

jreback · 2017-11-29T00:11:48Z

pandas/core/groupby.py

-                             "have len(grouper) == len(data)")
+        if is_categorical_dtype(gpr) and len(gpr) != obj.shape[axis]:
+            raise ValueError(
+                ("Length of grouper ({0}) and axis ({1}) must be same length"


use kwargs rather than positional formatters

jreback · 2017-11-29T00:12:29Z

pandas/tests/groupby/test_categorical.py

@@ -191,6 +191,25 @@ def test_groupby_categorical_index(self):
                [0, 1, 2, 3], levels, ordered=True), name='cats')
        assert_frame_equal(result, expected)

+    def test_groupby_categorical_columns_index(self):


sorry, test_grouping.py (move both this version and the transposed)

jreback · 2017-11-29T00:12:50Z

pandas/tests/groupby/test_categorical.py

@@ -191,6 +191,45 @@ def test_groupby_categorical_index(self):
                [0, 1, 2, 3], levels, ordered=True), name='cats')
        assert_frame_equal(result, expected)

+    def test_groupby_categorical_columns_index(self):
+        # GH18432
+        s = np.random.RandomState(0)


not necessary to set a random key here

ekisslinger · 2017-11-29T01:02:07Z

i made the requested changes

jreback · 2017-11-29T12:11:11Z

pandas/tests/groupby/test_grouping.py

+        columns = ['A', 'B', 'A', 'B']
+        categories = ['B', 'A']
+        cat_columns = CategoricalIndex(columns,
+                                       categories=categories,


can you change this test to instead of using random data, uses integer as the data, then directly construct the result frame

Yes. I made the change. I think it is what you had in mind and I agree it is more straightforward this way.

closes GH18432 Add multi-index columns test to test_groupby_categorical_columns_index() Add whatsnew for GH18432 bug fix Fix ValueError text for GH18432 bug fix Update whatsnew text Use kwargs instead of positional format params Move test_groupby_categorical_columns_index() to pandas/tests/groupby/test_grouping.py Directly construct expected dataframe in test_groupby_categorical_index_and_columns()

ekisslinger · 2017-11-30T14:59:08Z

In case you didn't see the last message, as it is a bit hidden, I made the last change which was to use integer instead of random data and directly construct the dataframe.

jreback · 2017-11-30T15:11:52Z

thanks @ekisslinger

(cherry picked from commit 5da3759)

gfyoung added Bug Categorical Categorical Data Type Groupby Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Nov 27, 2017

gfyoung reviewed Nov 27, 2017

View reviewed changes

jreback requested changes Nov 27, 2017

View reviewed changes

ekisslinger force-pushed the fix-groupby-over-columns-categoricalindex branch from 2a11e78 to a05147f Compare November 28, 2017 17:50

jreback requested changes Nov 29, 2017

View reviewed changes

ekisslinger force-pushed the fix-groupby-over-columns-categoricalindex branch from a05147f to d7640e8 Compare November 29, 2017 00:54

jreback requested changes Nov 29, 2017

View reviewed changes

ekisslinger force-pushed the fix-groupby-over-columns-categoricalindex branch from d7640e8 to 19f6041 Compare November 29, 2017 15:25

jreback added this to the 0.21.1 milestone Nov 30, 2017

jreback added the Needs Backport label Nov 30, 2017

jreback approved these changes Nov 30, 2017

View reviewed changes

jreback merged commit 5da3759 into pandas-dev:master Nov 30, 2017

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Dec 8, 2017

BUG: Fix groupby over a CategoricalIndex in axis=1 (pandas-dev#18525)

68fd85b

(cherry picked from commit 5da3759)

TomAugspurger pushed a commit that referenced this pull request Dec 11, 2017

BUG: Fix groupby over a CategoricalIndex in axis=1 (#18525)

8944daf

(cherry picked from commit 5da3759)

TomAugspurger removed the Needs Backport label Dec 11, 2017

stanwest mentioned this pull request Oct 25, 2021

Group by a categorical Series of unequal length #44180

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: Fix groupby over a CategoricalIndex in axis=1 #18525

BUG: Fix groupby over a CategoricalIndex in axis=1 #18525

ekisslinger commented Nov 27, 2017 •

edited

Loading

gfyoung Nov 27, 2017

ekisslinger Nov 27, 2017

jreback Nov 27, 2017

gfyoung commented Nov 27, 2017

ekisslinger commented Nov 27, 2017 •

edited

Loading

codecov bot commented Nov 27, 2017

codecov bot commented Nov 27, 2017 •

edited

Loading

codecov bot commented Nov 27, 2017 •

edited

Loading

gfyoung commented Nov 27, 2017

jreback commented Nov 27, 2017

jreback commented Nov 27, 2017

jreback Nov 27, 2017

jreback Nov 27, 2017

ekisslinger Nov 27, 2017

jreback Nov 29, 2017

jreback commented Nov 27, 2017

ekisslinger commented Nov 28, 2017

ekisslinger commented Nov 28, 2017

jreback Nov 29, 2017

jreback Nov 29, 2017

jreback Nov 29, 2017

jreback Nov 29, 2017

ekisslinger commented Nov 29, 2017

jreback Nov 29, 2017 •

edited

Loading

ekisslinger Nov 29, 2017

ekisslinger commented Nov 30, 2017

jreback commented Nov 30, 2017

BUG: Fix groupby over a CategoricalIndex in axis=1 #18525

BUG: Fix groupby over a CategoricalIndex in axis=1 #18525

Conversation

ekisslinger commented Nov 27, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gfyoung commented Nov 27, 2017

ekisslinger commented Nov 27, 2017 • edited Loading

codecov bot commented Nov 27, 2017

Codecov Report

codecov bot commented Nov 27, 2017 • edited Loading

Codecov Report

codecov bot commented Nov 27, 2017 • edited Loading

Codecov Report

gfyoung commented Nov 27, 2017

jreback commented Nov 27, 2017

jreback commented Nov 27, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Nov 27, 2017

ekisslinger commented Nov 28, 2017

ekisslinger commented Nov 28, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ekisslinger commented Nov 29, 2017

jreback Nov 29, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ekisslinger commented Nov 30, 2017

jreback commented Nov 30, 2017

ekisslinger commented Nov 27, 2017 •

edited

Loading

ekisslinger commented Nov 27, 2017 •

edited

Loading

codecov bot commented Nov 27, 2017 •

edited

Loading

codecov bot commented Nov 27, 2017 •

edited

Loading

jreback Nov 29, 2017 •

edited

Loading