BUG: Fix groupby over a CategoricalIndex in axis=1 #18525
pandas/core/groupby.py
Outdated
@@ -2859,7 +2859,7 @@ def is_in_obj(gpr):
             else:
                 in_axis, name = False, None

-        if is_categorical_dtype(gpr) and len(gpr) != len(obj):
+        if is_categorical_dtype(gpr) and len(gpr) != obj.shape[axis]:
             raise ValueError("Categorical dtype grouper must "
                              "have len(grouper) == len(data)")
In light of the change in the check, perhaps this error message should change.
Good point. I'll change it to "Grouper and axis must be same length".
put in the actual len(grouper) and len(axis) as well
@ekisslinger : Don't forget to add a
Any chance this fix can be added to 0.21.1?
Codecov Report
@@            Coverage Diff             @@
##           master   #18525      +/-   ##
==========================================
+ Coverage   91.32%   91.33%   +<.01%
==========================================
  Files         163      164       +1
  Lines       49798    49819      +21
==========================================
+ Hits        45479    45503      +24
+ Misses       4319     4316       -3
Continue to review full report at Codecov.
Codecov Report
@@            Coverage Diff             @@
##           master   #18525      +/-   ##
==========================================
- Coverage   91.35%   91.33%   -0.02%
==========================================
  Files         164      164
  Lines       49802    49802
==========================================
- Hits        45496    45487       -9
- Misses       4306     4315       +9
Continue to review full report at Codecov.
@jreback @jorisvandenbossche : Thoughts?
will look in a bit
this would be ok for 0.21.1, pls add a whatsnew note
@@ -191,6 +191,25 @@ def test_groupby_categorical_index(self):
                 [0, 1, 2, 3], levels, ordered=True), name='cats')
         assert_frame_equal(result, expected)

+    def test_groupby_categorical_columns_index(self):
move this to pandas/tests/groupby/test_grouper.py, which is where things like this are tested
i can't find that file...
sorry, test_grouping.py (move both this version and the transposed)
also add the transposed version of this in your test (which works now)
I think
2a11e78 to a05147f
I updated the bug fix branch. The changes are:
doc/source/whatsnew/v0.21.1.txt
Outdated
@@ -137,6 +137,7 @@ Categorical
 - Error messages in the testing module have been improved when items have different ``CategoricalDtype`` (:issue:`18069`)
 - ``CategoricalIndex`` can now correctly take a ``pd.api.types.CategoricalDtype`` as its dtype (:issue:`18116`)
 - Bug in ``Categorical.unique()`` returning read-only ``codes`` array when all categories were ``NaN`` (:issue:`18051`)
+- Bug when grouping over a ``CategoricalIndex`` in axis=1 (:issue:`18432`)
``DataFrame.groupby(axis=1)`` with a ``CategoricalIndex``
pandas/core/groupby.py
Outdated
"have len(grouper) == len(data)")
if is_categorical_dtype(gpr) and len(gpr) != obj.shape[axis]:
    raise ValueError(
        ("Length of grouper ({0}) and axis ({1}) must be same length"
use kwargs rather than positional formatters
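The review comments on this check (compare against the grouped axis, report both lengths, use keyword formatters) can be sketched as follows. This is a minimal plain-Python illustration, not the pandas source: `check_grouper_length`, `len_gpr`, and `len_axis` are hypothetical names, and a bare `shape` tuple stands in for `obj.shape`.

```python
def check_grouper_length(grouper, shape, axis):
    """Raise ValueError if the grouper does not match the grouped axis.

    Hypothetical helper: `shape` stands in for `obj.shape`.
    """
    len_gpr = len(grouper)
    len_axis = shape[axis]  # length along the axis being grouped, not len(obj)
    if len_gpr != len_axis:
        # Keyword formatters keep the message readable if the order changes.
        raise ValueError(
            "Length of grouper ({len_gpr}) and axis ({len_axis}) "
            "must be same length".format(len_gpr=len_gpr, len_axis=len_axis)
        )


# A frame with 3 rows and 4 columns, grouped along the columns (axis=1).
shape = (3, 4)
column_labels = ["A", "B", "A", "B"]

# Passes: the grouper has 4 entries and axis 1 has length 4.
check_grouper_length(column_labels, shape, axis=1)

# Comparing against the row count (what len(obj) would give) fails.
try:
    check_grouper_length(column_labels, shape, axis=0)
    message = None
except ValueError as err:
    message = str(err)
```

With a non-square frame, the old `len(obj)` comparison rejects a valid column grouper, which is the situation the `obj.shape[axis]` change addresses.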
@@ -191,6 +191,45 @@ def test_groupby_categorical_index(self):
                 [0, 1, 2, 3], levels, ordered=True), name='cats')
         assert_frame_equal(result, expected)

+    def test_groupby_categorical_columns_index(self):
+        # GH18432
+        s = np.random.RandomState(0)
not necessary to set a random seed here
a05147f to d7640e8
i made the requested changes
columns = ['A', 'B', 'A', 'B']
categories = ['B', 'A']
cat_columns = CategoricalIndex(columns,
                               categories=categories,
can you change this test so that, instead of random data, it uses integers as the data, and then directly construct the result frame
Yes. I made the change. I think it is what you had in mind and I agree it is more straightforward this way.
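To illustrate why integer data makes the expected result easy to write by hand, here is a plain-Python sketch of summing columns that share a label. It is only an approximation of what grouping over the columns computes, not the pandas implementation or the actual test. `sum_columns_by_label` and the sample `rows` are hypothetical, and the assumption that output columns follow the category order (`['B', 'A']`) is stated here rather than taken from the PR.

```python
def sum_columns_by_label(rows, labels, categories):
    """Sum each row's values over columns that share a label.

    Hypothetical helper; output columns follow the order of `categories`.
    """
    result = []
    for row in rows:
        totals = {cat: 0 for cat in categories}
        for label, value in zip(labels, row):
            totals[label] += value
        result.append([totals[cat] for cat in categories])
    return result


columns = ['A', 'B', 'A', 'B']
categories = ['B', 'A']

# Integer data: the sums can be checked by inspection.
rows = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]

# Directly constructed expected values, one column per category (B, then A).
expected = [
    [2 + 4, 1 + 3],
    [6 + 8, 5 + 7],
]

result = sum_columns_by_label(rows, columns, categories)
```

With random data the expected frame has to be computed by a second code path; with integers it can be typed out literally and compared.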
closes GH18432
Add multi-index columns test to test_groupby_categorical_columns_index()
Add whatsnew for GH18432 bug fix
Fix ValueError text for GH18432 bug fix
Update whatsnew text
Use kwargs instead of positional format params
Move test_groupby_categorical_columns_index() to pandas/tests/groupby/test_grouping.py
Directly construct expected dataframe in test_groupby_categorical_index_and_columns()
d7640e8 to 19f6041
In case you didn't see the last message, as it is a bit hidden, I made the last change, which was to use integer instead of random data and directly construct the dataframe.
thanks @ekisslinger |
(cherry picked from commit 5da3759)
git diff upstream/master -u -- "*.py" | flake8 --diff