BUG: fixes #53935 Categorical order lost after call to remove_categories (#54027)

* Changed the default value of sort to False in the difference method. difference calls _difference, which in turn calls _maybe_try_sort, and _maybe_try_sort sorts the values whenever sort is not False. That is why, in the original code, sort=None still sorted the categories. With this change the code sorts only when sort=True is passed.

* Added test to show the variable value change behaves the way we want it to.

* Added bug fix to whatsnew.

* Changed the bug fix implementation to simply check whether ordered is True; if so, remove_categories passes sort=False in its call to difference.

* Switched the implementation to a ternary to check for ordered. This seems to work better since we are not overriding a default argument this way.
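
The sorting behaviour the commit message describes can be seen directly on `Index.difference`. The following is a minimal sketch, not part of the commit; the variable names are illustrative and the category values are borrowed from the test added in this change.

```python
import pandas as pd

# Categories deliberately listed in non-alphabetical order.
categories = pd.Index(["d", "c", "b", "a"])
removals = pd.Index(["c"])

# Default (sort=None): the result is sorted when the values are comparable.
sorted_result = categories.difference(removals)

# sort=False: the surviving values keep their original relative order.
unsorted_result = categories.difference(removals, sort=False)

print(list(sorted_result))    # ['a', 'b', 'd']
print(list(unsorted_result))  # ['d', 'b', 'a']
```

For an ordered categorical, the position of each category carries meaning, so only the `sort=False` result keeps the intended ordering.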
paulreece authored Jul 11, 2023
1 parent 9d1d1b1 commit 41dfccc
Showing 3 changed files with 21 additions and 1 deletion.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.1.0.rst
@@ -376,6 +376,7 @@ Bug fixes

 Categorical
 ^^^^^^^^^^^
+- Bug in :meth:`CategoricalIndex.remove_categories` where ordered categories would not be maintained (:issue:`53935`).
 - Bug in :meth:`Series.astype` with ``dtype="category"`` for nullable arrays with read-only null value masks (:issue:`53658`)
 - Bug in :meth:`Series.map` , where the value of the ``na_action`` parameter was not used if the series held a :class:`Categorical` (:issue:`22527`).
 -
6 changes: 5 additions & 1 deletion pandas/core/arrays/categorical.py
@@ -1369,7 +1369,11 @@ def remove_categories(self, removals) -> Self:
             removals = [removals]

         removals = Index(removals).unique().dropna()
-        new_categories = self.dtype.categories.difference(removals)
+        new_categories = (
+            self.dtype.categories.difference(removals, sort=False)
+            if self.dtype.ordered is True
+            else self.dtype.categories.difference(removals)
+        )
         not_included = removals.difference(self.dtype.categories)

         if len(not_included) != 0:
15 changes: 15 additions & 0 deletions pandas/tests/indexes/categorical/test_category.py
@@ -373,3 +373,18 @@ def test_method_delegation(self):
         msg = "cannot use inplace with CategoricalIndex"
         with pytest.raises(ValueError, match=msg):
             ci.set_categories(list("cab"), inplace=True)
+
+    def test_remove_maintains_order(self):
+        ci = CategoricalIndex(list("abcdda"), categories=list("abcd"))
+        result = ci.reorder_categories(["d", "c", "b", "a"], ordered=True)
+        tm.assert_index_equal(
+            result,
+            CategoricalIndex(list("abcdda"), categories=list("dcba"), ordered=True),
+        )
+        result = result.remove_categories(["c"])
+        tm.assert_index_equal(
+            result,
+            CategoricalIndex(
+                ["a", "b", np.nan, "d", "d", "a"], categories=list("dba"), ordered=True
+            ),
+        )