-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rename categories with Series #17982
Changes from 3 commits
b551b28
19af62a
f659e78
6460bc5
506f9ba
4f6a1dd
cbcc723
aec380d
0e086ca
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -866,11 +866,6 @@ def set_categories(self, new_categories, ordered=None, rename=False, | |
def rename_categories(self, new_categories, inplace=False): | ||
""" Renames categories. | ||
|
||
The new categories can be either a list-like dict-like object. | ||
If it is list-like, all items must be unique and the number of items | ||
in the new categories must be the same as the number of items in the | ||
old categories. | ||
|
||
Raises | ||
------ | ||
ValueError | ||
|
@@ -879,8 +874,22 @@ def rename_categories(self, new_categories, inplace=False): | |
|
||
Parameters | ||
---------- | ||
new_categories : Index-like or dict-like (>=0.21.0) | ||
The renamed categories. | ||
new_categories : list-like or dict-like | ||
The categories end up with | ||
|
||
.. versionchanged:: 0.21.0 | ||
|
||
new_categories may now also be dict-like, in which case it | ||
specifies a mapping from old-categories to new. | ||
|
||
If it is list-like, all items must be unique and the number of | ||
items in the new categories must match the existing number of | ||
categories. | ||
|
||
If dict-like, categories not contained in the mapping are passed | ||
through. Note that ``Series`` are considered list-like in this | ||
context. | ||
|
||
inplace : boolean (default: False) | ||
Whether or not to rename the categories inplace or return a copy of | ||
this categorical with renamed categories. | ||
|
@@ -896,11 +905,33 @@ def rename_categories(self, new_categories, inplace=False): | |
remove_categories | ||
remove_unused_categories | ||
set_categories | ||
|
||
Examples | ||
-------- | ||
>>> c = Categorical(['a', 'a', 'b']) | ||
>>> c.rename_categories([0, 1]) | ||
[0, 0, 1] | ||
Categories (2, int64): [0, 1] | ||
|
||
For dict-like ``new_categories``, extra keys are ignored and | ||
categories not in the dictionary are passed through | ||
|
||
>>> c.rename_categories({'a': 'A', 'c': 'C'}) | ||
[A, A, b] | ||
Categories (2, object): [A, b] | ||
|
||
Series are considered list-like here, so the *values* are used | ||
instead of the *index* | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we actually want this behaviour? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure. It’ll be a backwards incompatible change if we don’t treat Series as arrays so I think we should at least do this for now, maybe with a warning. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree, I think this is fine. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jreback which part to you agree with? Warning that it'll change to dict-like in the future? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. no I agree the current behavior is correct. we handle list-like the same. no warning is needed as this is expected. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. actually, looking at this again. a Series should be just like a dict. This is a perf issue yes?
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. please revert the warning There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you explain? |
||
|
||
>>> c.rename_categories(pd.Series([0, 1], index=['a', 'b'])) | ||
[0, 0, 1] | ||
Categories (2, int64): [0, 1] | ||
""" | ||
inplace = validate_bool_kwarg(inplace, 'inplace') | ||
cat = self if inplace else self.copy() | ||
|
||
if is_dict_like(new_categories): | ||
if (is_dict_like(new_categories) and | ||
not isinstance(new_categories, ABCSeries)): | ||
cat.categories = [new_categories.get(item, item) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. just use map |
||
for item in cat.categories] | ||
else: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like an incomplete sentence ?