-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rename categories with Series #17982
Rename categories with Series #17982
Conversation
HEAD: ``` [ 50.00%] ··· Running categoricals.Categoricals3.time_rank_string_cat 6.63ms [ 50.00%] ····· [100.00%] ··· Running categoricals.Categoricals3.time_rank_string_cat_ordered 4.85ms ``` Closes pandas-dev#17981
Codecov Report
@@ Coverage Diff @@
## master #17982 +/- ##
==========================================
- Coverage 91.23% 91.22% -0.02%
==========================================
Files 163 163
Lines 50113 50117 +4
==========================================
- Hits 45723 45718 -5
- Misses 4390 4399 +9
Continue to review full report at Codecov.
|
pandas/core/categorical.py
Outdated
@@ -879,8 +874,22 @@ def rename_categories(self, new_categories, inplace=False): | |||
|
|||
Parameters | |||
---------- | |||
new_categories : Index-like or dict-like (>=0.21.0) | |||
The renamed categories. | |||
new_categories : array-like or dict-like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit of a nitpick: new_categories is listed as "array-like" here, but in the description below the term "list-like" is used instead, and then it's switched back to "array-like" further down in the examples section. Might be nice to make the terminology consistent, but certainly not something that's super critical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed.
lgtm. merge away. |
you migh want to add the issue number to the original issue number in the whatsnew (#17336) |
pandas/core/categorical.py
Outdated
Categories (2, object): [A, b] | ||
|
||
Series are considered list-like here, so the *values* are used | ||
instead of the *index* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually want this behaviour?
Eg for Series.rename
, a Series is seen as a dict-like ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure. It’ll be a backwards incompatible change if we don’t treat Series as arrays so I think we should at least do this for now, maybe with a warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I think this is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback which part to you agree with? Warning that it'll change to dict-like in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no I agree the current behavior is correct. we handle list-like the same. no warning is needed as this is expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, looking at this again.
a Series should be just like a dict. This is a perf issue yes?
do this. (once Index.map
works we could simplify a bit)
In [12]: cat = pd.Categorical(['a', 'b', 'c', 'd'])
...: res = cat.rename_categories(pd.Series({'a': 4, 'b': 3, 'c': 2, 'd': 1}))
...:
...:
In [13]: cat
Out[13]:
[a, b, c, d]
Categories (4, object): [a, b, c, d]
In [14]: res
Out[14]:
[4, 3, 2, 1]
Categories (4, int64): [4, 3, 2, 1]
In [17]: pd.Series(cat.categories).map({'a': 4, 'b': 3, 'c': 2, 'd': 1}).values
Out[17]: array([4, 3, 2, 1])
In [19]: pd.Series(cat.categories).map({'a': 4, 'b': 3, 'c': 2, 'd': 1}).values
Out[19]: array([4, 3, 2, 1])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please revert the warning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain?
If we go for "Series -> dict-like" behaviour, this is a breaking change, and we need to use a warning for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above
It's not really a perf issue, that's just the reason we caught it. The main issue is the API change from c = pd.Categorical(['a', 'b'])
c.rename(pd.Series([10, 20) Before #17586, the values would be used, and so the categories would be I really don't think we should just break API here. It'll be very surprising and difficult to debug. But in most cases Series is considered dict-like, so I think a warning is the best option here. |
why don’t you just make th change to fix this back to how it was; this PR was in 0.21 so it’s not externally visible warnings are a terrible things and should rarely be used |
Just pushed a WIP adding the warning so that we're talking about the same thing, will clean up a bit further. |
Not sure what you mean here, revert the entire PR? That's a useful feature, it just (unexpectedly) changes the behavior for Series. We didn't have any tests covering that, so it wasn't caught at the time. |
pandas/core/categorical.py
Outdated
"For dict-like, use 'new_categories.to_dict()'\n" | ||
"For list-like, use 'new_categories.values'.") | ||
warn(msg, FutureWarning, stacklevel=2) | ||
if is_dict_like(new_categories) and not is_series: | ||
cat.categories = [new_categories.get(item, item) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just use map
pandas/core/categorical.py
Outdated
new_categories : Index-like or dict-like (>=0.21.0) | ||
The renamed categories. | ||
new_categories : list-like or dict-like | ||
The categories end up with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like an incomplete sentence ?
pandas/core/categorical.py
Outdated
categories. | ||
|
||
If dict-like, categories not contained in the mapping are passed | ||
through. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this a bit unstructured with the versionchanged and the further explanation splitted.
Suggestion (just copy paste, not edited sentences):
- list-like: If it is list-like, all items must be unique and the number of
items in the new categories must match the existing number of
categories.
- .. versionadded:: 0.21.1 dict-like, in which case it specifies a mapping from old-categories to new. Categories not contained in the mapping are passed through.
.. warning::
about series
(only not fully sure how the versionchanges works in a list)
pandas/core/categorical.py
Outdated
Categories (2, object): [A, b] | ||
|
||
Series are considered list-like here, so the *values* are used | ||
instead of the *index* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain?
If we go for "Series -> dict-like" behaviour, this is a breaking change, and we need to use a warning for that.
"treat Series like a dictionary.\n" | ||
"For dict-like, use 'new_categories.to_dict()'\n" | ||
"For list-like, use 'new_categories.values'.") | ||
warn(msg, FutureWarning, stacklevel=2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe convert the series to array (list-like), so then the rest of the code does not need to take care of it being a series or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we go for "Series -> dict-like" behaviour, this is a breaking change, and we need to use a warning for that.
Sorry I think that was an example I added in the first commit of this PR, before we decided to treat Series as list-like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this a breaking change at all? we simply did not support this before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old behavior required a list-like, and Series
are list like. It's not unreasonable for a user to expect
cat.rename(Series([0, 1]))
to work, since it did! But we have a new feature that changes the behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok I c, I think this was accidently supported before. ok so fine on the FutureWarning.
I find the current PR fine: warn for series to say that in future it will use dict-like behaviour, but for now keep list-like behaviour |
@jreback are you OK with the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from some typos, looks good to me
doc/source/whatsnew/v0.21.0.txt
Outdated
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
:meth:`~Series.cat.rename_categories` now accepts a dict-like argument for | ||
``new_categories``. The previous categories are lookup up in the dictionary's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"lookup up" -> ""looked up"? (not sure what is the correct english conjugation)
pandas/core/categorical.py
Outdated
* dict-like: specifies a mapping from | ||
old categories to new. Categories not contained in the mapping | ||
are passed through and extra categories in the mapping are | ||
ignored. *New in verison 0.21.0*. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
verison -> version
"treat Series like a dictionary.\n" | ||
"For dict-like, use 'new_categories.to_dict()'\n" | ||
"For list-like, use 'new_categories.values'.") | ||
warn(msg, FutureWarning, stacklevel=2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok I c, I think this was accidently supported before. ok so fine on the FutureWarning.
* PERF/API: Treat series as array-like for rename_categories HEAD: ``` [ 50.00%] ··· Running categoricals.Categoricals3.time_rank_string_cat 6.63ms [ 50.00%] ····· [100.00%] ··· Running categoricals.Categoricals3.time_rank_string_cat_ordered 4.85ms ``` Closes pandas-dev#17981 * Redo docstring * Use list-like * Warn * Fix doc indent * Doc cleanup * More doc cleanup * Fix API reference * Typos
* PERF/API: Treat series as array-like for rename_categories HEAD: ``` [ 50.00%] ··· Running categoricals.Categoricals3.time_rank_string_cat 6.63ms [ 50.00%] ····· [100.00%] ··· Running categoricals.Categoricals3.time_rank_string_cat_ordered 4.85ms ``` Closes pandas-dev#17981 * Redo docstring * Use list-like * Warn * Fix doc indent * Doc cleanup * More doc cleanup * Fix API reference * Typos
Closes #17981
head:
master: