-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG/DEPR: deprecate Categorical take default behaviour + fix Series[categorical].take #20841
BUG/DEPR: deprecate Categorical take default behaviour + fix Series[categorical].take #20841
Conversation
commit ec0cecd Author: Tom Augspurger <[email protected]> Date: Fri Apr 27 06:02:48 2018 -0500 Updates commit 6858409 Author: Tom Augspurger <[email protected]> Date: Fri Apr 27 05:48:59 2018 -0500 Added note commit eb43fa4 Author: Tom Augspurger <[email protected]> Date: Thu Apr 26 20:47:35 2018 -0500 Really truly fix it hopefully. commit 7c4f625 Merge: 9a6c7d4 6cacdde Author: Tom Augspurger <[email protected]> Date: Thu Apr 26 20:40:15 2018 -0500 Merge remote-tracking branch 'upstream/master' into ea-take commit 9a6c7d4 Author: Tom Augspurger <[email protected]> Date: Thu Apr 26 20:04:17 2018 -0500 Doc updates commit eecd632 Author: Tom Augspurger <[email protected]> Date: Thu Apr 26 15:00:00 2018 -0500 Skip categorical take tests commit f3b91ca Author: Tom Augspurger <[email protected]> Date: Thu Apr 26 13:43:26 2018 -0500 doc fixup commit fbc4425 Author: Tom Augspurger <[email protected]> Date: Thu Apr 26 13:37:45 2018 -0500 Updates * indexer -> indices * doc user-facing vs physical * assert na_cmps * test reindex w/ non-NA fill_value commit 741f284 Merge: 5db6624 630ef16 Author: Tom Augspurger <[email protected]> Date: Thu Apr 26 07:18:32 2018 -0500 Merge remote-tracking branch 'upstream/master' into ea-take commit 5db6624 Author: Tom Augspurger <[email protected]> Date: Thu Apr 26 07:17:30 2018 -0500 Doc and move tests commit 74b2c09 Author: Tom Augspurger <[email protected]> Date: Wed Apr 25 21:40:18 2018 -0500 Added verisonadded commit fc729d6 Author: Tom Augspurger <[email protected]> Date: Wed Apr 25 15:51:27 2018 -0500 Fixed editor commit 1a4d987 Author: Tom Augspurger <[email protected]> Date: Wed Apr 25 15:50:48 2018 -0500 Pass an array commit bbcbf19 Author: Tom Augspurger <[email protected]> Date: Wed Apr 25 15:07:28 2018 -0500 Cleanup commit d5470a0 Author: Tom Augspurger <[email protected]> Date: Wed Apr 25 15:02:26 2018 -0500 Fixed reorder commit 82cad8b Author: Tom Augspurger <[email protected]> Date: Wed Apr 25 15:00:43 2018 -0500 Stale comment commit c449afd Author: Tom Augspurger <[email protected]> Date: Wed Apr 25 14:48:33 2018 -0500 Bounds checking commit 449983b Author: Tom Augspurger <[email protected]> Date: Wed Apr 25 12:55:31 2018 -0500 Linting commit 69e7fe7 Author: Tom Augspurger <[email protected]> Date: Wed Apr 25 12:40:20 2018 -0500 Updates 1. Reversed order of take keywords 2. Added to extensions API 3. Removed default implementation commit 05d8844 Author: Tom Augspurger <[email protected]> Date: Wed Apr 25 09:59:33 2018 -0500 Updated docs commit 31cd304 Author: Tom Augspurger <[email protected]> Date: Wed Apr 25 09:43:45 2018 -0500 pep8 commit 338566f Author: Tom Augspurger <[email protected]> Date: Wed Apr 25 09:42:28 2018 -0500 Upcasting commit b7ae0bc Author: Tom Augspurger <[email protected]> Date: Wed Apr 25 09:06:59 2018 -0500 revert combine change commit 125ca0b Author: Tom Augspurger <[email protected]> Date: Wed Apr 25 08:37:07 2018 -0500 Simplify Upcasting is still broken commit c721915 Author: Tom Augspurger <[email protected]> Date: Wed Apr 25 07:50:54 2018 -0500 Removed default_fill_value commit 37915e9 Merge: 67ba9dd 60fe82c Author: Tom Augspurger <[email protected]> Date: Wed Apr 25 07:44:15 2018 -0500 Merge remote-tracking branch 'upstream/master' into ea-take commit 67ba9dd Author: Tom Augspurger <[email protected]> Date: Wed Apr 25 07:42:54 2018 -0500 more with default fill value commit eba137f Author: Tom Augspurger <[email protected]> Date: Wed Apr 25 05:59:58 2018 -0500 More internals hacking commit 08f2479 Author: Tom Augspurger <[email protected]> Date: Wed Apr 25 05:59:17 2018 -0500 Fixup JSON take commit 0be9ec6 Author: Tom Augspurger <[email protected]> Date: Tue Apr 24 18:02:13 2018 -0500 non-internals changes commit dacd98e Author: Tom Augspurger <[email protected]> Date: Tue Apr 24 14:45:36 2018 -0500 Moves commit fb3c234 Author: Tom Augspurger <[email protected]> Date: Tue Apr 24 13:59:51 2018 -0500 [WIP]: ExtensionArray.take default implementation Implements a take interface that's compatible with NumPy and optionally pandas' NA semantics. ```python In [1]: import pandas as pd In [2]: from pandas.tests.extension.decimal.array import * In [3]: arr = DecimalArray(['1.1', '1.2', '1.3']) In [4]: arr.take([0, 1, -1]) Out[4]: DecimalArray(array(['1.1', '1.2', '1.3'], dtype=object)) In [5]: arr.take([0, 1, -1], fill_value=float('nan')) Out[5]: DecimalArray(array(['1.1', '1.2', Decimal('NaN')], dtype=object)) ``` Closes pandas-dev#20640
We're changing how Categorical.take handles negative indices to be in line with Series and other EAs.
Codecov Report
@@ Coverage Diff @@
## master #20841 +/- ##
==========================================
+ Coverage 91.77% 91.78% +<.01%
==========================================
Files 153 153
Lines 49313 49324 +11
==========================================
+ Hits 45259 45272 +13
+ Misses 4054 4052 -2
Continue to review full report at Codecov.
|
OK, this is clean on top of master now. |
Didn't see any warnings in the test logs. Should be good to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Some minor comments. Maybe you can add a whatsnew notice in the bug fixes as well for the Series.take one.
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -856,6 +856,7 @@ Other API Changes | |||
- Constructing a Series from a list of length 1 no longer broadcasts this list when a longer index is specified (:issue:`19714`, :issue:`20391`). | |||
- :func:`DataFrame.to_dict` with ``orient='index'`` no longer casts int columns to float for a DataFrame with only int and float columns (:issue:`18580`) | |||
- A user-defined-function that is passed to :func:`Series.rolling().aggregate() <pandas.core.window.Rolling.aggregate>`, :func:`DataFrame.rolling().aggregate() <pandas.core.window.Rolling.aggregate>`, or its expanding cousins, will now *always* be passed a ``Series``, rather than a ``np.array``; ``.apply()`` only has the ``raw`` keyword, see :ref:`here <whatsnew_0230.enhancements.window_raw>`. This is consistent with the signatures of ``.aggregate()`` across pandas (:issue:`20584`) | |||
- The previous default behavior of negative indices in ``Categorical.take`` is deprecated. In a future version it will change from meaning missing values to meaning positional indicies from the right (:issue:`20664`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indicies -> indices
I would also add "to be consistent with Series.take
" at the end
pandas/core/arrays/categorical.py
Outdated
""" Take the codes by the indexer, fill with the fill_value. | ||
def take_nd(self, indexer, allow_fill=None, fill_value=None): | ||
""" | ||
Return the indices |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-> Take elements from the Categorical
? (in any case, it are not the indices that are returned :-))
pandas/core/arrays/categorical.py
Outdated
assert isna(fill_value) | ||
if fill_value is None or isna(fill_value): | ||
# For backwards compatability, we have to override | ||
# any na values for `fill_value` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to explain why we have the isna(fill_value)
. In the past we checked for fill_value
being null, and then ignoring it and using -1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But np.nan
is the NA value for Categorical, so shouldn't its take
method be able to check that anyhow (regardless from back compat reasons) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I understand. Will fix
@@ -753,6 +753,16 @@ def test_take(): | |||
s.take([-1, 3, 4], convert=False) | |||
|
|||
|
|||
def test_take_categorical(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of this test (or in addition to), I think you should be able to remove the skip from test_take_series
in the categorical extension tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That still fails because of #20747. The unobserved categories are still present. I've updated the skip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, that's still there :)
Was that ever in a released version? I thought it was just master where we broke that. |
No, it was a regression from 0.20 -> 0.21/22 |
Last failure was random. Fixed (properly this time) by d1c7e38. Merging on green. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since u didn’t change any tests are there no warnings anywhere??
@@ -5,6 +5,18 @@ | |||
import pandas.util.testing as tm | |||
|
|||
|
|||
@pytest.fixture(params=[True, False]) | |||
def allow_fill(request): | |||
"""Boolean 'allow_fill' parameter for Categorical.take""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these can prob be moved into a higher level conftest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do when that's necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls make a conftest here, otherwise these are easily lost
# https://github.com/pandas-dev/pandas/issues/20664 | ||
# TODO: remove when the default Categorical.take behavior changes | ||
kwargs = {'allow_fill': False} | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so you changed the default? it was True before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that take=False
wasn't passed through here was a regression from 0.20. The default for Categorical.take is still True (None -> warning -> True).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though since allow_fill isn't a keyword for _take
we can just pass it unconditionally here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah nevermind. allow_fill isn't a keyword for ndarray.take. That's why the kwargs is needed.
We apparently didn't test take directly. |
@@ -1024,6 +1025,7 @@ Categorical | |||
- Bug in ``Categorical.__iter__`` not converting to Python types (:issue:`19909`) | |||
- Bug in :func:`pandas.factorize` returning the unique codes for the ``uniques``. This now returns a ``Categorical`` with the same dtype as the input (:issue:`19721`) | |||
- Bug in :func:`pandas.factorize` including an item for missing values in the ``uniques`` return value (:issue:`19721`) | |||
- Bug in :meth:`Series.take` with categorical data interpreting ``-1`` in `indices` as missing value markers, rather than the last element of the Series (:issue:`20664`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double backticks on indices
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Single backticks for parameter names.
indexer = np.asarray(indexer, dtype=np.intp) | ||
if allow_fill is None: | ||
if (indexer < 0).any(): | ||
warn(_take_msg, FutureWarning, stacklevel=2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just put the warning inline here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That messes with the formatting.
allow_fill = True | ||
|
||
if isna(fill_value): | ||
# For categorical, any NA value is considered a user-facing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this condition tested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep.
@@ -5,6 +5,18 @@ | |||
import pandas.util.testing as tm | |||
|
|||
|
|||
@pytest.fixture(params=[True, False]) | |||
def allow_fill(request): | |||
"""Boolean 'allow_fill' parameter for Categorical.take""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls make a conftest here, otherwise these are easily lost
…andas into categorical-take
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor doc comment, otherwise lgtm.
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -856,6 +856,7 @@ Other API Changes | |||
- Constructing a Series from a list of length 1 no longer broadcasts this list when a longer index is specified (:issue:`19714`, :issue:`20391`). | |||
- :func:`DataFrame.to_dict` with ``orient='index'`` no longer casts int columns to float for a DataFrame with only int and float columns (:issue:`18580`) | |||
- A user-defined-function that is passed to :func:`Series.rolling().aggregate() <pandas.core.window.Rolling.aggregate>`, :func:`DataFrame.rolling().aggregate() <pandas.core.window.Rolling.aggregate>`, or its expanding cousins, will now *always* be passed a ``Series``, rather than a ``np.array``; ``.apply()`` only has the ``raw`` keyword, see :ref:`here <whatsnew_0230.enhancements.window_raw>`. This is consistent with the signatures of ``.aggregate()`` across pandas (:issue:`20584`) | |||
- The previous default behavior of negative indices in ``Categorical.take`` is deprecated. In a future version it will change from meaning missing values to meaning positional indices from the right. The future behavior is consistent with :meth:`Series.take` (:issue:`20664`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be in the deprecations section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
This was passing before, so merging. |
closes #20664
I might be unavailable for the next few hours, so I'm just putting this up here, even though it includes changes from #20814 in the first commit. Once that is merged, this can be merged on top of master and it'll just have the categorical changes.