API/DEPR: Change default skipna behaviour + deprecate numeric_only in Categorical.min and max #27929
@@ -38,29 +38,38 @@ def test_min_max(self):
cat = Categorical(
    [np.nan, "b", "c", np.nan], categories=["d", "c", "b", "a"], ordered=True
)
_min = cat.min()
_max = cat.max()
_min = cat.min(skipna=False)
can you parameterize this test on skipna=True/False
@jreback Anything else? Please tell me. |
Two questions
- Did the default change? numeric_only=None seems to be functionally equivalent to skipna=False.
- What's the reason for the change of the implementation? It's not clear to me if this is going to have a performance impact.
@@ -1028,7 +1028,7 @@ def test_min_max(self):
)
_min = cat.min()
_max = cat.max()
assert np.isnan(_min)
assert _min == "c"
Why did this change?
Please refer to the following comment.
The default behavior will change when pandas/pandas/core/arrays/categorical.py Lines 2236 to 2240 in ea60c19
The
In this PR, the default behavior of |
OK, that behavior looks pretty buggy. But I'm not sure if we should be just changing the default output of
Given that users will need to update their code anyway to use the new argument, I think that we should try to get the correct behavior when
I also think the error message can be improved.
In [6]: c.max(numeric_only=False)
/Users/taugspurger/.virtualenvs/pandas-dev/bin/ipython:1: FutureWarning: the 'numeric_only' keyword is deprecated, use 'skipna' instead
#!/Users/taugspurger/Envs/pandas-dev/bin/python
Out[6]: nan
Reading that, it seems like I just need to replace |
@jorisvandenbossche Would you tell us your thought about Tom's suggestion? That is keeping the buggy behaviour when |
@makbigc can you merge master. @jorisvandenbossche can you respond to questions here: #27929 (comment) |
Focusing on the default behaviour for a moment, so when no arguments are specified (and not how to handle the numeric_only keyword): what default behaviour do we want? Currently,
To be consistent with the rest of pandas, this result should be 1 instead of nan (since we have a default I think we agree that we want that correct behaviour long term? Question is then how to get there:
Personally, I might have a slight preference to actually do a breaking change on this for 1.0 for the default behaviour (we still need to deprecate the numeric_only keyword when specified, that's a separate thing). But, it's certainly possible to do it with a deprecation (we will only need to change |
@jorisvandenbossche Thanks for your detailed reply. If I take the first approach (i.e., breaking change), what I should do is:
Anything else? Please tell me. |
Let's wait a bit to see what others think about the default behaviour. |
I agree we should deprecate Though a breaking change is simpler. |
@TomAugspurger thoughts on #27929 (comment) ? |
That seems good.
… On Oct 20, 2019, at 07:35, Joris Van den Bossche ***@***.***> wrote:
@TomAugspurger thoughts on #27929 (comment) ?
|
Sorry, in the linked comment I ask several questions. So what is the "that seems good" exactly answering to? |
Your summary at the end. Breaking change for default + a deprecation saying that
|
can you merge master. |
OK, now that we have agreement on the way forward (breaking change for default behaviour + deprecate numeric_only), can you
- Add a section to the whatsnew for 1.0.0 in the API breaking changes section about this?
- Update to already use skipna=True as the new default? (I think there is then no need to first have skipna as None as default and raise a warning for that?)
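A short sketch of the behaviour agreed on above, assuming pandas 1.0 or later (where skipna=True is the default for Categorical.min and Categorical.max). The categorical is the one used in the tests in this PR; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Ordered categorical from the tests in this PR: d < c < b < a.
cat = pd.Categorical(
    [np.nan, "b", "c", np.nan], categories=["d", "c", "b", "a"], ordered=True
)

default_min = cat.min()             # missing values are skipped by default
default_max = cat.max()
strict_min = cat.min(skipna=False)  # missing values propagate to the result

# Series reductions also skip missing values by default.
series_min = pd.Series([np.nan, 1.0, 2.0]).min()

print(default_min, default_max, strict_min, series_min)
```

Code that relied on the old NaN result would need to pass skipna=False explicitly.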
pandas/core/arrays/categorical.py
Outdated
@@ -2193,7 +2193,8 @@ def _reduce(self, name, axis=0, **kwargs):
    raise TypeError(msg.format(op=name))
return func(**kwargs)

def min(self, numeric_only=None, **kwargs):
@deprecate_kwarg(old_arg_name="numeric_only", new_arg_name="skipna")
def min(self, skipna=None, **kwargs):
This can be skipna=True ?
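For readers unfamiliar with the pattern in the hunk above, here is a minimal standard-library sketch of a keyword-renaming decorator. It is not pandas's deprecate_kwarg implementation; rename_kwarg and smallest are hypothetical names used only for illustration, and None stands in for a missing value.

```python
import functools
import warnings


def rename_kwarg(old_name, new_name):
    """Hypothetical helper: forward a deprecated keyword to its new name."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old_name in kwargs:
                if new_name in kwargs:
                    raise TypeError(
                        f"Pass only one of '{old_name}' and '{new_name}'."
                    )
                warnings.warn(
                    f"the '{old_name}' keyword is deprecated, "
                    f"use '{new_name}' instead",
                    FutureWarning,
                    stacklevel=2,
                )
                # The old value is passed through unchanged under the new name.
                kwargs[new_name] = kwargs.pop(old_name)
            return func(*args, **kwargs)

        return wrapper

    return decorator


@rename_kwarg("numeric_only", "skipna")
def smallest(values, skipna=True):
    """Toy reduction: None marks a missing value."""
    present = [v for v in values if v is not None]
    if len(present) != len(values) and not skipna:
        return None
    return min(present) if present else None
```

With the default written as skipna=True in the signature, callers who pass nothing get the new behaviour, and only callers who still pass the old keyword see the FutureWarning.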
Thanks for the update!
A few more comments, should be almost good now
doc/source/whatsnew/v1.0.0.rst
Outdated
By default :meth:`Categorical.min` and :meth:`Categorical.max` return the min and the max respectively
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When :class:`Categorical` contains ``np.nan``, :meth:`Categorical.min` and :meth:`Categorical.max`
When :class:`Categorical` contains ``np.nan``, :meth:`Categorical.min` and :meth:`Categorical.max`
When :class:`Categorical` contains ``np.nan``, :meth:`Categorical.min`
It was only min that returned NaN (can you update the title as well?)
doc/source/whatsnew/v1.0.0.rst
Outdated
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
When :class:`Categorical` contains ``np.nan``, :meth:`Categorical.min` and :meth:`Categorical.max` | ||
no longer return ``np.nan`` by default. |
Maybe add something like "to honor the default of skipna=True" to make it clear that this change makes it consistent with the rest of pandas
pandas/core/arrays/categorical.py
Outdated
    "The default value of skipna will be changed to "
    "True in the future version."
)
warn(msg, FutureWarning, stacklevel=2)
You can remove this if skipna is None block with the warning I think?
pandas/core/arrays/categorical.py
Outdated
if skipna:
    pointer = self._codes[good].max(**kwargs)
else:
    if skipna is None:
same here
    [np.nan, 1, 2, np.nan], categories=[5, 4, 3, 2, 1], ordered=True
)
with tm.assert_produces_warning(
    expected_warning=FutureWarning, check_stacklevel=False
Does it work without check_stacklevel=False?
_max = cat.max(numeric_only=True)
assert _max == "b"
if skipna is False:
    assert np.isnan(_min)
use isna/notna
np.isnan is a more strict / correct test in this case, since we are actually returning NaN (and not None, NA or NaT)
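A small sketch of the difference between the two checks, assuming NumPy and pandas are installed. The wrapper nan_check is a hypothetical helper, added only so that the comparison can run without raising.

```python
import numpy as np
import pandas as pd


def nan_check(value):
    """np.isnan wrapped so that inputs it rejects (such as None) give False."""
    try:
        return bool(np.isnan(value))
    except TypeError:
        # np.isnan raises TypeError for None instead of treating it as missing.
        return False


# Each entry is (np.isnan-based check, pd.isna check).
checks = {
    "np.nan": (nan_check(np.nan), pd.isna(np.nan)),
    "None": (nan_check(None), pd.isna(None)),
}
print(checks)
```

pd.isna accepts None as missing, so a test written with it would also pass if the method returned None, which is why the np.isnan assertion is the tighter one here.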
if skipna:
    pointer = self._codes[good].min(**kwargs)
else:
    return np.nan
this is not correct for i8 types, which should be pd.NaT. how to fix this?
We could check the categories.dtype.na_value if it exists. But since this is the current behaviour, it's not critical to fix in this PR I think.
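The logic discussed in this hunk can be sketched in plain Python, under stated assumptions: codes are integers indexing into the ordered categories, and -1 marks a missing value. The function categorical_min and its na_value parameter are hypothetical; na_value only illustrates the suggestion that the returned missing marker could depend on the categories' dtype and is not part of this PR.

```python
import math


def categorical_min(codes, categories, skipna=True, na_value=math.nan):
    """Return the category with the smallest code; -1 marks a missing value."""
    if not codes:
        return na_value
    good = [code for code in codes if code != -1]
    has_missing = len(good) != len(codes)
    # Without skipna, or with nothing but missing values, return the marker.
    if has_missing and (not skipna or not good):
        return na_value
    return categories[min(good)]


categories = ["d", "c", "b", "a"]  # ordered: d < c < b < a
codes = [-1, 2, 1, -1]             # encodes [NaN, "b", "c", NaN]

print(categorical_min(codes, categories))                # c
print(categorical_min(codes, categories, skipna=False))  # nan
```

A caller working with datetime-like categories could pass a different marker through na_value, which is roughly what checking categories.dtype.na_value would achieve.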
It is strange that the failed tests don't call |
@makbigc I merged that other PR. So if you merge latest master in this branch, the error should be solved. |
Just one remaining comment about removing the kwargs (and added two wording suggestions you can commit)
Looks all good now, thanks @makbigc for keeping on! (as this took some time ..)
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff