-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: Allow ordered=None in CategoricalDtype #18889
Conversation
@@ -361,11 +359,16 @@ def _update_dtype(self, dtype): | |||
'got {dtype!r}').format(dtype=dtype) | |||
raise ValueError(msg) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
separate, I don't think we need to have private methods on CDT. e.g. validate_ordered
and update_dtype
(maybe more). As this is used in other contexts. so a de-privatize PR should be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually can you change this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done for _update_dtype
, _validate_ordered
, and _validate_categories
doc/source/whatsnew/v0.22.0.txt
Outdated
@@ -198,6 +198,7 @@ Other API Changes | |||
- Rearranged the order of keyword arguments in :func:`read_excel()` to align with :func:`read_csv()` (:issue:`16672`) | |||
- :func:`pandas.merge` now raises a ``ValueError`` when trying to merge on incompatible data types (:issue:`9780`) | |||
- :func:`wide_to_long` previously kept numeric-like suffixes as ``object`` dtype. Now they are cast to numeric if possible (:issue:`17627`) | |||
- The default value of the ``ordered`` parameter for :class:`~pandas.api.types.CategoricalDtype` has changed from ``False`` to ``None``. Behavior should remain consistent for downstream objects, such as :class:`Categorical` (:issue:`18790`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rebase on master. can you move to 0.23 (docs were renamed), prob easiest to just check this file from master and past in new one
a981daf
to
289defa
Compare
Codecov Report
@@ Coverage Diff @@
## master #18889 +/- ##
==========================================
+ Coverage 91.62% 91.62% +<.01%
==========================================
Files 150 150
Lines 48798 48798
==========================================
+ Hits 44709 44711 +2
+ Misses 4089 4087 -2
Continue to review full report at Codecov.
|
e312d24
to
cf6107e
Compare
return other.ordered and self.categories.equals(other.categories) | ||
elif other.ordered: | ||
return False | ||
elif self.ordered or other.ordered: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can ordered be None here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, e.g. CDT(list('abcd'), None) == CDT(list('dcba'), None)
, but None
is fine here since it is Falsey, and this is just meant to catch cases where at least one is True
@@ -361,11 +359,16 @@ def _update_dtype(self, dtype): | |||
'got {dtype!r}').format(dtype=dtype) | |||
raise ValueError(msg) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually can you change this?
pandas/tests/dtypes/test_dtypes.py
Outdated
c1 = CategoricalDtype(list('abc'), ordered1) | ||
c2 = CategoricalDtype(list('abc'), ordered2) | ||
result = c1 == c2 | ||
expected = (ordered1 is ordered2) or not any([ordered1, ordered2]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
give a comment here about what is expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a comment a few lines up, and simplified the logic for expected
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -208,6 +208,7 @@ Other API Changes | |||
- In :func:`read_excel`, the ``comment`` argument is now exposed as a named parameter (:issue:`18735`) | |||
- Rearranged the order of keyword arguments in :func:`read_excel()` to align with :func:`read_csv()` (:issue:`16672`) | |||
- The options ``html.border`` and ``mode.use_inf_as_null`` were deprecated in prior versions, these will now show ``FutureWarning`` rather than a ``DeprecationWarning`` (:issue:`19003`) | |||
- The default value of the ``ordered`` parameter for :class:`~pandas.api.types.CategoricalDtype` has changed from ``False`` to ``None``. Behavior should remain consistent for downstream objects, such as :class:`Categorical` (:issue:`18790`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does your other PR mention why we made this change (updating just categories
and not changing ordered)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so; updated the whatsnew entry to specify this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments inline.
cf6107e
to
928ce5e
Compare
@jreback @TomAugspurger : Sorry, forgot about this! Made the requested changes. |
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -438,6 +438,7 @@ Other API Changes | |||
- Set operations (union, difference...) on :class:`IntervalIndex` with incompatible index types will now raise a ``TypeError`` rather than a ``ValueError`` (:issue:`19329`) | |||
- :class:`DateOffset` objects render more simply, e.g. "<DateOffset: days=1>" instead of "<DateOffset: kwds={'days': 1}>" (:issue:`19403`) | |||
- :func:`pandas.merge` provides a more informative error message when trying to merge on timezone-aware and timezone-naive columns (:issue:`15800`) | |||
- The default value of the ``ordered`` parameter for :class:`~pandas.api.types.CategoricalDtype` has changed from ``False`` to ``None`` to allow updating of ``categories`` without impacting ``ordered``. Behavior should remain consistent for downstream objects, such as :class:`Categorical` (:issue:`18790`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make this a subsection
pandas/core/dtypes/dtypes.py
Outdated
elif other.ordered: | ||
return False | ||
elif self.ordered or other.ordered: | ||
# at least one ordered |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expand on the comment here (kind of like you did in the tests below), to enumerate the cases for ordered == ordered (and when they match / don't match)
928ce5e
to
7e18be0
Compare
7e18be0
to
007340d
Compare
thanks @jschendel as always very nice patches! keep coming! IF you are interested in working on some of the new interval indexing things (the _new test files), would be great. |
git diff upstream/master -u -- "*.py" | flake8 --diff
For equality comparisons with
ordered=None
, I essentially treated it as if it whereordered=False
:CDT(['a', 'b'], None) == CDT(['a', 'b'], False)
-->True
CDT(['a', 'b'], None) == CDT(['b', 'a'], False)
-->True
CDT(['a', 'b'], None) == CDT(['a', 'b'], True)
-->False
This maintains existing comparison behavior when ordered is not specified:
CDT(['a', 'b'], False) == CDT(['a', 'b'])
-->True
CDT(['a', 'b'], True) == CDT(['a', 'b'])
-->False
I didn't make any code modifications in regards to hashing, so
CDT(*, None)
will have the same hash asCDT(*, False)
. This seems to be consistent with how equality is treated. Makes the logic implementing equality nicer too, since the case when both dtypes are unordered relies on hashes.