API: Allow ordered=None in CategoricalDtype #18889

jschendel · 2017-12-21T07:27:50Z

closes DISC: Behavior of .astype('category') on existing categorical data #18790
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

For equality comparisons with ordered=None, I essentially treated it as if it were ordered=False:

CDT(['a', 'b'], None) == CDT(['a', 'b'], False) --> True
CDT(['a', 'b'], None) == CDT(['b', 'a'], False) --> True
CDT(['a', 'b'], None) == CDT(['a', 'b'], True) --> False

This maintains existing comparison behavior when ordered is not specified:

CDT(['a', 'b'], False) == CDT(['a', 'b']) --> True
CDT(['a', 'b'], True) == CDT(['a', 'b']) --> False

I didn't make any code modifications with regard to hashing, so CDT(*, None) will have the same hash as CDT(*, False). This seems consistent with how equality is treated. It also makes the logic implementing equality nicer, since the case when both dtypes are unordered relies on hashes.
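
A minimal pure-Python sketch of the comparison rules listed above, for illustration only. `MiniCDT` is a hypothetical stand-in, not the pandas class, and it compares unordered categories as sets rather than through hashes as described in the PR.

```python
class MiniCDT:
    """Hypothetical stand-in for CategoricalDtype (not the pandas class)."""

    def __init__(self, categories, ordered=None):
        self.categories = tuple(categories)
        self.ordered = ordered  # True, False, or None (unspecified)

    def __eq__(self, other):
        if not isinstance(other, MiniCDT):
            return NotImplemented
        if self.ordered or other.ordered:
            # At least one is ordered=True: both must be ordered and
            # the categories must match in the same order.
            return (self.ordered == other.ordered
                    and self.categories == other.categories)
        # Neither is ordered=True: False and None are interchangeable,
        # and category order is ignored.
        return set(self.categories) == set(other.categories)

    def __hash__(self):
        # None hashes like False, consistent with __eq__ above.
        if self.ordered:
            return hash((self.categories, True))
        return hash((frozenset(self.categories), False))


results = [
    MiniCDT(['a', 'b'], None) == MiniCDT(['a', 'b'], False),
    MiniCDT(['a', 'b'], None) == MiniCDT(['b', 'a'], False),
    MiniCDT(['a', 'b'], None) == MiniCDT(['a', 'b'], True),
    MiniCDT(['a', 'b'], False) == MiniCDT(['a', 'b']),
    MiniCDT(['a', 'b'], True) == MiniCDT(['a', 'b']),
]
print(results)  # [True, True, False, True, False]
```

The five results follow the same order as the five comparisons in the description.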

jreback · 2017-12-21T13:59:02Z

pandas/core/dtypes/dtypes.py

@@ -361,11 +359,16 @@ def _update_dtype(self, dtype):
                   'got {dtype!r}').format(dtype=dtype)
            raise ValueError(msg)



Separately, I don't think we need to have private methods on CDT, e.g. validate_ordered and update_dtype (maybe more), as these are used in other contexts. So a de-privatize PR would be great.

actually can you change this?

done for _update_dtype, _validate_ordered, and _validate_categories

jreback · 2017-12-21T17:39:24Z

doc/source/whatsnew/v0.22.0.txt

@@ -198,6 +198,7 @@ Other API Changes
 - Rearranged the order of keyword arguments in :func:`read_excel()` to align with :func:`read_csv()` (:issue:`16672`)
 - :func:`pandas.merge` now raises a ``ValueError`` when trying to merge on incompatible data types (:issue:`9780`)
 - :func:`wide_to_long` previously kept numeric-like suffixes as ``object`` dtype. Now they are cast to numeric if possible (:issue:`17627`)
+- The default value of the ``ordered`` parameter for :class:`~pandas.api.types.CategoricalDtype` has changed from ``False`` to ``None``.  Behavior should remain consistent for downstream objects, such as :class:`Categorical` (:issue:`18790`)


Rebase on master. Can you move this to 0.23 (docs were renamed)? Probably easiest to just check out this file from master and paste in the new one.

codecov · 2017-12-22T10:38:42Z

Codecov Report

Merging #18889 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #18889      +/-   ##
==========================================
+ Coverage   91.62%   91.62%   +<.01%     
==========================================
  Files         150      150              
  Lines       48798    48798              
==========================================
+ Hits        44709    44711       +2     
+ Misses       4089     4087       -2

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `89.99% <100%> (ø)` | ⬆️ |
| #single | `41.74% <90%> (+0.01%)` | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/core/arrays/categorical.py | `94.87% <100%> (ø)` | ⬆️ |
| pandas/core/indexes/category.py | `97.26% <100%> (ø)` | ⬆️ |
| pandas/core/dtypes/dtypes.py | `96.08% <100%> (ø)` | ⬆️ |
| pandas/util/testing.py | `83.85% <0%> (+0.2%)` | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6485a36...007340d.

jreback · 2018-01-02T11:27:05Z

pandas/core/dtypes/dtypes.py

-            return other.ordered and self.categories.equals(other.categories)
-        elif other.ordered:
-            return False
+        elif self.ordered or other.ordered:


can ordered be None here?

yes, e.g. CDT(list('abcd'), None) == CDT(list('dcba'), None), but None is fine here since it is falsy, and this branch is just meant to catch cases where at least one is True
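
To illustrate the point about falsiness, here is a small sketch in plain Python (no pandas) that evaluates an `a or b` guard for every pair of `ordered` values. The variable names are made up for this example.

```python
from itertools import product

values = (True, False, None)

# Pairs for which `a or b` is truthy, i.e. the guarded branch is taken.
guarded = [(a, b) for a, b in product(values, repeat=2) if (a or b)]
print(guarded)

# The guard is truthy exactly when at least one side is True;
# every combination of False and None falls through to the other branch.
assert all(a is True or b is True for a, b in guarded)
```

Five of the nine pairs take the guarded branch; the remaining four contain only `False` and `None`.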

jreback · 2018-01-02T11:27:23Z

pandas/core/dtypes/dtypes.py

@@ -361,11 +359,16 @@ def _update_dtype(self, dtype):
                   'got {dtype!r}').format(dtype=dtype)
            raise ValueError(msg)



actually can you change this?

jreback · 2018-01-02T11:30:05Z

pandas/tests/dtypes/test_dtypes.py

+        c1 = CategoricalDtype(list('abc'), ordered1)
+        c2 = CategoricalDtype(list('abc'), ordered2)
+        result = c1 == c2
+        expected = (ordered1 is ordered2) or not any([ordered1, ordered2])


give a comment here about what is expected.

added a comment a few lines up, and simplified the logic for expected
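
For reference, a short sketch that enumerates what the `expected` expression from the hunk above yields for each pair of `ordered` values when the categories are identical. It uses the expression as quoted in the hunk, which may differ from the simplified version that was eventually committed; `expected_equal` is a name chosen for this example.

```python
from itertools import product

values = (True, False, None)


def expected_equal(ordered1, ordered2):
    # Expression quoted in the review hunk; categories are assumed identical.
    return (ordered1 is ordered2) or not any([ordered1, ordered2])


for o1, o2 in product(values, repeat=2):
    print(o1, o2, expected_equal(o1, o2))

# Pairs expected to compare equal.
equal_pairs = [(o1, o2) for o1, o2 in product(values, repeat=2)
               if expected_equal(o1, o2)]
```

Equality is expected when both are `True`, or when neither is `True`.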

TomAugspurger · 2018-01-02T11:46:28Z

doc/source/whatsnew/v0.23.0.txt

@@ -208,6 +208,7 @@ Other API Changes
 - In :func:`read_excel`, the ``comment`` argument is now exposed as a named parameter (:issue:`18735`)
 - Rearranged the order of keyword arguments in :func:`read_excel()` to align with :func:`read_csv()` (:issue:`16672`)
 - The options ``html.border`` and ``mode.use_inf_as_null`` were deprecated in prior versions, these will now show ``FutureWarning`` rather than a ``DeprecationWarning`` (:issue:`19003`)
+- The default value of the ``ordered`` parameter for :class:`~pandas.api.types.CategoricalDtype` has changed from ``False`` to ``None``.  Behavior should remain consistent for downstream objects, such as :class:`Categorical` (:issue:`18790`)


Does your other PR mention why we made this change (updating just categories and not changing ordered)?

I don't think so; updated the whatsnew entry to specify this
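
A sketch of why a `None` default helps with updating just categories, under the assumption that an update treats `ordered=None` as "not specified" and keeps the existing value. `resolve_ordered` is a hypothetical helper for illustration, not a pandas function.

```python
def resolve_ordered(current, requested=None):
    """Return the ordered flag after an update.

    Hypothetical helper: `requested=None` means "not specified", so the
    current value is kept; an explicit True/False overrides it.
    """
    return current if requested is None else requested


# Updating only the categories leaves an ordered dtype ordered.
kept = resolve_ordered(current=True)
# An explicit False can still switch ordering off.
overridden = resolve_ordered(current=True, requested=False)
print(kept, overridden)  # True False
```

With a default of `False` the two calls would be indistinguishable, which is the ambiguity the `None` default avoids.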

TomAugspurger

A few comments inline.

jschendel · 2018-02-07T04:57:33Z

@jreback @TomAugspurger : Sorry, forgot about this! Made the requested changes.

jreback · 2018-02-07T11:34:08Z

doc/source/whatsnew/v0.23.0.txt

@@ -438,6 +438,7 @@ Other API Changes
 - Set operations (union, difference...) on :class:`IntervalIndex` with incompatible index types will now raise a ``TypeError`` rather than a ``ValueError`` (:issue:`19329`)
 - :class:`DateOffset` objects render more simply, e.g. "<DateOffset: days=1>" instead of "<DateOffset: kwds={'days': 1}>" (:issue:`19403`)
 - :func:`pandas.merge` provides a more informative error message when trying to merge on timezone-aware and timezone-naive columns (:issue:`15800`)
+- The default value of the ``ordered`` parameter for :class:`~pandas.api.types.CategoricalDtype` has changed from ``False`` to ``None`` to allow updating of ``categories`` without impacting ``ordered``.  Behavior should remain consistent for downstream objects, such as :class:`Categorical` (:issue:`18790`)


can you make this a subsection

jreback · 2018-02-07T11:36:32Z

pandas/core/dtypes/dtypes.py

-        elif other.ordered:
-            return False
+        elif self.ordered or other.ordered:
+            # at least one ordered


I would expand on the comment here (kind of like you did in the tests below), to enumerate the cases for ordered == ordered (and when they match / don't match)

jreback · 2018-02-10T17:03:15Z

thanks @jschendel as always very nice patches! keep coming!

If you are interested in working on some of the new interval indexing things (the _new test files), that would be great.

jreback added the Categorical label Dec 21, 2017

jreback reviewed Dec 21, 2017

jreback requested changes Dec 21, 2017

jschendel force-pushed the cdt-ordered-none branch from a981daf to 289defa Compare December 21, 2017 23:32

jschendel force-pushed the cdt-ordered-none branch 2 times, most recently from e312d24 to cf6107e Compare January 1, 2018 18:46

jreback added the API Design label Jan 2, 2018

jreback requested changes Jan 2, 2018

TomAugspurger reviewed Jan 2, 2018

jschendel force-pushed the cdt-ordered-none branch from cf6107e to 928ce5e Compare February 7, 2018 01:41

jreback requested changes Feb 7, 2018

API: Allow ordered=None in CategoricalDtype

71f5b2a

jschendel force-pushed the cdt-ordered-none branch from 928ce5e to 7e18be0 Compare February 10, 2018 01:19

jschendel added 3 commits February 9, 2018 19:18

deprivatize

1dceb04

additional review edits

e2752a8

expand whatsnew and comments

007340d

jschendel force-pushed the cdt-ordered-none branch from 7e18be0 to 007340d Compare February 10, 2018 02:18

jreback added this to the 0.23.0 milestone Feb 10, 2018

jreback approved these changes Feb 10, 2018

jreback merged commit fe972fb into pandas-dev:master Feb 10, 2018

harisbal pushed a commit to harisbal/pandas that referenced this pull request Feb 28, 2018

API: Allow ordered=None in CategoricalDtype (pandas-dev#18889)

8488572

jschendel deleted the cdt-ordered-none branch September 24, 2018 17:26
