DEPR: Deprecate ordered=None for CategoricalDtype #26403

jschendel · 2019-05-15T06:27:48Z

- closes #26336 (Deprecate ordered=None for CategoricalDtype)
- tests added / passed
- passes git diff upstream/master -u -- "*.py" | flake8 --diff
- whatsnew entry

All of the logic related to breaking changes looks to be consolidated in CategoricalDtype.update_dtype, so I'm raising the warning there when a breaking change would occur and letting it flow through from there.

Aside from CategoricalDtype.update_dtype, I think the only other place where a breaking change can manifest is in astype as described in the issue. Will need to double check the test logs to see if anything else triggered a warning though.
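
The rule described above (warn only where the result would silently change) can be sketched with a toy model. `ToyDtype` and this standalone `update_dtype` function are hypothetical stand-ins written for illustration, not the pandas implementation; only the warning text is taken from the diff quoted later in this thread.

```python
import warnings
from typing import NamedTuple, Optional, Tuple


class ToyDtype(NamedTuple):
    """Hypothetical stand-in for CategoricalDtype (not the pandas class)."""

    categories: Optional[Tuple[str, ...]]
    ordered: Optional[bool]


def update_dtype(current: ToyDtype, new: ToyDtype) -> ToyDtype:
    """Combine two dtypes, treating None in `new` as "keep the current value"."""
    categories = new.categories if new.categories is not None else current.categories
    ordered = new.ordered
    if ordered is None:
        # Existing behavior: ordered=None retains the current value.
        ordered = current.ordered
        if current.ordered:
            # Once None is treated as False, an ordered=True input would
            # silently become unordered, so only this case warns.
            warnings.warn(
                "ordered=None is deprecated and will default to False in a "
                "future version; ordered=True must be explicitly passed in "
                "order to be retained",
                FutureWarning,
                stacklevel=2,
            )
    return ToyDtype(categories, ordered)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Ordered input: the value is retained today, so a warning is issued.
    kept = update_dtype(ToyDtype(("a", "b"), True), ToyDtype(None, None))
    # Unordered input: the result is the same either way, so no warning.
    plain = update_dtype(ToyDtype(("a", "b"), False), ToyDtype(None, None))
```

In this model only the first call warns, because that is the only call whose result would differ once `None` is treated as `False`.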

cc @TomAugspurger @jorisvandenbossche

codecov · 2019-05-15T07:29:15Z

Codecov Report

Merging #26403 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #26403      +/-   ##
==========================================
- Coverage   91.69%   91.68%   -0.01%     
==========================================
  Files         174      174              
  Lines       50743    50746       +3     
==========================================
- Hits        46527    46526       -1     
- Misses       4216     4220       +4

Flag	Coverage Δ
#multiple	`90.19% <100%> (ø)`	⬆️
#single	`41.17% <33.33%> (-0.16%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/dtypes/dtypes.py	`96.69% <100%> (+0.02%)`	⬆️
pandas/io/gbq.py	`78.94% <0%> (-10.53%)`	⬇️
pandas/core/frame.py	`97.02% <0%> (-0.12%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3b24fb6...adc0bca.

codecov · 2019-05-15T07:29:19Z

Codecov Report

❗ No coverage uploaded for pull request base (master@ddec4eb).
The diff coverage is 28.57%.

@@            Coverage Diff            @@
##             master   #26403   +/-   ##
=========================================
  Coverage          ?   41.69%           
=========================================
  Files             ?      174           
  Lines             ?    50760           
  Branches          ?        0           
=========================================
  Hits              ?    21162           
  Misses            ?    29598           
  Partials          ?        0

Flag	Coverage Δ
#single	`41.69% <28.57%> (?)`

Impacted Files	Coverage Δ
pandas/io/packers.py	`14.57% <0%> (ø)`
pandas/core/dtypes/dtypes.py	`73.62% <33.33%> (ø)`
pandas/core/series.py	`45.76% <50%> (ø)`

Last update ddec4eb...bc16e98.

jorisvandenbossche

That's a clean change!

The one case that is not covered (which I mentioned in the issue) is when passing it manually to the constructor:

CategoricalDtype(categories=[..], ordered=None)

assuming we would disallow that in the future?

jorisvandenbossche · 2019-05-15T11:53:39Z

pandas/core/dtypes/dtypes.py

+                msg = ("ordered=None is deprecated and will default to False "
+                       "in a future version; ordered=True must be explicitly "
+                       "passed in order to be retained")
+                warnings.warn(msg, FutureWarning, stacklevel=2)


Maybe we should let the stacklevel fit for the astype case (although the series and index case might need a different stacklevel), as that seems the more common case compared to directly using this method?

Or, otherwise, I would certainly try to make the context clearer in this warning message: indicate that a CategoricalDtype was constructed without specifying the ordered, etc (otherwise the message might be very confusing, as it is not raised when creating it)

jschendel · May 21, 2019

I've updated the stacklevel so that it works for the Categorical.astype and CategoricalIndex.astype cases; all other cases look like they require a unique stacklevel. I've also updated the message itself to be clearer.
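
For readers unfamiliar with `stacklevel`, here is a minimal standard-library sketch of why a single value cannot fit every entry point. The function names are hypothetical stand-ins for a warning-raising method and a public wrapper; nothing here is pandas code.

```python
import warnings


def low_level(stacklevel: int) -> None:
    # Stand-in for the method that actually issues the warning.
    warnings.warn("ordered=None is deprecated", FutureWarning, stacklevel=stacklevel)


def public_api(stacklevel: int) -> None:
    # Stand-in for a wrapper such as an astype-like method.
    low_level(stacklevel)


def reported_line(stacklevel: int) -> int:
    """Return the line number that the warning is attributed to."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        public_api(stacklevel)  # plays the role of the user's call site
    return caught[0].lineno


# stacklevel=1 points inside low_level, 2 at the call inside public_api,
# and 3 at the call site inside reported_line.
lines = {level: reported_line(level) for level in (1, 2, 3)}
```

Each extra wrapper between the user and the `warnings.warn` call requires one more level, which is why entry points with different call depths need different values.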

jorisvandenbossche · 2019-05-15T11:56:30Z

> [about passing ordered=None explicitly] assuming we would disallow that in the future?

Actually, we could still accept it but turn it into False?

jreback

do we have any tests which are producing this warning that are not caught?

TomAugspurger · 2019-05-15T14:02:27Z

One slight concern: in the future, CategoricalDtype().ordered will change without warning, right? Right now it's None; in the future it'll be False.

I'm not sure how we could easily raise a warning there (on attribute access) without being very annoying.

jorisvandenbossche · 2019-05-20T19:06:23Z

> One slight concern, In the future, CategoricalDtype().ordered will change without warning, right? Right now it's None, in the future it'll be False.

One option might be to change it now already (to prevent more users from starting to rely on it)? We could already return False when the user asks for it, even if None is stored internally.

jschendel · 2019-05-21T05:54:50Z

> do we have any tests which are producing this warning that are not caught?

Yes, this is the reason CI failed on my first commit. The warning came from a case we weren't currently addressing, which I detailed in #26336 (comment)
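
As an illustration of how an unaddressed warning can fail a test run, warnings can be escalated to exceptions. The thread does not say how the pandas CI surfaced the warning, so this is an assumption, and `legacy_call` and `run_strict` are hypothetical names; only the standard-library `warnings` module is used.

```python
import warnings


def legacy_call() -> str:
    # Hypothetical code path that still relies on the deprecated default.
    warnings.warn("ordered=None is deprecated", FutureWarning, stacklevel=2)
    return "result"


def run_strict(func):
    """Run func with FutureWarning escalated to an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", FutureWarning)
        return func()


try:
    run_strict(legacy_call)
    escalated = False
except FutureWarning:
    # The warning was raised as an exception instead of being printed.
    escalated = True
```

Code that does not warn runs unchanged under `run_strict`, so a filter like this flags only the call sites that still hit the deprecated path.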

> The one case that is not covered (which I mentioned in the issue) is when passing it manually to the constructor

We do this a lot internally in the codebase as well, so it seems like it'd be hard to do this in a non-invasive way that doesn't break things. Will look into this in the next couple of days, along with CategoricalDtype().ordered.

pandas/core/generic.py

…none

jreback · 2019-06-08T22:08:27Z

can you merge master

jschendel · 2019-07-01T17:17:38Z

Updated to address remaining comments from the issue, as well as align on comments from the sprint.

Summary of the issue to refresh people's memory

- Deprecated CategoricalDtype default value ordered=None in favor of ordered=False
- Still allow ordered=None if explicitly passed, in part to maintain behavior for the string 'category'.

Summary of impacted behavior

- astype behavior could silently change since ordered=None can be used to maintain the existing ordered (xref #26336 (comment))
- Series constructor behavior could silently change, as ordered=None can be used to maintain the existing ordered of a Categorical (xref #26336 (comment))
- CategoricalDtype.ordered could silently change from None --> False if the default is changed to False (xref #26403 (comment))
  - Should not impact Categorical.ordered or CategoricalIndex.ordered as None should be resolved to a boolean during construction.

Implementation comments
I tried to implement the changes in the least intrusive way; a warning will only be raised in scenarios where the existing behavior would silently change.

For example, with CategoricalDtype.ordered, the warning will only appear if no value for ordered is specified. Note that constructing a CategoricalDtype without specifying ordered will not warn, as the behavior may not be silently changing at this point; only upon accessing .ordered will a warning appear.

In [1]: from pandas.api.types import CategoricalDtype

In [2]: cdt = CategoricalDtype(['a', 'b'])

In [3]: cdt.ordered
/home/jeremy/code/pandas/pandas/core/dtypes/dtypes.py:589: FutureWarning: Constructing a
CategoricalDtype without specifying `ordered` will default to `ordered=False` in a future
version; `ordered=None` must be explicitly passed.
  warnings.warn(msg, FutureWarning)

Explicitly passing None will not warn when .ordered is accessed, since this behavior will still be allowed. Likewise explicitly passing True or False will not warn.

In [4]: CategoricalDtype(['a', 'b'], ordered=None).ordered is None
Out[4]: True

In [5]: CategoricalDtype(['a', 'b'], ordered=True).ordered
Out[5]: True

In [6]: CategoricalDtype(['a', 'b'], ordered=False).ordered
Out[6]: False

jschendel · 2019-07-01T17:27:19Z

pandas/core/dtypes/dtypes.py

            self.validate_ordered(ordered)

        if categories is not None:
            categories = self.validate_categories(categories,
                                                  fastpath=fastpath)

        self._categories = categories
-        self._ordered = ordered
+        self._ordered = ordered if ordered is not sentinel else None
+        self._ordered_from_sentinel = ordered is sentinel


This is a little bit gross but seemed to be the cleanest way to implement things in a non-intrusive way (i.e. only warning when behavior would silently change).

Internal code can't reference .ordered because it could trigger the warning due to under-the-hood operations that the user has no control over (e.g. handling 'category' as dtype), so internal code needs to reference ._ordered. Instead of special-casing sentinel every time ._ordered is referenced, it seemed cleaner to add an additional attribute to keep track of things.

I tried to change the internals so that .ordered didn't trigger the warning, but I wasn't able to do so, and my failing attempt had already gotten more complex than the .ordered --> ._ordered change, so it didn't seem worthwhile to pursue further.

Note that the .ordered --> ._ordered change only applies to CategoricalDtype; this change does not appear to be needed for Categorical or CategoricalIndex, as those will always resolve to a boolean.

I'll volunteer to remove this and revert _ordered to ordered when we go through with changing the default value in the constructor.
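
A minimal sketch of the sentinel approach described in this comment, assuming a toy class: `ToyCategoricalDtype`, `_NOT_PASSED` and `describe` are hypothetical names, and only the two attribute names and the warning text are taken from the diff and the session output in this thread. It is not the pandas implementation.

```python
import warnings

# Unique marker for "argument not passed"; the diff calls its equivalent
# `sentinel`.
_NOT_PASSED = object()


class ToyCategoricalDtype:
    """Hypothetical model of the approach, not the pandas class."""

    def __init__(self, categories=None, ordered=_NOT_PASSED):
        self._categories = categories
        # Store None either way, but remember whether the caller chose it.
        self._ordered = ordered if ordered is not _NOT_PASSED else None
        self._ordered_from_sentinel = ordered is _NOT_PASSED

    @property
    def ordered(self):
        # Public access warns only when the value came from the default.
        if self._ordered_from_sentinel:
            warnings.warn(
                "Constructing a CategoricalDtype without specifying `ordered` "
                "will default to `ordered=False` in a future version; "
                "`ordered=None` must be explicitly passed.",
                FutureWarning,
                stacklevel=2,
            )
        return self._ordered


def describe(dtype: ToyCategoricalDtype) -> str:
    """Internal-style helper: reads ._ordered, so it never warns."""
    return "ordered" if dtype._ordered else "unordered or unspecified"
```

Keeping the extra boolean next to `_ordered` means internal helpers such as `describe` can treat the stored value as a plain `None`, `True` or `False` without knowing about the sentinel.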

jbrockmendel · Nov 28, 2019

> I'll volunteer to remove this and revert _ordered to ordered when we go through with changing the default value in the constructor.

Showtime!

jorisvandenbossche · 2019-07-01T18:10:38Z

@jschendel Thanks for the detailed and clear summary!
On board with the proposed changes and implementation details

TomAugspurger · 2019-07-01T18:13:01Z

Thanks for the summary in #26403 (comment). That all sounds good, haven't had a chance to look at the changes yet. Probably won't today.

jorisvandenbossche

Looks good, some small questions

pandas/core/dtypes/dtypes.py

jorisvandenbossche · 2019-07-01T18:14:10Z

pandas/core/dtypes/dtypes.py

+                msg = ("Constructing a CategoricalDtype without specifying "
+                       "`ordered` will default to `ordered=False` in a future "
+                       "version; `ordered=True` must be explicitly passed in "
+                       "order to be retained")


Should we explain what the use case is here? (maybe we don't really want to encourage using it ..)

jschendel · Jul 1, 2019

added some more detail to the message, but I agree that we don't really want to encourage using ordered=None, so I didn't specifically mention it

pandas/tests/dtypes/test_dtypes.py

jschendel · 2019-07-01T20:14:16Z

doc/source/whatsnew/v0.23.0.rst

+    [a, b, c, a, b, a]
+    Categories (3, object): [c < b < a]
+
+    In [5]: cdt = CategoricalDtype(categories=list('cbad'))


Looks like an old whatsnew note references behavior that will be changed, so switched over to an ipython code block to keep the previous behavior static when this gets generated in the future

jreback · 2019-07-02T00:13:40Z

pandas/core/arrays/categorical.py

@@ -331,7 +331,7 @@ def __init__(self, values, categories=None, ordered=None, dtype=None,
        # sanitize input
        if is_categorical_dtype(values):
            if dtype.categories is None:
-                dtype = CategoricalDtype(values.categories, dtype.ordered)
+                dtype = CategoricalDtype(values.categories, dtype._ordered)


do we really need to use the internal one here (and in all the others)? I really don't want to expose that even to our code.

jorisvandenbossche · Jul 2, 2019

See the overview that Jeremy gave above (#26403 (comment)) and this comment for more details on why _ordered was needed: #26403 (comment)
That explains clearly the context on why this was added, and I am fine with it.

jreback · Jul 2, 2019

ok that's fine then, @jschendel can you create an issue for this so we don't forget at removal time.

jschendel · Jul 2, 2019

sure, back at work today so will create the issue later on tonight

jreback · 2019-07-02T00:16:10Z

pandas/tests/dtypes/test_dtypes.py

            expected_ordered = dtype.ordered

-        result = dtype.update_dtype(new_dtype)
+        # GH 26336


can you give some explanation on what you are testing here (the cases)

jreback · 2019-07-02T18:46:31Z

this looks fine. @jschendel can you rebase.

jreback · 2019-07-02T18:48:06Z

actually was able to use the conflict resolver. merge on green.

DEPR: Deprecate ordered=None for CategoricalDtype

adc0bca

jschendel added the Dtype Conversions, Categorical and Deprecate labels May 15, 2019

jschendel added this to the 0.25.0 milestone May 15, 2019

jsexauer mentioned this pull request May 15, 2019

DEPR: Clean up list of deprecations from prior versions #6581

Closed

1 task

jorisvandenbossche reviewed May 15, 2019

jreback requested changes May 15, 2019

Merge remote-tracking branch 'upstream/master' into depr-cdt-ordered-…

4ad16c9

cover additional case and review edits

2da967d

jschendel commented May 21, 2019

pandas/core/generic.py (outdated)

jschendel added 4 commits May 21, 2019 00:02

fix whatsnew

b03eadd

update warning message

77f171d

consolidate special casing

cf12a1f

Merge remote-tracking branch 'upstream/master' into depr-cdt-ordered-…

060eb08

jschendel force-pushed the depr-cdt-ordered-none branch from a1db25d to 060eb08 Compare May 21, 2019 20:40

jschendel added 2 commits June 30, 2019 11:13

Merge remote-tracking branch 'upstream/master' into depr-cdt-ordered-…

c9c2333

.ordered warning and keep None as non-default

bc16e98

jschendel commented Jul 1, 2019

jorisvandenbossche reviewed Jul 1, 2019

jschendel added 3 commits July 1, 2019 12:58

fix typing and docs

99b6a30

sentinel --> ordered_sentinel

d999bd8

add more detail to warning

fdb5770

jschendel commented Jul 1, 2019

jreback requested changes Jul 2, 2019

jreback approved these changes Jul 2, 2019

Merge branch 'master' into depr-cdt-ordered-none

c45f159

Merge branch 'master' into PR_TOOL_MERGE_PR_26403

219f03c

jschendel merged commit 8393e37 into pandas-dev:master Jul 3, 2019

jschendel deleted the depr-cdt-ordered-none branch July 3, 2019 08:26

jschendel mentioned this pull request Jul 3, 2019

CLN: Revert internal CategoricalDtype.ordered -> CategoricalDtype._ordered references #27203

Closed

qwhelan mentioned this pull request Jul 18, 2019

PERF: restore performance for unsorted CategoricalDtype comparison #27448

Merged

5 tasks

jschendel mentioned this pull request Dec 2, 2019

DEPR: Change default value for CategoricalDtype.ordered from None to False #29955

Merged

4 tasks

jreback mentioned this pull request Dec 9, 2019

DEPR: deprecations log for removed issues #13777

Closed

eriknw mentioned this pull request Aug 9, 2022

[BUG] categoricals with .ordered is None in cudf and dask_cudf rapidsai/cudf#11487

Closed
