Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Add dtype parameter to Categorical.from_codes #24398

Merged
merged 10 commits into from
Jan 8, 2019

Conversation

topper-123
Copy link
Contributor

  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Added parameter dtype to Categorical.from_codes.

@topper-123 topper-123 force-pushed the Categorical.from_codes branch from b026e50 to e2543df Compare December 22, 2018 20:55
@codecov
Copy link

codecov bot commented Dec 22, 2018

Codecov Report

Merging #24398 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24398      +/-   ##
==========================================
+ Coverage   92.37%   92.38%   +<.01%     
==========================================
  Files         166      166              
  Lines       52315    52323       +8     
==========================================
+ Hits        48327    48337      +10     
+ Misses       3988     3986       -2
Flag Coverage Δ
#multiple 90.8% <100%> (ø) ⬆️
#single 43.06% <55.55%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/arrays/categorical.py 95.67% <100%> (+0.24%) ⬆️
pandas/core/indexes/category.py 98.61% <100%> (ø) ⬆️
pandas/core/arrays/datetimelike.py 97.67% <0%> (-0.19%) ⬇️
pandas/io/formats/html.py 99.34% <0%> (ø) ⬆️
pandas/io/formats/format.py 97.98% <0%> (ø) ⬆️
pandas/core/indexes/datetimes.py 96.27% <0%> (+0.01%) ⬆️
pandas/core/arrays/datetimes.py 97.71% <0%> (+0.03%) ⬆️
pandas/util/testing.py 88.09% <0%> (+0.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update be406f3...0459ad0. Read the comment docs.

@codecov
Copy link

codecov bot commented Dec 22, 2018

Codecov Report

Merging #24398 into master will increase coverage by <.01%.
The diff coverage is 88.88%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24398      +/-   ##
==========================================
+ Coverage    92.3%    92.3%   +<.01%     
==========================================
  Files         162      162              
  Lines       51875    51874       -1     
==========================================
+ Hits        47883    47884       +1     
+ Misses       3992     3990       -2
Flag Coverage Δ
#multiple 90.71% <88.88%> (ø) ⬆️
#single 42.99% <77.77%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/category.py 98.65% <100%> (ø) ⬆️
pandas/core/arrays/categorical.py 95.44% <87.5%> (+0.12%) ⬆️
pandas/util/testing.py 87.84% <0%> (+0.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3e0358d...e2543df. Read the comment docs.

@jschendel jschendel added API Design Categorical Categorical Data Type labels Dec 22, 2018
@jschendel jschendel added this to the 0.24.0 milestone Dec 22, 2018
if dtype is not None:
if categories is not None or ordered is not None:
raise ValueError("Cannot specify both `dtype` and `categories`"
" or `ordered`.")
Copy link
Member

@gfyoung gfyoung Dec 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message confuses me. Are you saying: both "dtype" and ("categories" / "ordered") ?

I think this will need to be reworded for clarity.

Copy link
Contributor Author

@topper-123 topper-123 Dec 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied that message from Categorical.__init__, but I agree, and have changed it in both locations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic. Thanks for doing that!

@topper-123 topper-123 force-pushed the Categorical.from_codes branch 2 times, most recently from fcf731b to c790639 Compare December 23, 2018 09:00
@pep8speaks
Copy link

pep8speaks commented Dec 23, 2018

Hello @topper-123! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on January 08, 2019 at 14:54 Hours UTC

pandas/core/arrays/categorical.py Outdated Show resolved Hide resolved
pandas/core/arrays/categorical.py Outdated Show resolved Hide resolved
@topper-123 topper-123 force-pushed the Categorical.from_codes branch 5 times, most recently from 096c7a5 to 229b474 Compare December 24, 2018 12:44
pandas/core/arrays/categorical.py Outdated Show resolved Hide resolved
pandas/core/indexes/multi.py Outdated Show resolved Hide resolved
pandas/core/groupby/grouper.py Outdated Show resolved Hide resolved
@@ -59,6 +59,7 @@
Categorical, CategoricalIndex, DataFrame, DatetimeIndex, Float64Index,
Index, Int64Index, Interval, IntervalIndex, MultiIndex, NaT, Panel, Period,
PeriodIndex, RangeIndex, Series, TimedeltaIndex, Timestamp)
from pandas.api.types import CategoricalDtype as CDT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same style

@@ -30,6 +30,7 @@
DataFrame, DatetimeIndex, Index, Int64Index, MultiIndex, Panel,
PeriodIndex, Series, SparseDataFrame, SparseSeries, TimedeltaIndex, compat,
concat, isna, to_datetime)
from pandas.api.types import CategoricalDtype
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

pandas/tests/arrays/categorical/test_constructors.py Outdated Show resolved Hide resolved
pandas/tests/arrays/categorical/test_constructors.py Outdated Show resolved Hide resolved
pandas/tests/arrays/categorical/test_subclass.py Outdated Show resolved Hide resolved
@jreback
Copy link
Contributor

jreback commented Dec 26, 2018

merge master

@jreback
Copy link
Contributor

jreback commented Dec 28, 2018

@topper-123 can you merge master and update

@topper-123 topper-123 force-pushed the Categorical.from_codes branch 2 times, most recently from d6d3f81 to 8a6ec5d Compare December 30, 2018 13:50
@topper-123
Copy link
Contributor Author

Ok, i’ve reverted the deprecation.

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be missing it, but did you add a test for Categorical.from_codes(codes, categories, dtype=dtype) raising?

pandas/core/arrays/categorical.py Show resolved Hide resolved
pandas/core/arrays/categorical.py Outdated Show resolved Hide resolved
pandas/core/arrays/categorical.py Outdated Show resolved Hide resolved
Categorical.from_codes([0, 1], Categorical(['a', 'b', 'a']))
codes = np.random.choice([0, 1], 5, p=[0.9, 0.1])
dtype = CategoricalDtype(categories=["train", "test"])
Categorical.from_codes(codes, dtype=dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this test is duplicative with earlier ones (even on master). I'd be OK with removing it.

@jorisvandenbossche
Copy link
Member

Yes, +1 on not deprecating categories and ordered

@jreback
Copy link
Contributor

jreback commented Jan 8, 2019

so ok with adding dtype in .from_codes as that promotes consistency, but why are folks not in favor of deprcating categories and ordered? this is just moving code away from the single point of using CDT for all operations.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comments

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jan 8, 2019 via email

@jorisvandenbossche
Copy link
Member

so ok with adding dtype in .from_codes as that promotes consistency, but why are folks not in favor of deprcating categories and ordered?

We have exactly the same pattern in the main Categorical constructor as well: users have the option to just pass categories or ordered, or a full fledged dtype.
Deprecating it there of course has a much larger impact than for from_codes, but this means we have the machinery to handle the combination, so I don't really see the need to deprecate it here (certainly given that it is much more convenient to use).

@jreback
Copy link
Contributor

jreback commented Jan 8, 2019

We have exactly the same pattern in the main Categorical constructor as well: users have the option to just pass categories or ordered, or a full fledged dtype.

ok I can see the argument for this then. But this is a tag confusing, maybe let's enhance the doc-strings slightly on the constructor & from_codes to make this even more cclear that you should pass (categories, ordered) or dtype (yes it errors, but a doc-string not will help).

@topper-123 can you raise an issue / PR for this.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comments and @TomAugspurger has some corrections

doc/source/whatsnew/v0.24.0.rst Outdated Show resolved Hide resolved
@TomAugspurger
Copy link
Contributor

6008c08 has some changes

  • added a test for raising when both categories / ordered & dtype are passed
  • updated whatsnew
  • updated versionchanged / added docs

@TomAugspurger
Copy link
Contributor

Buglet when neither categories nor dtype is provided.

In [1]: import pandas as pd

In [2]: pd.Categorical.from_codes([0, 1])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-e8a6a967ddf5> in <module>
----> 1 pd.Categorical.from_codes([0, 1])

~/sandbox/pandas/pandas/core/arrays/categorical.py in from_codes(cls, codes, categories, ordered, dtype)
    661
    662         if len(codes) and (
--> 663                 codes.max() >= len(dtype.categories) or codes.min() < -1):
    664             raise ValueError("codes need to be between -1 and "
    665                              "len(categories)-1")

TypeError: object of type 'NoneType' has no len()

fixing now.

* Bug in test not using from_codes
* Raise when neither provided
@TomAugspurger
Copy link
Contributor

6008c08 also had a bug with the tests for raising when both categories and dtype were used. The test I added used Categorical() instead of Categorical.from_codes. That's fixed now.

@topper-123
Copy link
Contributor Author

+1 to the changes made by @TomAugspurger

@TomAugspurger
Copy link
Contributor

Thanks. Merging in a few hours if now objections.

@jreback
Copy link
Contributor

jreback commented Jan 8, 2019

yeah this is ok.

@TomAugspurger TomAugspurger merged commit 2897fca into pandas-dev:master Jan 8, 2019
@topper-123 topper-123 deleted the Categorical.from_codes branch January 8, 2019 19:56
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
@jbrockmendel
Copy link
Member

#6581 lists the categorical and ordered kwargs as deprecated, but i dont see any warnings to that effect. Am i missing something?

@jorisvandenbossche
Copy link
Member

That seems a mistake indeed, as those arguments are not deprecated. Looking at the last part of the comments above, there was some discussion about this but finally decided to not deprecate.

Removed it from the list in #6581

@topper-123
Copy link
Contributor Author

Yeah, the original proposal was to deprecate, but we decided to keep them after all, as it we felt it was more convenient to allow a similar instantiation method as in __init__.

@jbrockmendel
Copy link
Member

jbrockmendel commented Nov 29, 2019

@topper-123 thanks for clarifying. Does this mean that either a) an entry can be removed from #6581 or b) some FutureWarnings can be removed from CategoricalDtype/Categorical? If so, are you up for taking point on that?

@jorisvandenbossche
Copy link
Member

@jbrockmendel as I said above, I already removed it from #6581

There were no warnings introduced in this PR, so also nothing to remove (there are other categorical related ones though, but not related to this PR).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Categorical Categorical Data Type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants