DISC: Behavior of .astype('category') on existing categorical data #18790
Comments
I like 3. Can you make a quick test to see if it's feasible?
If option 3 isn't too difficult, I think it'd be best. I want `'category'` to be equivalent to `CategoricalDtype()`.
Currently that's true for creating a new categorical. It'd be nice if it were true for coercing existing categoricals.
On Fri, Dec 15, 2017 at 5:10 AM, Jeff Reback wrote:
I like 3. can you make a quick test to see if its feasible?
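A minimal sketch of the equivalence mentioned above for newly created categoricals, assuming a pandas version that exposes `CategoricalDtype` through `pandas.api.types`. The variable names are illustrative only. The last line prints what `.astype('category')` does to an existing ordered categorical. That result is the open question in this thread and may differ between pandas versions, so no expected output is given.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

values = ["a", "b", "c", "a"]

# Creating a new categorical: the string alias and a bare CategoricalDtype()
# are expected to produce the same data and dtype.
from_string = pd.Series(values, dtype="category")
from_dtype = pd.Series(values, dtype=CategoricalDtype())
same_new = from_string.equals(from_dtype)

# Coercing an existing ordered categorical: the case under discussion.
ordered = pd.Series(
    pd.Categorical(values, categories=["a", "b", "c"], ordered=True)
)
coerced = ordered.astype("category")

# The categories themselves are not expected to change in any of the options.
categories_kept = list(coerced.cat.categories) == ["a", "b", "c"]

# Whether `ordered` survives the coercion is version dependent.
print(same_new, categories_kept, coerced.cat.ordered)
```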
Option 3 doesn't look to be that bad. I should have a PR within the next day or so, depending on my free time. There are only a couple of ambiguous points I've encountered:
Regarding equality, my current plan is to treat
This maintains existing comparison behavior when ordered is not specified:
Regarding hashing, without any code modifications
Background
Follow-up from this specific chain of comments: #18710 (comment)
And these PRs in general: #18677, #18710
Issue
For the context of this discussion, I'm only referring to data that is already categorical; I don't think there was any ambiguity with converting non-categorical to categorical. This applies using `.astype('category')` on `Categorical`, `CategoricalIndex`, and `Series`.
The crux of the issue comes down to whether `.astype('category')` should ever change data that is already categorical. An argument that it shouldn't is that `.astype('category')` doesn't explicitly specify any changes, so nothing should be changed, and it's the existing behavior.
The other argument is that `.astype('category')` should be equivalent to `.astype(CategoricalDtype())`. Note that `CategoricalDtype()` is the same as `CategoricalDtype(categories=None, ordered=False)`. This means that if the existing categorical data is ordered, then `.astype(CategoricalDtype())` would change the categorical data from having `ordered=True` to `ordered=False`, and so `.astype('category')` should do the same.
I don't think there are any scenarios where the categories themselves would change; the only potential thing that could change is `ordered=True` to `ordered=False`. See below for a summary of some potential options. Feel free to modify any of the pro/cons listed below, or suggest any other potential options.
Option 1: `.astype('category')` does not change anything
This would not require any additional code changes, as it's the current behavior.
Pros:
.astype('category')
Cons:
.astype(CategoricalDtype())
Option 2: `.astype('category')` changes `ordered=True` to `ordered=False`
This would require some additional code changes, but they are relatively minor.
Pros:
`.astype('category')` consistent with `.astype(CategoricalDtype())`
Cons:
.astype('category')
Option 3: Allow `ordered=None` in `CategoricalDtype`
Basically, make `CategoricalDtype()` return `CategoricalDtype(categories=None, ordered=None)`. I should preface this by saying that I have not scoped out the amount of code that would need to be changed for this, nor the potential ramifications. This may not be a good idea.
Pros:
.astype('category')
`.astype('category')` consistent with `.astype(CategoricalDtype())`
Cons:
CategoricalDtype
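To make the difference between the three options concrete, here is a rough pure-Python model of how each one would treat the `ordered` flag of existing categorical data. Both `resolve_ordered` and `astype_category` are hypothetical helpers written for this sketch; they are not pandas functions and say nothing about how an actual implementation would be structured.

```python
from typing import Optional


def resolve_ordered(existing: bool, requested: Optional[bool]) -> bool:
    """Hypothetical rule for option 3: None means 'not specified',
    so the existing value is kept; an explicit bool always wins."""
    return existing if requested is None else requested


def astype_category(existing_ordered: bool, option: int) -> bool:
    """Return the `ordered` flag after a modeled `.astype('category')`."""
    if option == 1:
        # Option 1: 'category' never changes existing categorical data.
        return existing_ordered
    if option == 2:
        # Option 2: 'category' behaves like CategoricalDtype(ordered=False).
        return resolve_ordered(existing_ordered, False)
    if option == 3:
        # Option 3: 'category' behaves like CategoricalDtype(ordered=None).
        return resolve_ordered(existing_ordered, None)
    raise ValueError(f"unknown option: {option}")


# Only option 2 changes data that is already ordered.
results = {option: astype_category(True, option) for option in (1, 2, 3)}
print(results)
```

In this model, options 1 and 3 give the same result for `.astype('category')`. They differ only in whether `'category'` and `CategoricalDtype()` are treated as the same thing.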