API: preferred way to check if column/Series has Categorical dtype #8814
Comments
That was me asking the question. I originally started writing up the question because I was working on a PR for pandas, and while writing I discovered Having something that's not so deeply buried would be good though. I tried |
It does look like |
@shoyer as you can see in my answer on SO, it does indeed work, but the problem is it raises for other dtypes instead of giving False, which is not very handy (and that is a numpy thing).
So exactly because that is not working as you would expect (returning False), I think we should provide a common way to do this, or at least document in the categorical docs what the best way to do this is. |
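As a rough sketch of the kind of check being asked for here (one that returns False for other dtypes rather than raising), an `isinstance` test against `CategoricalDtype` would do. Note the assumptions: `pandas.api.types.CategoricalDtype` is public API in later pandas releases than the one discussed in this thread, and the helper name `is_categorical` is hypothetical, not something pandas provides.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype


def is_categorical(series: pd.Series) -> bool:
    """Return True if `series` has categorical dtype, False otherwise.

    Hypothetical helper for illustration; not part of the pandas API.
    """
    # isinstance never raises, so non-categorical dtypes simply give False.
    return isinstance(series.dtype, CategoricalDtype)


cat = pd.Series(list("aabc"), dtype="category")
num = pd.Series([1, 2, 3])

print(is_categorical(cat))
print(is_categorical(num))
```

The point of the design is that the check is a plain type test on the dtype object, so it does not depend on how numpy handles comparison with a string it does not recognise.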
Ah, I see. A reasonable solution might be to wrap the dtype in |
cc @JanSchulz |
So the preferred method of 'checking' dtypes is simply to use So I don't think it necessary to have the user actually use anything internal. If pressed, I would not suggest any mention/use of To be honest, this should rarely, if ever, come up. If the OP is trying to check individual dtypes for category then this is the wrong approach (and most certainly So if someone wants to add a small doc section, ok. |
Shouldn't this work?
|
that will raise for non-cat types (as does prevents user error. I think this is correct (these should raise) |
@JanSchulz no, because it gives a TypeError instead of False if it is not a categorical:
|
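For readers who hit the TypeError described above, one defensive pattern is to wrap the string comparison and treat a failed comparison as "not equal". This is only a sketch: the helper name `dtype_is` is hypothetical, and whether the comparison raises at all depends on the numpy version in use.

```python
import pandas as pd


def dtype_is(dtype, name: str) -> bool:
    """Compare a dtype to a string name, treating a failed comparison as False.

    Hypothetical helper for illustration; not part of pandas or numpy.
    """
    try:
        return bool(dtype == name)
    except TypeError:
        # Some numpy versions raise here when `name` is not a dtype they know.
        return False


ints = pd.Series([1, 2, 3])
cats = pd.Series(list("aabc"), dtype="category")

print(dtype_is(ints.dtype, "category"))
print(dtype_is(cats.dtype, "category"))
```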
I think this is an issue that should be raised with numpy
closing on pandas side as this is sane on the pandas side |
well, for all who care, I tried to push upstream. The user is now subject to random numpyisms that are really hard to fix downstream (impossible in this case). |
The string coercion for dtype equality is an ugly API, and @njsmith is right that we are probably misusing the API here with So I think it would indeed be better to do this differently. Perhaps Either way, I think this should probably change to evaluate to False (because the categories are different):
|
If the last example should work (I think that was discussed during the design of Categoricals), then we have to put the |
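To make the two kinds of equality in this exchange concrete, here is a small sketch using `CategoricalDtype` objects directly. It assumes a pandas release where `pandas.api.types.CategoricalDtype` can be constructed with explicit categories, which postdates this thread, so it illustrates the behaviour under discussion rather than the snippet from the original comment.

```python
from pandas.api.types import CategoricalDtype

ab = CategoricalDtype(categories=["a", "b"])
ac = CategoricalDtype(categories=["a", "c"])

# String comparison only asks "is this a categorical dtype at all?"
matches_name = ab == "category"   # True

# Dtype-to-dtype comparison also takes the categories into account.
same_categories = ab == ac        # False, the category sets differ

print(matches_name, same_categories)
```

Under these assumptions the string form stays a coarse "is it categorical" test, while comparing two dtype objects distinguishes categoricals with different categories.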
@shoyer I disagree. DyND does support categorical as a full-fledged datashape (see here). But using that impl is probably a ways away.
But to be honest, it's a fair amount of complexity, and I'm not sure how much is gained from that. I am not sure anything is actually gained from explicitly type checking with a |
@jreback It's one thing for pandas to take a pragmatic approach instead of waiting for a full solution, but designing an API that is incompatible with that full solution seems like a bad idea. In any case, perhaps it was premature to close this issue? (I do agree it's probably not worth refining |
@shoyer I'll buy that This is closed because pandas has done all it can to facilitate Changing to use the DyND type system will likely cause a bit of pain all around (good pain though), and it will have to be revisited when DyND is more of a fixture. If you have a better API idea which doesn't break anything, all ears. |
The reason I think this should not be closed yet is the reason I initially opened this issue: just to document this in the categorical.rst docs. I can do a quick PR to include this in the docs, but for that we need to make a quick choice about what I put in there:
Or provide this |
well, neither of those are preferred at all
are the most correct ways to do this of course amazing that this works!
|
going to bump this |
Another option (see #9629) is that the preferred way to check if a series is categorical should be |
From http://stackoverflow.com/questions/26924904/check-if-dataframe-column-is-categorical/26925340#26925340
What is the preferred way to check for categorical dtype?
I now answered:
But: