pd.Categorical and level ordering and adding to a DataFrame #6242
Comments
see related #6219
I don't think this is too difficult, but it would help if you could give some examples of what you could do if this were the case. E.g. pretend that you can add a categorical as a Series (e.g. with dtype='categorical'); can you show some operations on a frame that make sense for that? Or does this make sense at all?
What I want to do is this:
Also:
Also nice: df.groupby("categorical_variable").size()
# -> Would give 0 for levels which are not in the dataframe
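The requested behaviour (a size of 0 for levels that never occur) can be illustrated without pandas. This is only a sketch in plain Python: `sizes_by_level` and the sample data are hypothetical names introduced here, not part of any pandas API.

```python
from collections import Counter


def sizes_by_level(values, levels):
    """Count occurrences per level, reporting 0 for levels that never occur."""
    counts = Counter(values)
    # Iterate over the declared levels (not the observed values) so that
    # unused levels still appear and the result follows the level order.
    return {level: counts.get(level, 0) for level in levels}


observed = ["low", "high", "low"]
levels = ["low", "medium", "high"]
result = sizes_by_level(observed, levels)
print(result)  # {'low': 2, 'medium': 0, 'high': 1}
```

The point of the sketch is that the set of groups comes from the declared levels rather than from the data, which is what a categorical-aware groupby would need to do.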
Can you show what a sample frame (including a float column as well) looks like?
@jreback ? [Edit: changed
Cut and #3943 would also be a candidate And when I see #3678
what does the factor look like when you print the frame, e.g.
e.g. if you print out
I would think it would look the same as if you printed out
yep... do we want to close this (and just reference it from #5313)?
I think this makes sense. I'll add a comment there about the "what should work" items above.
@JanSchulz just saw your examples... let me take a look
As part of yhat/ggpy#188 I'm trying to build a `factor` function which works the same way as the one in R (e.g. https://www.stat.berkeley.edu/classes/s133/factors.html) and which I could use to store factors in a DataFrame.

What currently does not work is ordering:
Currently I see no way to make a change to the ordering of the levels to a predefined list.
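To make the request concrete, here is a minimal plain-Python sketch of what a `factor` helper with a predefined level ordering might compute. It is not the pandas implementation: the function name, its signature, and the use of -1 for values outside the levels are all assumptions made for illustration.

```python
def factor(values, levels=None):
    """Hypothetical helper: encode labels as integer codes against `levels`.

    Returns (codes, levels), where codes[i] is the position of values[i]
    in levels.
    """
    if levels is None:
        # Default mirrors R's factor(): the sorted unique values.
        levels = sorted(set(values))
    position = {level: i for i, level in enumerate(levels)}
    # Assumption: values that are not a declared level are encoded as -1.
    codes = [position.get(value, -1) for value in values]
    return codes, list(levels)


# The caller, not the data, decides the level order.
codes, levels = factor(["b", "a", "c", "a"], levels=["c", "b", "a"])
print(codes)   # [1, 2, 0, 2]
print(levels)  # ['c', 'b', 'a']
```

With `levels` omitted, the same call would fall back to alphabetical order, which is the only ordering the issue reports being able to get.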
[The problem here seems to be that `pd.Categorical.__init__()` expects its arguments to be already transformed into a factor data structure (list of index positions + labels), whereas the above is actually a "labels + label ordering" structure.]

`pd.Categorical.from_array()` only takes a `data` argument, but no ordering :-(

It seems that `pandas.core.algorithms.factorize()` was once intended to do that, but the `order` keyword is not used in the actual implementation (https://github.com/pydata/pandas/blob/master/pandas/core/algorithms.py#L116).

I'm not sure what I should do with `ordered=[True, False]`. I think it should remove some standard ordering operations when `ordered == False`?

Another problem is that I can't round-trip factors into a DataFrame:
-> There is no level attribute anymore :-(
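One way to see what is lost in such a round trip, sketched in plain Python under the assumption that the stored column keeps only the decoded labels. The `decode` helper and the sample data are hypothetical and do not reflect how pandas stores the column internally.

```python
def decode(codes, levels):
    """Turn integer codes back into labels (None for the assumed -1 marker)."""
    return [levels[code] if code >= 0 else None for code in codes]


levels = ["low", "medium", "high"]
codes = [0, 2, 0]  # "medium" is a declared level that never occurs

labels = decode(codes, levels)
print(labels)  # ['low', 'high', 'low']

# Rebuilding the levels from the labels alone drops the unused level
# and replaces the declared order with alphabetical order.
recovered = sorted(set(labels))
print(recovered)  # ['high', 'low']
```

Keeping the `(codes, levels)` pair together is what would let the levels survive the round trip.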