ENH: add columns attribute to DataFrameGroupBy #53583
Comments
Unfortunately I don't think this is well-defined at this time, although we've been working to fix this.
Would you expect |
@grisaitis: Do you not have access to |
To add to the answer from @rhshadrach, the … Neither of the above is part of the public API. IMO, it would be beneficial to make the hidden attributes public, for example as attributes of |
@rhshadrach, thank you for your feedback and questions.
Personally, I would expect the columns of the original DataFrame, regardless of the groupby keys. The reason is that I'm interested in which columns are available when I apply aggregation functions to each group. In other words, when I do
I do have access to … Also, thank you @topper-123 for that info! |
This makes sense - it would be the same columns that are available for selection after groupby, e.g.
While this currently includes the grouping columns, that will likely not be the case in pandas 3.0. We are looking to deprecate this behavior. See #7155. |
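To make "available for selection" concrete, here is a small sketch on made-up data (the frame and column names are assumptions, not taken from the thread): selecting a subset of the original columns after the groupby narrows what a reducer operates on, and a key that isn't a column of the original frame is rejected with a `KeyError`.

```python
import pandas as pd

# Toy frame: "a" is the grouping key, "b" and "c" are value columns.
df = pd.DataFrame({"a": [1, 1, 2], "b": [10, 20, 30], "c": [1.0, 2.0, 3.0]})
dfg = df.groupby("a")

# Selecting columns after the groupby restricts the reducer to them.
sub = dfg[["b", "c"]].sum()
print(list(sub.columns))  # ['b', 'c']

# A key that is not a column of the original frame cannot be selected.
try:
    dfg["not_a_column"]
    missing_raises = False
except KeyError:
    missing_raises = True
print(missing_raises)  # True
```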
Exactly. And I think this would be useful to know sometimes.
I believe that in pandas 2 even this is not necessarily true, e.g., if the groupby keys include an index level name:
In short, I think knowing what these columns are can be useful, and also tricky to determine without the original DataFrame. |
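The snippet from that comment isn't preserved. As an illustration on made-up data (ours, not the original), the columns that survive an aggregation depend on whether a key is a column or an index level name, which is one reason the answer is hard to recover from the groupby result alone:

```python
import pandas as pd

# Toy frame whose index level is named "k"; "a" and "b" are ordinary columns.
df = pd.DataFrame(
    {"a": [1, 1, 2], "b": [10, 20, 30]},
    index=pd.Index(["x", "x", "y"], name="k"),
)

# Grouping by a column key: the key drops out of the aggregated columns.
by_column = df.groupby("a").sum()
print(list(by_column.columns))  # ['b']

# Grouping by an index level name: the key was never a column, so both remain.
by_level = df.groupby("k").sum()
print(list(by_level.columns))  # ['a', 'b']
```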
This is true, but I think it misses the issue I was addressing. You made the statements
The use of "in other words" suggests to me that you think these two are equivalent; they are not. In some cases in 2.0.2 they aren't the same (to your point), and in 3.0 there will be even more cases where they differ. Where I think we agree is that the columns returned should be those of the original DataFrame, or of the selected columns if selection was used (e.g. …). I'm personally +0 on this. Assuming we do go forward, I think some care needs to be taken in how it's exposed to the user, including whether the attribute can be modified and/or mutated. @topper-123, do you have any thoughts there? |
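A minimal sketch of the semantics described above, assuming you're willing to carry the information yourself rather than rely on pandas: the hypothetical `GroupedFrame` wrapper below (not a pandas class) reports the original frame's columns, or the selected ones when a selection was made. Exposing `columns` as a read-only property on a frozen dataclass is one possible answer to the mutation question; note that `frozen=True` only prevents reassigning the fields, and the wrapped DataFrame itself can still be modified in place.

```python
from dataclasses import dataclass
from typing import Hashable, Optional, Sequence

import pandas as pd


@dataclass(frozen=True, eq=False)
class GroupedFrame:
    """Hypothetical wrapper (not pandas API): a frame, its keys, and an optional selection."""

    df: pd.DataFrame
    by: object
    selection: Optional[Sequence[Hashable]] = None

    @property
    def columns(self) -> pd.Index:
        # Original columns, or the selection if one was made.
        if self.selection is None:
            return self.df.columns
        return pd.Index(list(self.selection))

    @property
    def gb(self):
        # Build the groupby on demand from the stored frame, keys, and selection.
        grouped = self.df.groupby(self.by)
        return grouped if self.selection is None else grouped[list(self.selection)]


df = pd.DataFrame({"a": [1, 1, 2], "b": [10, 20, 30], "c": [1.0, 2.0, 3.0]})
full = GroupedFrame(df, "a")
picked = GroupedFrame(df, "a", selection=["b"])
print(list(full.columns))    # ['a', 'b', 'c']
print(list(picked.columns))  # ['b']
print(picked.gb.sum())
```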
Interesting issue here. I can see that there is value in accessing the columns associated with a groupby object. @topper-123's suggestion to simply make the attributes of … Regarding whether to include "all" columns or just the non-index columns: If
Anyway, I'm not sure about the best approach, but I thought it was worth mentioning that |
as_index only impacts the result of reducers, not all groupby methods. As such, I don't think we should consider its state for an attribute that isn't specifically about reducers. |
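A small illustration of that point on toy data (ours, not from the thread): `as_index` changes where the key lands in a reduction such as `sum`, but a transformation such as `cumsum` returns the same result whether or not it is set.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": [10, 20, 30]})

# Reducer: as_index decides whether "a" ends up in the index or as a column.
red_index = df.groupby("a").sum()
red_column = df.groupby("a", as_index=False).sum()
print(list(red_index.columns))   # ['b']
print(list(red_column.columns))  # ['a', 'b']

# Transform: the result is aligned to df's index either way; as_index has no effect.
tr_default = df.groupby("a").cumsum()
tr_flag = df.groupby("a", as_index=False).cumsum()
print(tr_default.equals(tr_flag))      # True
print(tr_flag.index.equals(df.index))  # True
```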
That's a good point. It makes it seem even clearer that maybe the groupby object itself should not have a columns attribute. Having a member of attrs instead feels like the most intuitive implementation, as suggested by @topper-123, but that's just my opinion. |
Thanks for everyone's consideration and time here. I'm less convinced now that this is a great idea after reading replies and thinking it over. Open to closing this out. |
@topper-123: Would you include |
I wouldn't, because it will be accessible through |
@rhshadrach, do you have an opinion on #53642? |
Closed as duplicate of #53642. |
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
I wish I could easily inspect the columns present in a DataFrameGroupBy.
Feature Description
I.e., after `dfg = df.groupby([..])`, I wish I could do `dfg.columns` such that `dfg.columns == df.columns`.
Alternative Solutions
The best I'm aware of is:
cc https://stackoverflow.com/q/76444424/781938
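The workaround snippet itself is not preserved. One approach that, as far as I know, works in current pandas is reading the columns off the groupby's `obj` attribute, which references the frame the groupby was built from. As noted in the thread, this isn't part of the documented public API, so treat it as an assumption that may break in a future release. The toy frame below is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": [10, 20, 30], "c": [1.0, 2.0, 3.0]})
dfg = df.groupby("a")

# `obj` holds the frame the groupby was built from. It is an implementation
# detail rather than documented API, so it may change between versions.
cols = dfg.obj.columns
print(list(cols))               # ['a', 'b', 'c']
print(cols.equals(df.columns))  # True
```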
Additional Context
I find myself needing this sometimes when passing around DataFrameGroupBy objects. Maybe this is counterintuitive? Is my coding approach suboptimal here? Just an idea. It seems pretty basic.