Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Preserve Series/DataFrame subclasses through groupby operations #33884
BUG: Preserve Series/DataFrame subclasses through groupby operations #33884
Changes from 35 commits
7da0703
f6e1a89
ee6de43
3f9f4c4
0112826
422e702
53ac397
2bc2520
8d9a885
e4d7fa8
1dbe986
c998422
abdb861
d36ad6d
5b83062
b6ea731
0cdf0ea
6e48e07
a70c21a
c03d459
9e42c79
9fbc645
5750d72
7f4c5a7
8eee73c
4b304c1
b3e039a
b1118de
a490e38
5bcf9fa
0244b36
dcd4692
a92c51b
cf3b978
f08cf59
d2a7de2
37ea97f
f1570da
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm this should be FrameOrSeries I think.
why is there an else here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had thought this should be a Series because if
self.obj
is a DataFrame, it returnsself.obj._constructor_sliced
. Do you think I should name this property_series_constructor
or something comparable to make that behavior more apparent?The else was there because
mypy
was complaining about a missing return statement. I can restructure this with an assertion to avoid an else statement and keepmypy
from complaining:Please let me know if you have a better way to structure this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes pls use an assertion
maybe @simonjayhawkins or @WillAyd can help with the annotation itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks -- I changed
_constructor
to use the assertion as above. I think it would also make sense to change the name of the property to_series_constructor
in order to clarify the return type, but I'll hold off for additional comments.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the annotation is correct given the way it is implemented, though I am not sure about the implementation. Why do we need to dispatch to
constructor_sliced
for DataFrames? Seems slightly unnatural to have to force thatThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ends up being used in
GroupBy.ngroup()
,GroupBy.cumcount()
, andGroupBy.size()
. Prior to these changes, these methods had returnedSeries
, regardless of whether they were called from aSeriesGroupBy
orDataFrameGroupBy
object. I had updated this to still return aSeries
-type while preserving the subclasses -- to avoid things reverting back topd.Series
if they were called from a subclassed DataFrame/Series.As such, the motivation for dispatching to
constructor_sliced
for DataFrames was to avoid changing the return type for these differentGroupBy
methods from their prior behavior. Do you think it would make sense to make a larger change here that alters these functions to have different return types if called fromSeriesGroupBy
vs.DataFrameGroupBy
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm that's unfortunate... Can you rename this property to
_obj_constructor
instead? I find the current name a little confusingThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe even
_obj_1d_constructor
to be even more explicit. This is definitely for special cases so want to signal as suchThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah -- I think
_obj_1d_constructor
is most clear. I'll make the changes.