BUG: Fix SeriesGroupBy.quantile for nullable integers #33138
pandas/core/groupby/groupby.py
@@ -2241,6 +2242,10 @@ def _get_cythonized_result(
for idx, obj in enumerate(self._iterate_slices()):
name = obj.name
values = obj._values
if is_extension_array_dtype(values.dtype) and is_integer_dtype(
I think there is a more generic way to handle this? @jbrockmendel
Yeah, I agree this seems like an overly specific check that probably isn't catching other bugs. I tried casting any ExtensionArray but got a bunch of test failures (I think they had to do with datetimes).
Hmm, this definitely seems not great. I think to do this right we need some EA method that gives the ndarray to use when we get here, and maybe a way to round-trip the results if the method is dtype-preserving.
Am I right in thinking that dt64tz (and maybe period?) get cast to i8 somewhere around here?
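A minimal sketch of the kind of EA-to-ndarray step described above, assuming nullable integer input. The helper names `to_ndarray_for_cython` and `round_trip` are hypothetical, not pandas API; the sketch only relies on the public `ExtensionArray.to_numpy(dtype=..., na_value=...)` and `pd.array(..., dtype=...)` calls. Missing values become `np.nan` so a NaN-aware kernel can skip them, and the result is cast back to the original dtype only for an operation whose output stays integral.

```python
import numpy as np
import pandas as pd


def to_ndarray_for_cython(arr):
    """Hypothetical helper: nullable integer EA -> float64 ndarray.

    pd.NA becomes np.nan so NaN-aware kernels can skip missing values.
    """
    return arr.to_numpy(dtype="float64", na_value=np.nan)


def round_trip(result, dtype):
    """Hypothetical helper: cast a dtype-preserving result back to the EA dtype."""
    return pd.array(result, dtype=dtype)


arr = pd.array([3, None, 1], dtype="Int64")
as_float = to_ndarray_for_cython(arr)  # float64 values: 3.0, nan, 1.0

# A dtype-preserving reduction such as min can go back to Int64.
group_min = round_trip(np.array([np.nanmin(as_float)]), arr.dtype)
```

A quantile generally cannot be round-tripped this way (the median of 1 and 2 is 1.5), which is one reason the round-trip step would need to be optional per method.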
Looking at this again, I think it might make sense to update the pre_processor function (https://github.com/pandas-dev/pandas/blob/master/pandas/core/groupby/groupby.py#L1857). Does that look right to you, @jbrockmendel?
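One way the pre_processor change could look, sketched as a standalone function under the assumption that it receives either an ndarray or an ExtensionArray. `pre_processor_sketch` is a hypothetical name and this is not the code that was merged; it only shows nullable integer and boolean values being converted to float64 with NaN for missing entries before the Cython routine, while plain NumPy arrays pass through unchanged.

```python
from typing import Optional, Tuple, Union

import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray
from pandas.api.types import is_bool_dtype, is_integer_dtype, is_object_dtype


def pre_processor_sketch(
    vals: Union[np.ndarray, ExtensionArray]
) -> Tuple[np.ndarray, Optional[type]]:
    """Hypothetical pre-processing step: return an ndarray usable by a
    NaN-aware kernel, plus the dtype to infer for the result."""
    if is_object_dtype(vals.dtype):
        raise TypeError("quantile is not supported for object dtype")

    inference = None
    is_ea = isinstance(vals, ExtensionArray)
    if is_integer_dtype(vals.dtype):
        if is_ea:
            # Nullable integers: pd.NA becomes np.nan in a float64 array.
            vals = vals.to_numpy(dtype="float64", na_value=np.nan)
        inference = np.int64
    elif is_bool_dtype(vals.dtype) and is_ea:
        # Nullable booleans: True/False become 1.0/0.0, pd.NA becomes np.nan.
        vals = vals.to_numpy(dtype="float64", na_value=np.nan)
    return np.asarray(vals), inference


nullable_ints, inferred = pre_processor_sketch(pd.array([1, None, 3], dtype="Int64"))
```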
That seems plausible. I'm hoping there's a way to do this with values_for_argsort that can be shared by EAs and by other order-based methods.
Would that handle NAs in a way that would still give the right quantiles?
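On the NA question, a small check, assuming missing values are mapped to NaN and a NaN-aware quantile is used, that the result matches dropping the missing values first. For contrast it also shows a NaN-unaware quantile, and what happens if missing values were instead replaced by an ordinary number (0 here, chosen purely for illustration and not a claim about what values_for_argsort returns).

```python
import numpy as np
import pandas as pd

arr = pd.array([1, 0, None], dtype="Int64")
as_float = arr.to_numpy(dtype="float64", na_value=np.nan)

# NaN-aware quantile skips the missing entry: median of [1, 0] is 0.5.
skip_na = np.nanquantile(as_float, 0.5)

# Same answer as dropping the missing value before computing.
dropped = np.quantile(arr.dropna().to_numpy(dtype="float64"), 0.5)

# A NaN-unaware quantile propagates NaN instead of ignoring it.
propagated = np.quantile(as_float, 0.5)

# Filling NA with an ordinary value (0, purely illustrative) silently shifts
# the result: the median of [1, 0, 0] is 0.0, not 0.5.
filled = np.quantile(arr.to_numpy(dtype="float64", na_value=0.0), 0.5)
```

So whatever ordering-based representation is used, it seems to need either a NaN-like value or a separate mask for the missing entries.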
Also, quantile on BooleanDtype would be another test case.
@@ -1861,9 +1863,13 @@ def pre_processor(vals: np.ndarray) -> Tuple[np.ndarray, Optional[Type]]:
)

inference = None
if is_integer_dtype(vals):
if is_integer_dtype(vals.dtype):
This is fine here, but I think we need a generic `convert_this_to_something_we_can_use_in_cython` method on EA. @jbrockmendel @jorisvandenbossche @TomAugspurger
As discussed here, I think EA.convert_this_to_something_we_can_use_in_cython is going to look something like _ordinal_values, at least for ordering-based methods like quantile.
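A rough sketch of how a shared, ordering-based path might consume something like `_ordinal_values`. Both `ordinal_values_sketch` and `group_quantile` are hypothetical; the assumption is that the EA hands back an int64 array whose order matches its own, plus a mask of missing entries, and the caller computes a linearly interpolated quantile per group and converts the result back if needed. The same function is reused for nullable integers and for day-resolution datetimes.

```python
import numpy as np


def ordinal_values_sketch(data, mask):
    """Hypothetical stand-in for an EA method along the lines of `_ordinal_values`.

    Returns int64 values whose order matches the original array's order,
    plus a boolean mask of missing entries (their int64 values are ignored).
    """
    return np.asarray(data).astype("int64"), np.asarray(mask, dtype=bool)


def group_quantile(ordinals, mask, labels, ngroups, q):
    """Linearly interpolated quantile per group, skipping masked entries."""
    out = np.full(ngroups, np.nan)
    for g in range(ngroups):
        vals = np.sort(ordinals[(labels == g) & ~mask]).astype("float64")
        if vals.size == 0:
            continue  # an all-missing group stays NaN
        pos = q * (vals.size - 1)
        lo, hi = int(np.floor(pos)), int(np.ceil(pos))
        out[g] = vals[lo] + (vals[hi] - vals[lo]) * (pos - lo)
    return out


labels = np.array([0, 0, 0, 1, 1, 1])

# Nullable integers: the masked slot's stored value (-1 here) is arbitrary.
ints, int_mask = ordinal_values_sketch(
    np.array([1, 0, -1, 1, 0, -1]), [False, False, True] * 2
)
int_medians = group_quantile(ints, int_mask, labels, 2, 0.5)

# Day-resolution datetimes reuse the same path via their int64 ordinals.
days, day_mask = ordinal_values_sketch(
    np.array(["2020-01-01", "2020-01-03"], dtype="datetime64[D]"), [False, False]
)
median_days = group_quantile(days, day_mask, np.array([0, 0]), 1, 0.5)
# Round-trip back to the original dtype (fractional days are truncated).
median_dates = median_days.astype("int64").astype("datetime64[D]")
```

The cast back to datetime64[D] truncates fractional days; whether to truncate or round is a design choice such a method would have to make.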
Great, thanks @dsaxton
idx = pd.Index(["x", "y"], name="a")
true_quantiles = [0.5]

expected = pd.Series(true_quantiles * 2, index=idx, name="b")
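A self-contained sketch around the test snippet above. The input data is an assumption (the snippet does not show it): each group holds 1, 0 and a missing value, so the median of the non-missing values is 0.5. The function name is hypothetical. Newer pandas versions may return the nullable Float64 dtype here, so the sketch compares values after casting to float64.

```python
import pandas as pd


def test_groupby_quantile_nullable_int_sketch():
    # Assumed input: groups "x" and "y" each hold 1, 0 and a missing value.
    values = pd.array([1, 0, None] * 2, dtype="Int64")
    df = pd.DataFrame({"a": ["x"] * 3 + ["y"] * 3, "b": values})
    result = df.groupby("a")["b"].quantile(0.5)

    idx = pd.Index(["x", "y"], name="a")
    true_quantiles = [0.5]
    expected = pd.Series(true_quantiles * 2, index=idx, name="b")

    # Cast so the comparison does not depend on float64 vs Float64 result dtype.
    pd.testing.assert_series_equal(result.astype("float64"), expected)


test_groupby_quantile_nullable_int_sketch()
```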
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff