BUG: Fix SeriesGroupBy.quantile for nullable integers #33138

dsaxton · 2020-03-30T00:46:21Z

closes SeriesGroupBy.quantile doesn't work for nullable integers #33136
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry
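
For readers who want to see the behaviour this PR targets, here is a minimal sketch. The frame and its values are illustrative assumptions, not the code from the linked issue, and the exact result dtype (`float64` vs. `Float64`) may differ between pandas versions.

```python
import pandas as pd

# Illustrative data (an assumption, not taken from the issue):
# two groups, each holding a nullable-integer column with one missing value.
df = pd.DataFrame(
    {
        "a": ["x"] * 3 + ["y"] * 3,
        "b": pd.array([1, 0, None] * 2, dtype="Int64"),
    }
)

# With the fix in place this should return the per-group median,
# skipping the missing entries instead of raising.
result = df.groupby("a")["b"].quantile(0.5)
print(result)
```

Each group holds the non-missing values 0 and 1, so the median of either group is 0.5.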

WillAyd · 2020-03-30T03:48:05Z

pandas/core/groupby/groupby.py

@@ -2241,6 +2242,10 @@ def _get_cythonized_result(
        for idx, obj in enumerate(self._iterate_slices()):
            name = obj.name
            values = obj._values
+            if is_extension_array_dtype(values.dtype) and is_integer_dtype(


I think there is a more generic way to handle this? @jbrockmendel

Yeah, agree this seems like an overly specific check that probably isn't catching other bugs. I tried just casting any ExtensionArray but got bunches of test failures (I think they had to do with datetimes).

Hmm, this definitely seems not-great. I think to do this right we need some EA method that gives the ndarray to use when we get here, and maybe a way to round-trip the results if the method is dtype-preserving.

Am I right in thinking that dt64tz (and maybe period?) get cast to i8 somewhere around here?

Looking at this again I think it might make sense to update the pre_processor function: https://github.com/pandas-dev/pandas/blob/master/pandas/core/groupby/groupby.py#L1857 . Does that look right to you @jbrockmendel ?

That seems plausible. I'm hoping there's a way to do this with values_for_argsort that can be shared by EAs, and by other order-based methods.

Would that handle NAs in a way that would still give the right quantiles?
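
The NA question above can be illustrated with plain NumPy. This is a rough sketch under stated assumptions: `data` and `mask` are a hypothetical stand-in for the two buffers of a masked array, and `to_float_with_nan` is a made-up helper, not a pandas function. It shows why handing the raw values to an order-based routine without the mask could shift the quantile, while converting missing slots to NaN and using a NaN-aware quantile does not.

```python
import numpy as np

# Hypothetical stand-in for a masked (nullable) integer array:
# `data` holds raw values, `mask` marks which positions are missing.
# The masked slots still contain a filler value (1 here).
data = np.array([1, 0, 1, 5, 7, 1], dtype="int64")
mask = np.array([False, False, True, False, False, True])


def to_float_with_nan(data, mask):
    """Made-up helper: cast to float64 and put NaN where the mask is set."""
    out = data.astype("float64")
    out[mask] = np.nan
    return out


# Ignoring the mask treats the fillers as real observations.
naive = np.quantile(data, 0.5)

# Respecting the mask drops the missing entries before taking the quantile.
masked = np.nanquantile(to_float_with_nan(data, mask), 0.5)

print(naive, masked)
```

The two results differ because the fillers in the masked slots pull the naive median toward the filler value.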

pandas/tests/groupby/test_apply.py

jreback

So quantile on BooleanDtype is another test case.

jreback · 2020-04-06T23:59:50Z

pandas/core/groupby/groupby.py

@@ -1861,9 +1863,13 @@ def pre_processor(vals: np.ndarray) -> Tuple[np.ndarray, Optional[Type]]:
                )

            inference = None
-            if is_integer_dtype(vals):
+            if is_integer_dtype(vals.dtype):


This is fine for here, but I think we need a generic `convert_this_to_something_we_can_use_in_cython` method on EA. @jbrockmendel @jorisvandenbossche @TomAugspurger

As discussed here, I think the EA.convert_this_to_something_we_can_use_in_cython is going to look something like _ordinal_values, at least for methods like quantile that are ordering-based.
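
As a rough illustration of the kind of conversion step being discussed, the sketch below casts nullable integer or boolean arrays to a float ndarray with NaN for missing values and passes everything else through. `prepare_for_quantile` is a hypothetical name invented for this example; it is neither the merged `pre_processor` nor the generic EA method proposed in the thread, and it only uses public pandas helpers.

```python
import numpy as np
import pandas as pd
from pandas.api.types import (
    is_bool_dtype,
    is_extension_array_dtype,
    is_integer_dtype,
)


def prepare_for_quantile(vals):
    """Hypothetical helper (not pandas API): return an ndarray that a
    NaN-aware numeric routine could consume."""
    dtype = vals.dtype
    if is_extension_array_dtype(dtype) and (
        is_integer_dtype(dtype) or is_bool_dtype(dtype)
    ):
        # Nullable integer/boolean: missing values become NaN.
        return vals.to_numpy(dtype="float64", na_value=np.nan)
    # Anything else is passed through as an ndarray unchanged.
    return np.asarray(vals)


nullable_ints = prepare_for_quantile(pd.array([1, 0, None], dtype="Int64"))
plain_ints = prepare_for_quantile(np.array([1, 2, 3], dtype="int64"))
print(nullable_ints.dtype, plain_ints.dtype)
```

Note the check is made on `vals.dtype` rather than on the array itself, mirroring the change in the diff above. A real generic method would also need to cover datetime-like and other extension arrays, which this sketch deliberately ignores.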

WillAyd · 2020-04-07T23:10:19Z

Great thanks @dsaxton

jbrockmendel · 2023-02-28T17:47:32Z

pandas/tests/groupby/test_function.py

+        idx = pd.Index(["x", "y"], name="a")
+        true_quantiles = [0.5]
+
+    expected = pd.Series(true_quantiles * 2, index=idx, name="b")


Tracking down #51424, it looks like support for nullable-bool was added here, but it seems weird that we'd return Float64 for that. The only discussion I see is a comment from @jreback saying it would need a separate test. Is this intentional?

dsaxton added 7 commits March 29, 2020 18:51

Convert to numpy

0496516

Test

5d7a4a5

Note

0e1a1a9

Format

409dbf4

Hard code

c2f12c9

Lint

065ce08

Nit

4a8ed58

WillAyd reviewed Mar 30, 2020

WillAyd added ExtensionArray Extending pandas with custom dtypes or arrays. Groupby labels Mar 30, 2020

simonjayhawkins reviewed Mar 30, 2020

simonjayhawkins added the Bug label Mar 30, 2020

simonjayhawkins added this to the 1.1 milestone Mar 30, 2020

dsaxton added 2 commits March 30, 2020 12:09

Merge remote-tracking branch 'upstream/master' into integer-quantile

fd103dc

Move test

4a88dab

jreback requested changes Mar 30, 2020

dsaxton added 2 commits March 30, 2020 19:27

Move check

fcc00cb

Update test

bb69e3d

jreback reviewed Apr 6, 2020

jreback approved these changes Apr 7, 2020

jreback and others added 2 commits April 6, 2020 20:01

Merge branch 'master' into integer-quantile

5364e68

Merge remote-tracking branch 'upstream/master' into integer-quantile

8ef2789

WillAyd approved these changes Apr 7, 2020

WillAyd merged commit 8267427 into pandas-dev:master Apr 7, 2020

dsaxton deleted the integer-quantile branch April 7, 2020 23:13

jbrockmendel reviewed Feb 28, 2023
