[ArrayManager] Remaining GroupBy tests (fix count, pass on libreduction for now) #40050

jorisvandenbossche · 2021-02-25T16:41:01Z

Follow-up on #39885 and #40047

This passes / skips all groupby tests (in /tests/groupby). Actual code changes:

- Fix groupby().count() so that it does not create an ArrayManager with 2D arrays.
- For now, skip fast_apply / libreduction.apply_frame_axis0 for DataFrames backed by an ArrayManager. This can be tackled separately; in the meantime, the Python fallback (which we also use for EAs, for example) already gives us more test coverage.
- Implement ignore_failures for ArrayManager.grouped_reduce (this keyword is actually always True at the moment, so we could also drop the keyword itself).
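
To illustrate the first point, here is a minimal sketch of a per-column grouped count that only ever produces 1D arrays, which is the shape an ArrayManager stores. It is not the code from this PR: `grouped_count_1d`, `codes` and `ngroups` are hypothetical names, and the sketch assumes float columns with NaN as the missing value.

```python
import numpy as np


def grouped_count_1d(values, codes, ngroups):
    """Count non-missing entries of one 1D column per group (hypothetical helper)."""
    values = np.asarray(values, dtype="float64")
    codes = np.asarray(codes)
    # Ignore missing values and rows without a group (code -1).
    valid = ~np.isnan(values) & (codes >= 0)
    return np.bincount(codes[valid], minlength=ngroups)


# Two columns stored as separate 1D arrays, as in an ArrayManager.
columns = [
    np.array([1.0, np.nan, 3.0, 4.0]),
    np.array([np.nan, np.nan, 1.0, 2.0]),
]
codes = np.array([0, 0, 1, 1])

# One 1D result per column; nothing is ever stacked into a 2D block.
counts = [grouped_count_1d(col, codes, ngroups=2) for col in columns]
```

Each entry of `counts` has one element per group, so the results can be handed to a column-store manager without any reshaping.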

Most of the skips are related to quantile (or describe) not yet being implemented. The tests also uncovered a few other failures that can be worked on as follow-ups.

jbrockmendel · 2021-02-25T17:52:26Z

pandas/core/groupby/ops.py

@@ -214,6 +215,10 @@ def apply(self, f: F, data: FrameOrSeries, axis: int = 0):
            #  TODO: can we have a workaround for EAs backed by ndarray?
            pass

+        elif isinstance(sdata._mgr, ArrayManager):
+            # TODO(ArrayManager) don't use fast_apply / libreduction.apply_frame_axis0
+            # for now -> relies on BlockManager internals


I'm totally fine with skipping for the time being. Medium-term, I think it could use .arrays instead of .blocks; that might be easy-ish compat.

jbrockmendel · 2021-02-25T17:52:47Z

pandas/tests/groupby/transform/test_transform.py

    # GH 36308
+    if using_array_manager and transformation_func == "pct_change":
+        # TODO(ArrayManager) column-wise shift
+        pytest.skip("ArrayManager: column-wise not yet implemented")


Yes, changed

jbrockmendel · 2021-02-25T17:53:23Z

quantile

Good reminder. I've got a terminal window open that I need to revisit...

jreback

Looks good, except for @jbrockmendel's comments and one of mine.

jreback · 2021-02-26T02:47:01Z

pandas/core/internals/array_manager.py

@@ -270,15 +270,30 @@ def grouped_reduce(self: T, func: Callable, ignore_failures: bool = False) -> T:
        -------
        ArrayManager
        """
-        # TODO ignore_failures
-        result_arrays = [func(arr) for arr in self.arrays]
+        result_arrays: List[np.ndarray] = []


This looks a whole lot like reduce, right?

Yes, the logic is quite similar (both are reduce operations, so that's not surprising), but IMO they are different enough that trying to share code will only make things more complex (the return value is different, they need to process the result inside the loop differently, etc.).

It might be possible to change the return value of reduce to make this easier, but that's a bigger change, so if we want that, it's for a separate PR
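
The diff hunk above is cut off after the `result_arrays` declaration, so the following is only a rough sketch of what a column-wise loop with `ignore_failures` could look like. It is not the PR's implementation: `grouped_reduce_sketch`, `group_sum` and the returned index list are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np


def grouped_reduce_sketch(
    arrays: Sequence[np.ndarray],
    func: Callable[[np.ndarray], np.ndarray],
    ignore_failures: bool = False,
) -> Tuple[List[np.ndarray], List[int]]:
    """Apply func column by column, optionally dropping columns that fail."""
    result_arrays: List[np.ndarray] = []
    result_indices: List[int] = []
    for i, arr in enumerate(arrays):
        try:
            res = func(arr)
        except (TypeError, NotImplementedError):
            if not ignore_failures:
                raise
            # Skip this column; its position is simply not recorded.
            continue
        result_arrays.append(res)
        result_indices.append(i)
    return result_arrays, result_indices


codes = np.array([0, 0, 1])


def group_sum(arr: np.ndarray) -> np.ndarray:
    # Hypothetical reduction: sum per group, numeric columns only.
    if arr.dtype.kind not in "iuf":
        raise TypeError("group_sum only supports numeric columns")
    return np.bincount(codes, weights=arr, minlength=2)


arrays = [np.array([1.0, 2.0, 3.0]), np.array(["a", "b", "c"], dtype=object)]
results, kept = grouped_reduce_sketch(arrays, group_sum, ignore_failures=True)
```

Tracking the surviving column positions inside the loop is one example of the per-result processing that a plain reduce would not need in this form.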

jreback · 2021-02-27T19:02:07Z

thanks @jorisvandenbossche

[ArrayManager] Remaining GroupBy tests (fix count, pass on libreduction for now)

4ffb84c

jorisvandenbossche added Groupby Internals Related to non-user accessible pandas implementation labels Feb 25, 2021

jorisvandenbossche added this to the 1.3 milestone Feb 25, 2021

jbrockmendel reviewed Feb 25, 2021

jreback approved these changes Feb 26, 2021

jorisvandenbossche added 2 commits February 26, 2021 08:06

use xfail

469d8a8

Merge remote-tracking branch 'upstream/master' into am-groupby-tests

0333ece

This was referenced Feb 26, 2021

[ArrayManager] DataFrame constructors #39991

Merged

[ArrayManager] TST: resample tests #40085

Merged

jreback merged commit c3eca7e into pandas-dev:master Feb 27, 2021

rhshadrach mentioned this pull request Feb 27, 2021

CLN/TST: normalize test_frame_apply #40113

Merged

1 task

jorisvandenbossche deleted the am-groupby-tests branch March 1, 2021 09:14

This was referenced Mar 2, 2021

[ArrayManager] Add libreduction frame Slider for ArrayManager #40171

Closed

Refactor - ArrayManager overview issue #39146

Closed
