Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ArrayManager] Remaining GroupBy tests (fix count, pass on libreduction for now) #40050

Merged
merged 3 commits into from
Feb 27, 2021

Conversation

jorisvandenbossche
Copy link
Member

Follow-up on #39885 and #40047

This passes / skips all groupby tests (in /tests/groupby). Actual code changes:

  • fix groupby().count() to not create a ArrayManager with 2D arrays
  • for now skip the usage of fast_apply / libreduction.apply_frame_axis0 for DataFrames with ArrayManager. This can be tackled separately, while for now already getting more test coverage by using the python fallback (as we eg also do for EAs).
  • implement ignore_failures for ArrayManager.grouped_reduction (this keyword is actually always True at the moment, so we could also leave away the keyword itself)

Most of the skips are related quantile (or describe) not yet being implemented. And it also uncovered a few other failures, that can be worked on as follow-ups.

@jorisvandenbossche jorisvandenbossche added Groupby Internals Related to non-user accessible pandas implementation labels Feb 25, 2021
@jorisvandenbossche jorisvandenbossche added this to the 1.3 milestone Feb 25, 2021
@@ -214,6 +215,10 @@ def apply(self, f: F, data: FrameOrSeries, axis: int = 0):
# TODO: can we have a workaround for EAs backed by ndarray?
pass

elif isinstance(sdata._mgr, ArrayManager):
# TODO(ArrayManager) don't use fast_apply / libreduction.apply_frame_axis0
# for now -> relies on BlockManager internals
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im totally fine with skipping for the time being. medium-term, i think it could use .arrays instead of .blocks, might be easy-ish compat

# GH 36308
if using_array_manager and transformation_func == "pct_change":
# TODO(ArrayManager) column-wise shift
pytest.skip("ArrayManager: column-wise not yet implemented")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xfail?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, changed

@jbrockmendel
Copy link
Member

quantile

good reminder. ive got a terminal window open i need to revisit...

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, ex @jbrockmendel comments and one of mine

@@ -270,15 +270,30 @@ def grouped_reduce(self: T, func: Callable, ignore_failures: bool = False) -> T:
-------
ArrayManager
"""
# TODO ignore_failures
result_arrays = [func(arr) for arr in self.arrays]
result_arrays: List[np.ndarray] = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks a whole lot like reduce right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's quite similar in logic (it are also both reduce operations, so not that unsurprising), but IMO they are different enough that trying to share anything will only make it more complex (return value is different, they need to process the result inside the loop differently, etc)

It might be possible to change the return value of reduce to make this easier, but that's a bigger change, so if we want that, it's for a separate PR

@jreback jreback merged commit c3eca7e into pandas-dev:master Feb 27, 2021
@jreback
Copy link
Contributor

jreback commented Feb 27, 2021

thanks @jorisvandenbossche

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Groupby Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants