-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ArrayManager] Remaining GroupBy tests (fix count, pass on libreduction for now) #40050
[ArrayManager] Remaining GroupBy tests (fix count, pass on libreduction for now) #40050
Conversation
@@ -214,6 +215,10 @@ def apply(self, f: F, data: FrameOrSeries, axis: int = 0): | |||
# TODO: can we have a workaround for EAs backed by ndarray? | |||
pass | |||
|
|||
elif isinstance(sdata._mgr, ArrayManager): | |||
# TODO(ArrayManager) don't use fast_apply / libreduction.apply_frame_axis0 | |||
# for now -> relies on BlockManager internals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im totally fine with skipping for the time being. medium-term, i think it could use .arrays instead of .blocks, might be easy-ish compat
# GH 36308 | ||
if using_array_manager and transformation_func == "pct_change": | ||
# TODO(ArrayManager) column-wise shift | ||
pytest.skip("ArrayManager: column-wise not yet implemented") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xfail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, changed
good reminder. ive got a terminal window open i need to revisit... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, ex @jbrockmendel comments and one of mine
@@ -270,15 +270,30 @@ def grouped_reduce(self: T, func: Callable, ignore_failures: bool = False) -> T: | |||
------- | |||
ArrayManager | |||
""" | |||
# TODO ignore_failures | |||
result_arrays = [func(arr) for arr in self.arrays] | |||
result_arrays: List[np.ndarray] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks a whole lot like reduce right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's quite similar in logic (it are also both reduce operations, so not that unsurprising), but IMO they are different enough that trying to share anything will only make it more complex (return value is different, they need to process the result inside the loop differently, etc)
It might be possible to change the return value of reduce
to make this easier, but that's a bigger change, so if we want that, it's for a separate PR
thanks @jorisvandenbossche |
Follow-up on #39885 and #40047
This passes / skips all groupby tests (in
/tests/groupby
). Actual code changes:groupby().count()
to not create a ArrayManager with 2D arraysfast_apply
/libreduction.apply_frame_axis0
for DataFrames with ArrayManager. This can be tackled separately, while for now already getting more test coverage by using the python fallback (as we eg also do for EAs).ignore_failures
forArrayManager.grouped_reduction
(this keyword is actually always True at the moment, so we could also leave away the keyword itself)Most of the skips are related
quantile
(ordescribe
) not yet being implemented. And it also uncovered a few other failures, that can be worked on as follow-ups.