REGR: fix op(frame, frame2) with reindex #31679
@jbrockmendel is this still a POC? Or is this good for 1.0.2?
Borderline. Do we have a timeline for 1.0.2?
Next week-ish?
Sounds good. I'd like to try a second draft of this, if we're OK with this approach.
The approach looks generally OK to me.
pandas/core/ops/__init__.py
self, other = _align_method_FRAME(self, other, axis, flex=True, level=level)

if isinstance(other, ABCDataFrame):
    # Another DataFrame
    pass_op = op if should_series_dispatch(self, other, op) else na_op
    pass_op = pass_op if not is_logical else op

    if isinstance(orig_other, ABCDataFrame):
shouldn't this be handled in .align? why is this needed at all?
@jbrockmendel What's the status of this?
I opened #31874 for a larger discussion on how to fix this, which addresses Jeff's comment in https://github.com/pandas-dev/pandas/pull/31679/files#r376807594. That's too large to do now though.
I'm catching up on reviewing things, hope to push a second draft of this later today. I don't expect the logic to change, just the organization.
Updated, separating out the new logic into dedicated functions and running it before the _align_method_FRAME call.
if fill_value is None and level is None and axis is default_axis:
    # TODO: any other cases we should handle here?
    cols = left.columns.intersection(right.columns)
isn't this just set(left.columns) == set(right.columns)?
?
yah, but for certain types of indexes (e.g. RangeIndex) intersection is optimized. Usually won't be enough to matter, but still
Do you know if xor is also optimized? I think this is equivalent to not len(left.columns ^ right.columns). My guess is that xor will tend to be slower than your check.
ok, fair point. yeah we likely just need some more asvs around this (followup)
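For reference, the three column checks discussed in this thread can be compared side by side. This is a minimal sketch, not code from the PR: the helper names are hypothetical, it assumes unique column labels, and it uses Index.symmetric_difference rather than the ^ operator, which I believe no longer performs a set operation on an Index in recent pandas versions.

```python
import pandas as pd


def same_columns_set(left, right):
    # Reviewer's suggestion: compare the labels as plain Python sets.
    return set(left.columns) == set(right.columns)


def same_columns_intersection(left, right):
    # Intersection-based check (hypothetical completion of the snippet above).
    # Per the thread, Index.intersection is optimized for some index types.
    cols = left.columns.intersection(right.columns)
    return len(cols) == len(left.columns) == len(right.columns)


def same_columns_xor(left, right):
    # The symmetric difference is empty exactly when both sides
    # hold the same labels.
    return len(left.columns.symmetric_difference(right.columns)) == 0


left = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
right = pd.DataFrame({"b": [10, 20], "c": [30, 40], "d": [50, 60]})
reordered = left[["c", "a", "b"]]  # same labels, different order

checks = (same_columns_set, same_columns_intersection, same_columns_xor)
mismatch = [check(left, right) for check in checks]
match = [check(left, reordered) for check in checks]
```

All three helpers ignore column order, so they agree on these inputs; which one is fastest is exactly the question the proposed asv follow-up would answer.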
looks good, some comments
Can you add a whatsnew?
@jbrockmendel couple of comments.
I'm under the weather, will get to these once I'm back on my feet.
sure np! feel better!
comments addressed i think
I think we're good here.
result = op(new_left, new_right)

# Do the join on the columns instead of using _align_method_FRAME
# to avoid constructing two potentially large/sparse DataFrames
This is a nice side-effect of this bugfix.
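The idea in the code comment above, operating only on the shared columns and then reindexing, can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: the helper name frame_op_via_intersection is hypothetical, and it assumes both frames share the same row index and have unique, single-level column labels.

```python
import operator

import pandas as pd


def frame_op_via_intersection(left, right, op):
    # Hypothetical helper, not part of pandas.
    # Apply op only to the columns that both frames share ...
    cols = left.columns.intersection(right.columns)
    result = op(left[cols], right[cols])
    # ... then expand to the union of the columns; labels present on
    # only one side are filled with NaN by reindex.
    all_cols = left.columns.union(right.columns)
    return result.reindex(columns=all_cols)


left = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0]})
right = pd.DataFrame({"b": [10.0, 20.0], "c": [30.0, 40.0], "d": [50.0, 60.0]})

result = frame_op_via_intersection(left, right, operator.add)
```

Only the shared columns take part in the arithmetic, so neither input is expanded to the full union of columns before op runs.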
thanks @jbrockmendel
Owee, I'm MrMeeseeks, Look at me. There seems to be a conflict, please backport manually. Here are approximate instructions:
And apply the correct labels and milestones. Congratulations, you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon! If these instructions are inaccurate, feel free to suggest an improvement.
Co-authored-by: Simon Hawkins <[email protected]>
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
cc @TomAugspurger this is pretty ugly, and I'm not sure how well it will behave if either frame has MultiIndex columns.
On the plus side, it could improve perf in the many-columns-but-small-intersection case.
The ugliness might be improved by moving this check to before the _align_method_FRAME call