Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REGR: fix op(frame, frame2) with reindex #31679

Merged
merged 9 commits into from
Feb 19, 2020
Merged

Conversation

jbrockmendel
Copy link
Member

cc @TomAugspurger this is pretty ugly, and I'm not sure how well it will behave if either frame has MultiIndex colums.

On the plus side, it could improve perf in the many-columns-but-small-intersection case.

The ugliness might be improved by moving this check to before the _align_method_FRAME call

@TomAugspurger TomAugspurger mentioned this pull request Feb 5, 2020
@jorisvandenbossche jorisvandenbossche added this to the 1.0.2 milestone Feb 5, 2020
@jorisvandenbossche
Copy link
Member

@jbrockmendel is this still a POC? Or is this good for 1.0.2?

@jbrockmendel
Copy link
Member Author

is this still a POC? Or is this good for 1.0.2?

Borderline. Do we have a timeline for 1.0.2?

@TomAugspurger
Copy link
Contributor

Next week-ish?

@jbrockmendel
Copy link
Member Author

Next week-ish?

Sounds good. I'd like to try a second draft of this, if we're OK with this approach.

@jorisvandenbossche
Copy link
Member

The approach looks generally OK to me.
But for possible alternative approach: do you know why this started failing? What changed in the ops code that might have caused this?

self, other = _align_method_FRAME(self, other, axis, flex=True, level=level)

if isinstance(other, ABCDataFrame):
# Another DataFrame
pass_op = op if should_series_dispatch(self, other, op) else na_op
pass_op = pass_op if not is_logical else op

if isinstance(orig_other, ABCDataFrame):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be handled in .align? why is this needed at all?

@jorisvandenbossche
Copy link
Member

@jbrockmendel What's the status of this?

@TomAugspurger
Copy link
Contributor

I opened #31874 for a larger discussion on how to fix this, which addresses Jeff's comment in https://github.com/pandas-dev/pandas/pull/31679/files#r376807594. That's too large to do now though.

@jbrockmendel
Copy link
Member Author

What's the status of this?

I'm catching up on reviewing things, hope to push a second draft of this later today. I don't expect the logic to change, just the organization.

@jbrockmendel
Copy link
Member Author

Updated, separating out the new logic to dedicated functions, doing it before the _align_frame_METHOD call.

pandas/core/ops/__init__.py Show resolved Hide resolved

if fill_value is None and level is None and axis is default_axis:
# TODO: any other cases we should handle here?
cols = left.columns.intersection(right.columns)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this just
set(left.columns) == set(right.columns) ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yah, but for certain types of indexes (e.g. RangeIndex) intersection is optimized. Usually won't be enough to matter, but still

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if xor also optimized? I think this is equivalent to not len(left.columns ^ right.columns). My guess is that xor will tend to be slower than your check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, fair point. yeah we likley just need some more asv's around this (followup)

pandas/core/ops/__init__.py Outdated Show resolved Hide resolved
@jreback jreback added Numeric Operations Arithmetic, Comparison, and Logical operations Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Feb 12, 2020
@jreback
Copy link
Contributor

jreback commented Feb 12, 2020

looks good, some comments

@jorisvandenbossche jorisvandenbossche changed the title POC: fix op(frame, frame2) with reindex REGR: fix op(frame, frame2) with reindex Feb 12, 2020
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a whatsnew?

doc/source/whatsnew/v1.0.2.rst Show resolved Hide resolved
pandas/core/ops/__init__.py Outdated Show resolved Hide resolved

if fill_value is None and level is None and axis is default_axis:
# TODO: any other cases we should handle here?
cols = left.columns.intersection(right.columns)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

@jreback
Copy link
Contributor

jreback commented Feb 15, 2020

@jbrockmendel couple of comments.

@jbrockmendel
Copy link
Member Author

I'm under the weather, will get to these once I'm back on my feet.

@jreback
Copy link
Contributor

jreback commented Feb 15, 2020

I'm under the weather, will get to these once I'm back on my feet.

sure np! feel better!

@jbrockmendel
Copy link
Member Author

comments addressed i think

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're good here.


if fill_value is None and level is None and axis is default_axis:
# TODO: any other cases we should handle here?
cols = left.columns.intersection(right.columns)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if xor also optimized? I think this is equivalent to not len(left.columns ^ right.columns). My guess is that xor will tend to be slower than your check.

result = op(new_left, new_right)

# Do the join on the columns instead of using _align_method_FRAME
# to avoid constructing two potentially large/sparse DataFrames
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice side-effect of this bugfix.


if fill_value is None and level is None and axis is default_axis:
# TODO: any other cases we should handle here?
cols = left.columns.intersection(right.columns)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, fair point. yeah we likley just need some more asv's around this (followup)

@jreback jreback merged commit f4dc9f9 into pandas-dev:master Feb 19, 2020
@jreback
Copy link
Contributor

jreback commented Feb 19, 2020

thanks @jbrockmendel

@lumberbot-app
Copy link

lumberbot-app bot commented Feb 19, 2020

Owee, I'm MrMeeseeks, Look at me.

There seem to be a conflict, please backport manually. Here are approximate instructions:

  1. Checkout backport branch and update it.
$ git checkout 1.0.x
$ git pull
  1. Cherry pick the first parent branch of the this PR on top of the older branch:
$ git cherry-pick -m1 f4dc9f9028a3da539126f9a8a37e7c41fc7b4b3c
  1. You will likely have some merge/cherry-pick conflict here, fix them and commit:
$ git commit -am 'Backport PR #31679: REGR: fix op(frame, frame2) with reindex'
  1. Push to a named branch :
git push YOURFORK 1.0.x:auto-backport-of-pr-31679-on-1.0.x
  1. Create a PR against branch 1.0.x, I would have named this PR:

"Backport PR #31679 on branch 1.0.x"

And apply the correct labels and milestones.

Congratulation you did some good work ! Hopefully your backport PR will be tested by the continuous integration and merged soon!

If these instruction are inaccurate, feel free to suggest an improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Numeric Operations Arithmetic, Comparison, and Logical operations Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

REGR: AssertionError when subtracting Timestamp-valued DataFrames with non-indentical column index
5 participants