Define proper binary operation APIs for columns #10509
…and enable __array_ufunc__ for numpy compatibility.
…tor API to _binaryop.
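The commit message above is truncated, so this page does not show how the `__array_ufunc__` support was implemented. The sketch below is only a rough illustration of NumPy's `__array_ufunc__` protocol. It assumes a hypothetical `ToyColumn` class backed by a NumPy array (this is not cudf's implementation). Its hook routes a few binary ufuncs back to the column's own binary-operation dunders. For anything else it returns `NotImplemented`, which makes NumPy raise `TypeError`. The `_UFUNC_TO_OP` mapping is also an assumption made for illustration.

```python
import numpy as np


class ToyColumn:
    """Hypothetical column wrapping a NumPy array; not cudf's class."""

    # Hypothetical mapping from NumPy ufuncs to the dunder stem they mirror.
    _UFUNC_TO_OP = {np.add: "add", np.subtract: "sub", np.multiply: "mul"}

    def __init__(self, data):
        self.data = np.asarray(data)

    @staticmethod
    def _unwrap(other):
        # Operate on the raw array when the other operand is also a ToyColumn.
        return other.data if isinstance(other, ToyColumn) else other

    def __add__(self, other):
        return ToyColumn(self.data + self._unwrap(other))

    def __radd__(self, other):
        return ToyColumn(self._unwrap(other) + self.data)

    def __sub__(self, other):
        return ToyColumn(self.data - self._unwrap(other))

    def __rsub__(self, other):
        return ToyColumn(self._unwrap(other) - self.data)

    def __mul__(self, other):
        return ToyColumn(self.data * self._unwrap(other))

    def __rmul__(self, other):
        return ToyColumn(self._unwrap(other) * self.data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        name = self._UFUNC_TO_OP.get(ufunc)
        # Only plain two-operand calls without extra kwargs (e.g. out=) are handled.
        if name is None or method != "__call__" or len(inputs) != 2 or kwargs:
            return NotImplemented
        lhs, rhs = inputs
        # Call the dunders on self directly. Routing through operator.add(ndarray, col)
        # would re-enter np.add and recurse back into this hook.
        if lhs is self:
            return getattr(self, f"__{name}__")(rhs)
        return getattr(self, f"__r{name}__")(lhs)
```

With this in place, `np.add(col, 1)` and `np.array([1, 1, 1]) * col` both return a `ToyColumn`. An unmapped ufunc such as `np.divide(col, 2)` raises `TypeError`. That matches the "return `NotImplemented` unless the operation is supported" approach described in the PR.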
Comments attached. Overall this looks like a good overhaul of the binary op dispatching logic.
    "__ge__",
    "NULL_EQUALS",
}:
    out_dtype = self._binary_op_lt_gt_le_ge_eq_ne(other)
I'd suggest a different function name here. It doesn't include NULL_EQUALS
and is pretty long / confusing.
Suggested change:
- out_dtype = self._binary_op_lt_gt_le_ge_eq_ne(other)
+ out_dtype = self._binary_op_comparison(other)
I'm going to push this to a future PR as well. I'm hoping to improve the per-column binary operation logic in a number of places, but I'm trying to restrict this PR to just sorting out the public API so that we can decouple future changes to the Frame layer from changes to the Column layer.
All looks fine. Thanks for your replies to comments. I'll let you click "Resolve" on the remaining conversations, or leave them open if it helps you track ideas for follow-up work.
Codecov Report
@@            Coverage Diff            @@
##          branch-22.06   #10509   +/-   ##
===============================================
  Coverage            ?   86.34%
===============================================
  Files               ?      140
  Lines               ?    22298
  Branches            ?        0
===============================================
  Hits                ?    19253
  Misses              ?     3045
  Partials            ?        0
rerun tests
@gpucibot merge
This PR changes the way that binary operations are performed between columns. Instead of directly invoking the `_binaryop` method, Frame binary operations now invoke operators directly using the `operator` module. Each `Column` subclass now defines only the operations that are well-defined for it, relying on Python to raise `TypeError`s for all others. Binary operations return `NotImplemented` instead of raising a `TypeError`, except in specific cases where a meaningful error should be raised. This lets us take advantage of reflected operations and avoid duplicating the logic that handles binary operations between distinct types. Finally, various edge cases that were previously handled by Frames are now handled in Column, so that the column classes for each dtype are the sole source of truth on which operands are supported. These changes move us towards fully functional Column classes that do not rely on preprocessed inputs coming from the Frame layer.

This PR has a large changeset, but a large chunk of the changed lines are simply due to pipeline changes that cause operations to carry their dunder names instead of having the dunders stripped, e.g. `__add__` instead of `add`.
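The description above relies on two pieces of standard Python behavior. The first is dispatch through the `operator` module by dunder name. The second is the fallback to the right operand's reflected method when the left operand returns `NotImplemented`. The minimal pure-Python sketch below illustrates both. The classes `Column`, `NumericalColumn`, `StringColumn` and the helper `frame_binaryop` are hypothetical toys backed by Python lists, not cudf's actual classes. Only the Python data model and the standard `operator` module (which does expose aliases such as `operator.__add__`) are real.

```python
import operator


class Column:
    """Hypothetical toy column holding a plain Python list of values."""

    def __init__(self, values):
        self.values = list(values)

    def __repr__(self):
        return f"{type(self).__name__}({self.values})"


class NumericalColumn(Column):
    def __add__(self, other):
        # Define only the operations that are well-defined for this dtype.
        if isinstance(other, NumericalColumn):
            return NumericalColumn(a + b for a, b in zip(self.values, other.values))
        if isinstance(other, (int, float)):
            return NumericalColumn(a + other for a in self.values)
        # Unsupported operand: let Python try the other operand's reflected method.
        return NotImplemented

    # Numeric addition is commutative, so the reflected op can reuse __add__.
    __radd__ = __add__


class StringColumn(Column):
    def __add__(self, other):
        if isinstance(other, StringColumn):
            return StringColumn(a + b for a, b in zip(self.values, other.values))
        if isinstance(other, str):
            return StringColumn(a + other for a in self.values)
        return NotImplemented

    def __radd__(self, other):
        # String concatenation is not commutative, so reflect explicitly.
        if isinstance(other, str):
            return StringColumn(other + a for a in self.values)
        return NotImplemented


def frame_binaryop(lhs, rhs, op):
    """Hypothetical Frame-level helper: dispatch by dunder name, e.g. "__add__"."""
    # operator.__add__ is an alias of operator.add, which applies `lhs + rhs`.
    return getattr(operator, op)(lhs, rhs)


nums = NumericalColumn([1, 2, 3])
print(frame_binaryop(nums, 10, "__add__"))  # NumericalColumn([11, 12, 13])
print(frame_binaryop(10, nums, "__add__"))  # int returns NotImplemented, so __radd__ runs
```

Because neither `NumericalColumn.__add__` nor `StringColumn.__radd__` accepts the other type, `frame_binaryop(nums, StringColumn(["a", "b", "c"]), "__add__")` raises Python's own `TypeError`. No explicit check is needed in the Frame layer. In this sketch the column classes are the only place that decides which operands are supported.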