Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define proper binary operation APIs for columns #10509

Merged
merged 22 commits into from
Mar 25, 2022

Conversation

vyasr
Copy link
Contributor

@vyasr vyasr commented Mar 24, 2022

This PR changes the way that binary operations are performed between columns. Instead of directly invoking the _binaryop method Frame binary operations now invoke operators directly using the operator module. Each Column subclass now only defines operations that are well-defined, relying on Python to handle raising TypeErrors for all others. Binary operations return NotImplemented instead of raising a TypeError except in specific cases where a meaningful error should be raised, allowing us to take advantage of reflected operations to prevent duplicate logic on how to handle binary operations between distinct types. Finally, various edge cases that were previously handled by Frames are now handled in Column so that different dtype columns are the sole source of truth on what operands are supported. These changes move us towards fully functional Column classes that do not rely on preprocessed inputs coming from the Frame layer.

This PR has a large changeset, but a large chunk of the changes lines are simply because some changes to the pipeline result in operations having their dunder names instead of having the dunders stripped, e.g. __add__ instead of add.

@vyasr vyasr added 3 - Ready for Review Ready for review by team Python Affects Python cuDF API. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Mar 24, 2022
@vyasr vyasr added this to the CuDF Python Refactoring milestone Mar 24, 2022
@vyasr vyasr self-assigned this Mar 24, 2022
@vyasr vyasr requested a review from a team as a code owner March 24, 2022 19:18
@vyasr vyasr requested review from bdice and rgsl888prabhu March 24, 2022 19:18
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments attached. Overall this looks like a good overhaul of the binary op dispatching logic.

python/cudf/cudf/_lib/datetime.pyx Outdated Show resolved Hide resolved
python/cudf/cudf/core/column/column.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/column/datetime.py Show resolved Hide resolved
python/cudf/cudf/core/column/datetime.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/column/datetime.py Outdated Show resolved Hide resolved
"__ge__",
"NULL_EQUALS",
}:
out_dtype = self._binary_op_lt_gt_le_ge_eq_ne(other)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest a different function name here. It doesn't include NULL_EQUALS and is pretty long / confusing.

Suggested change
out_dtype = self._binary_op_lt_gt_le_ge_eq_ne(other)
out_dtype = self._binary_op_comparison(other)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to push this to a future PR as well. I'm hoping to improve the per-column binary operation logic in a number of places, but I'm trying to restrict this PR to just sorting out the public API so that we can decouple future changes to the Frame layer from changes to the Column layer.

python/cudf/cudf/core/frame.py Show resolved Hide resolved
python/cudf/cudf/core/mixins/binops.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/mixins/binops.py Show resolved Hide resolved
python/cudf/cudf/utils/utils.py Outdated Show resolved Hide resolved
@vyasr vyasr requested a review from bdice March 24, 2022 23:46
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks fine. Thanks for your replies to comments. I'll let you click "Resolve" on the remaining conversations, or leave them open if it helps you track ideas for follow-up work.

@codecov
Copy link

codecov bot commented Mar 25, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.06@8c7260f). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head 5bd9691 differs from pull request most recent head 298572d. Consider uploading reports for the commit 298572d to get more accurate results

@@               Coverage Diff               @@
##             branch-22.06   #10509   +/-   ##
===============================================
  Coverage                ?   86.34%           
===============================================
  Files                   ?      140           
  Lines                   ?    22298           
  Branches                ?        0           
===============================================
  Hits                    ?    19253           
  Misses                  ?     3045           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8c7260f...298572d. Read the comment docs.

@vyasr
Copy link
Contributor Author

vyasr commented Mar 25, 2022

rerun tests

@vyasr
Copy link
Contributor Author

vyasr commented Mar 25, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 19ab7d6 into rapidsai:branch-22.06 Mar 25, 2022
@vyasr vyasr deleted the refactor/binops_part2 branch March 31, 2022 20:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants