Refactoring column logic Part 1 #8081
…hould be generic into ColumnBase.
LGTM!
Codecov Report
@@ Coverage Diff @@
## branch-0.20 #8081 +/- ##
===============================================
- Coverage 82.88% 82.87% -0.02%
===============================================
Files 103 103
Lines 17668 17835 +167
===============================================
+ Hits 14645 14781 +136
- Misses 3023 3054 +31
@gpucibot merge
Follow up to #8081. This PR 1) inlines the binary op functions in different columns (which I presume were mainly created for profiling with nvtx), 2) moves all categorical column concatenation logic into the `CategoricalColumn` class, which allows the deletion of certain unimplemented functions in `ColumnBase` that only existed to satisfy mypy, and 3) performs some cleanup in the form of removing unused properties and using the `cached_property` decorator where appropriate.

Authors:
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- Keith Kraus (https://github.com/kkraus14)
- Ashwin Srinath (https://github.com/shwina)
- Michael Wang (https://github.com/isVoid)

URL: #8130
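For readers unfamiliar with the `cached_property` cleanup mentioned in the follow-up, here is a minimal sketch of the pattern. It uses `functools.cached_property` from the Python standard library (3.8+), which may differ from the decorator the project actually uses; the `ExampleColumn` class, its `null_count` property, and the `compute_calls` counter are hypothetical and exist only for illustration.

```python
from functools import cached_property


class ExampleColumn:
    """Hypothetical column type; not taken from the cudf code base."""

    def __init__(self, values):
        self.values = list(values)
        self.compute_calls = 0  # instrumentation to show the caching effect

    @cached_property
    def null_count(self):
        # Computed on first access only; the result is then stored on the
        # instance and returned directly on later accesses.
        self.compute_calls += 1
        return sum(v is None for v in self.values)


col = ExampleColumn([1, None, 3, None])
first = col.null_count
second = col.null_count
```

Because the cached value lives on the instance, this pattern only suits properties whose result cannot change after the object is constructed.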
This PR is a first pass at refactoring `ColumnBase` and its subclasses to reduce redundancy and improve performance by avoiding runtime type checking. Many functions are implemented in the top-level class but dispatch on dtype, which can instead be accomplished via duck typing. Additionally, other parts of `cudf` require various methods to be implemented by a column, but `ColumnBase` does not currently clearly delineate an interface, making it difficult to know what to rely on in classes like `Frame` and pushing dynamic type dispatch upstream in the call stack where it is even less efficient and causes substantial code duplication. This PR moves specialized implementations of certain methods into the appropriate subclasses of `ColumnBase` and establishes a base API in the parent class where appropriate. Since this change will be large, I plan to split it into a few different PRs. This PR primarily modifies `to_pandas`, `to_arrow`, and `__cuda_array_interface__`, along with a few other minor improvements.
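The general idea of replacing dtype dispatch in a base class with subclass overrides can be sketched as follows. This is an illustrative toy, not the code in this PR: the class names `BaseColumn`, `NumericColumn`, and `TextColumn`, the method `to_host_list`, and the helper `convert_all` are all hypothetical, and plain Python lists stand in for device buffers.

```python
from typing import Any, List


class BaseColumn:
    """Hypothetical base class that declares the interface callers rely on."""

    def __init__(self, values: List[Any]):
        self.values = list(values)

    def to_host_list(self) -> List[Any]:
        # The base class defines the API but performs no dtype checks;
        # each subclass supplies its own specialized conversion.
        raise NotImplementedError(
            f"{type(self).__name__} must implement to_host_list"
        )


class NumericColumn(BaseColumn):
    def to_host_list(self) -> List[float]:
        return [float(v) for v in self.values]


class TextColumn(BaseColumn):
    def to_host_list(self) -> List[str]:
        return [str(v) for v in self.values]


def convert_all(columns: List[BaseColumn]) -> List[List[Any]]:
    # Caller code needs no isinstance or dtype branching: it relies on
    # every column implementing the same method.
    return [col.to_host_list() for col in columns]


result = convert_all([NumericColumn([1, 2]), TextColumn(["a", 3])])
```

With this layout, adding a new column type means adding one subclass rather than extending an `if`/`elif` chain in the base class and in every caller.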