Refactor implementation of column setitem #9110
Although it is not the main purpose of this PR, the change also improves performance slightly (a little under 10%). Before:
After:
Codecov Report
@@ Coverage Diff @@
## branch-21.10 #9110 +/- ##
===============================================
Coverage ? 10.76%
===============================================
Files ? 114
Lines ? 19083
Branches ? 0
===============================================
Hits ? 2054
Misses ? 17029
Partials ? 0
Continue to review full report at Codecov.
LGTM!!!
@gpucibot merge
This small PR reworks the behavior of `ColumnBase.__setitem__` when it is given something other than a slice as the key, for instance an array. This code path requires scattering the new values into the column, which previously involved converting the Column to a Frame in order to call `Frame._scatter`. Since that method was only used for this one purpose, the underlying libcudf scatter implementation has been rewritten to accept and return Columns, which lets us inline the call and removes a round trip from Column to Frame and back.
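For readers unfamiliar with the scatter operation, here is a minimal pure-Python sketch of what the non-slice path does conceptually. It is not the cudf or libcudf implementation: `scatter` and `ToyColumn` are hypothetical names made up for illustration, and plain lists stand in for device columns.

```python
from typing import Sequence


def scatter(source: Sequence, scatter_map: Sequence[int], target: Sequence) -> list:
    """Return a copy of `target` with source[i] written at index scatter_map[i].

    Hypothetical helper for illustration; it operates on the column data
    directly, with no intermediate table-like wrapper.
    """
    if len(source) != len(scatter_map):
        raise ValueError("source and scatter_map must have the same length")
    result = list(target)
    n = len(result)
    for value, idx in zip(source, scatter_map):
        if not -n <= idx < n:
            raise IndexError(f"index {idx} out of bounds for column of size {n}")
        result[idx] = value
    return result


class ToyColumn:
    """Hypothetical stand-in for a column; not the cudf ColumnBase class."""

    def __init__(self, data):
        self.data = list(data)

    def __setitem__(self, key, value):
        if isinstance(key, slice):
            # Slice keys: assign directly to the selected range.
            n = len(range(*key.indices(len(self.data))))
            values = list(value) if isinstance(value, (list, tuple)) else [value] * n
            if len(values) != n:
                raise ValueError("value length does not match slice length")
            self.data[key] = values
        else:
            # Array-like keys: scatter the new values into the column.
            key = list(key)
            values = (
                list(value) if isinstance(value, (list, tuple)) else [value] * len(key)
            )
            self.data = scatter(values, key, self.data)


col = ToyColumn([10, 20, 30, 40, 50])
col[[0, 3]] = [1, 2]   # array key: goes through scatter
col[1:3] = 0           # slice key: direct assignment
print(col.data)        # [1, 0, 0, 2, 50]
```

The point of the sketch is the shape of the call: the array-key branch hands the column's own data to `scatter` and takes a column's worth of data back, with no conversion to another container in between.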