-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor scatter
for list columns
#8255
Conversation
Codecov Report
@@ Coverage Diff @@
## branch-21.08 #8255 +/- ##
===============================================
Coverage ? 82.83%
===============================================
Files ? 109
Lines ? 17901
Branches ? 0
===============================================
Hits ? 14828
Misses ? 3073
Partials ? 0 Continue to review full report at Codecov.
|
Here are some initial benchmark results. The two control variables are Benchmark code is uploaded.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CMake code LGTM
When I try to build this I keep getting errors about
am I missing something? |
@revans2 that results from a bad rename attempt. Sorry for the noise! Updating. |
#include <cudf/detail/get_value.cuh> | ||
#include <cudf/detail/valid_if.cuh> | ||
#include <cudf/lists/detail/copying.hpp> | ||
#include <cudf/lists/detail/scatter_helper.cuh> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you put the header in lists/detail
but the cu file in lists/copying
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem like there's a separation between detail/non-detail files in src
files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact unless the header is needed by more than one module, it can go in the src directory rather than in lists/detail.
rerun tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of minor nitpicks. But this looks good to me.
(Sorry, it took a while to review.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Good job, sir.
@gpucibot merge |
This PR refactors
scatter
forLIST
type columns. Previously there were nestedfor_each_n
when constructing child columns. The outer loop loops over the rows and the inner loops over the elements of each row. We can replace these loops with a singletransform
because we already have the offsets information of the column to construct.For each element, we first lookup the
unbound_list_view
it belongs to via binary searching the offset vector. Then the corresponding element to copy from can be retrieved by dereferencing boundedlist_view
with the proper intra index.Struct type refactor is different. Currently the implementation wraps every child in a lists column and dispatch to the list type specialization. This is fine, but the wrapping process now deep copies the list offsets and child column for dispatching. We can simplify it by just wrapping it with a view.
Since
scatter.cuh
is included in many other files, separating scatter implementation detail can help reducing compilation time during refactoring the code. Most helper function is moved intoscatter_helper.cu
.Benchmarking code for scattering lists is added. Benchmark snapshot is below: