Replace csr_adj_graph functions with faster equivalent #746
This PR moves this functionality from cuML to RAFT, as discussed in rapidsai/cuml#4803. On an A10, this feature achieves roughly 90% of bandwidth.
Practical RW bandwidth is 492 GB/s on this machine. Unrolling the loop in the kernel ( |
I have not removed the I have deduplicated the |
The csr_adj_graph functions are a performance bottleneck in the DBSCAN implementation in cuML. They are not used anywhere else. This commit replaces the csr_adj_graph functions with the dense_bool_to_unsorted_csr function. It has the same functionality, *but*
1. It requires the input adjacency matrix to be in row-major order (rather than column-major).
2. The output column indices are not guaranteed to be in ascending order (hence unsorted).
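To make the conversion in this commit message concrete, here is a minimal sequential CPU sketch in C++. It is not the RAFT kernel: the name dense_bool_to_csr_reference and its signature are hypothetical, chosen only for illustration. It assumes the caller has already computed the row offsets (row_ind) and preallocated the output array.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical CPU reference for illustration only; not the RAFT implementation.
// adj:         num_rows x num_cols boolean adjacency matrix in row-major order.
// row_ind:     num_rows offsets; row_ind[i] is where row i starts in out_col_ind.
// out_col_ind: preallocated by the caller, one slot per `true` entry in adj.
void dense_bool_to_csr_reference(const bool* adj,
                                 const int* row_ind,
                                 int num_rows,
                                 int num_cols,
                                 int* out_col_ind)
{
  for (int i = 0; i < num_rows; ++i) {
    int write_pos = row_ind[i];  // next free slot for row i
    for (int j = 0; j < num_cols; ++j) {
      // Row-major layout: element (i, j) lives at index i * num_cols + j.
      if (adj[static_cast<std::size_t>(i) * num_cols + j]) {
        out_col_ind[write_pos++] = j;
      }
    }
  }
}
```

This sequential loop happens to emit column indices in ascending order within each row. A parallel version in which several threads append to the same row would presumably lose that ordering, which is consistent with the "unsorted" caveat above.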
Hi Allard, thanks for the PR. In principle it looks good, but what if we go a step further?
Currently, the function that you are adding needs to receive the row_ind values. It would be more self-contained if it only takes the boolean matrix, calculates row_ind internally, and returns it together with the row_ind_ptr.
Earlier these steps were separated, because the vertex degree was calculated in the epsilon neighborhood kernel. But I see that in rapidsai/cuml#4803 you have added an explicit vertex degree calculation. If that is necessary, then it would make sense to move all the steps of building the csr matrix here (vertex degree, scan, adj_to_csr). What is your opinion?
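The three steps named above (vertex degree, scan, adj_to_csr) can be sketched on the CPU as follows. This is only an illustration of the self-contained variant under discussion: the CsrGraph struct and the build_csr_from_adj function are hypothetical names, and the sketch allocates its outputs internally, which the actual RAFT function may not do.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for illustration only.
struct CsrGraph {
  std::vector<int> row_ind;  // start offset of each row in col_ind
  std::vector<int> col_ind;  // column index of every `true` entry
};

// Hypothetical CPU sketch: builds the CSR arrays from a row-major boolean matrix.
CsrGraph build_csr_from_adj(const bool* adj, int num_rows, int num_cols)
{
  CsrGraph g;

  // Step 1: vertex degree, i.e. the number of `true` entries in each row.
  std::vector<int> degree(num_rows, 0);
  for (int i = 0; i < num_rows; ++i) {
    for (int j = 0; j < num_cols; ++j) {
      if (adj[static_cast<std::size_t>(i) * num_cols + j]) { ++degree[i]; }
    }
  }

  // Step 2: exclusive scan of the degrees gives the row offsets.
  g.row_ind.resize(num_rows);
  int running = 0;
  for (int i = 0; i < num_rows; ++i) {
    g.row_ind[i] = running;
    running += degree[i];
  }

  // Step 3: write the column indices of each row starting at its offset.
  g.col_ind.resize(running);
  for (int i = 0; i < num_rows; ++i) {
    int write_pos = g.row_ind[i];
    for (int j = 0; j < num_cols; ++j) {
      if (adj[static_cast<std::size_t>(i) * num_cols + j]) {
        g.col_ind[write_pos++] = j;
      }
    }
  }
  return g;
}
```

Note that the size of col_ind is only known after step 2, which is why a variant that does everything internally has to allocate, while a non-allocating variant needs the caller to run the first two steps beforehand.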
@tfeher I agree that calculating the vertex degrees internally would simplify the API. If we want to have a non-allocating API however, then it makes less sense to calculate the vertex degrees internally. The caller has to pre-allocate Therefore, the function as it stands now strikes a decent balance between usability and not repeating calculations. |
Thanks @ahendriksen for addressing the issues! The PR looks good to me.
You are right about the allocations. In the future, if we see a need for it, we can still introduce an alternate version of adj_to_csr that could do all the calculations and allocations internally. Right now, I believe we can keep the current form.
- Rename dense_bool_to_unsorted_csr to adj_to_csr
- Add grid-stride loops for test case generation (both bench and test)
- Remove overload

In addition:
- Add test case for empty input
- Fix behavior in case of empty input (return early)
rerun tests
@cjnolet Is this PR good to go now? If it's merged into 22.08, that would make finishing up the cuML follow-up easier (rapidsai/cuml#4803).
@gpucibot merge