CAGRA pad dataset for 128bit vectorized load #1505
Conversation
Todo:

Marked as breaking change because the PR …

There is still a bug with the new tests, otherwise it is ready for review.

The solution to the compilation error is to add the `dataset_ld` argument:

231:   dataset_size,
   +   dataset_ld,
232:   result_buffer_size,
The changes look good to me.
One comment: it would be better to support a padded dataset in the argument of cagra::sort_knn_graph. What do you think?
I have fixed a bug in the serialization routine, and added more tests. @enp1s0 could you have a quick look again at the changes?
I think it is a good suggestion, and it might also be useful to allow the constructor of index to accept a padded dataset (i.e., a strided mdspan). I would prefer to make these changes in a follow-up PR.
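For context, a strided mdspan over a padded dataset could be built along these lines. A minimal sketch using the `layout_stride` alias that this PR adopts; `d_ptr`, `n_rows`, `n_cols`, and `ld` are illustrative names, not code from this PR:

```cpp
#include <raft/core/device_mdspan.hpp>
#include <array>

// d_ptr points to n_rows * ld floats on the device, where ld >= n_cols is the
// padded row length (leading dimension) in elements.
std::array<int64_t, 2> strides{ld, 1};  // row stride = ld, column stride = 1
auto mapping = raft::layout_stride::mapping<raft::matrix_extent<int64_t>>{
  raft::matrix_extent<int64_t>{n_rows, n_cols}, strides};
auto dataset_view =
  raft::device_matrix_view<const float, int64_t, raft::layout_stride>{d_ptr, mapping};
```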
Thank you, @tfeher, for fixing it and adding tests. The code looks good to me.
LGTM. Just a couple of questions and a suggestion for a future improvement. Since CAGRA is still experimental, we have some leeway for API changes.
    cudaMemcpyDefault,
    resource::get_cuda_stream(res)));
  resource::sync_stream(res);
  serialize_mdspan(res, os, host_dataset.view());
Mostly a side question, but are we still planning to remove the dataset from the serialization in a future change?
      resource::get_cuda_stream(res));
  } else {
    // copy with padding
    RAFT_CUDA_TRY(cudaMemsetAsync(
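For review context, the copy-with-padding step boils down to zero-filling the destination and then doing a pitched 2D copy. A minimal sketch with plain CUDA runtime calls; the helper name and include path are illustrative, not the exact code in this PR:

```cpp
#include <raft/util/cudart_utils.hpp>  // RAFT_CUDA_TRY (include path may vary by RAFT version)

template <typename T>
void copy_with_padding(
  T* dst, const T* src, size_t n_rows, size_t n_cols, size_t ld, cudaStream_t stream)
{
  // Zero-fill so the padding tail of every row is well defined ...
  RAFT_CUDA_TRY(cudaMemsetAsync(dst, 0, n_rows * ld * sizeof(T), stream));
  // ... then copy each packed source row to an ld-element pitched destination row.
  RAFT_CUDA_TRY(cudaMemcpy2DAsync(dst,
                                  ld * sizeof(T),      // destination pitch in bytes
                                  src,
                                  n_cols * sizeof(T),  // source pitch in bytes
                                  n_cols * sizeof(T),  // width of each row in bytes
                                  n_rows,              // number of rows
                                  cudaMemcpyDefault,
                                  stream));
}
```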
If this is going to be a more common practice, I wonder if we should consider centralizing this somewhere eventually. Probably doesn't need to be done yet, or even in this PR, though.
@@ -569,7 +546,7 @@ struct search : public search_plan_impl<DATA_T, INDEX_T, DISTANCE_T> {
   ~search() {}

   void operator()(raft::resources const& res,
-                  raft::device_matrix_view<const DATA_T, INDEX_T, row_major> dataset,
+                  raft::device_matrix_view<const DATA_T, INDEX_T, layout_stride> dataset,
Would there be any benefit to using a padded layout here or having an overload for it in the public API just to simplify the conversion?
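As an illustration of that suggestion, a hypothetical convenience overload (not part of this PR) could accept a dense row-major dataset and forward it as the stride == extent(1) special case:

```cpp
// Hypothetical wrapper: a packed row-major matrix is a layout_stride matrix
// whose row stride equals its number of columns.
void operator()(raft::resources const& res,
                raft::device_matrix_view<const DATA_T, INDEX_T, row_major> dataset
                /* , ...remaining arguments... */)
{
  std::array<INDEX_T, 2> strides{dataset.extent(1), INDEX_T{1}};
  auto mapping = raft::layout_stride::mapping<raft::matrix_extent<INDEX_T>>{
    raft::matrix_extent<INDEX_T>{dataset.extent(0), dataset.extent(1)}, strides};
  (*this)(res,
          raft::device_matrix_view<const DATA_T, INDEX_T, raft::layout_stride>{
            dataset.data_handle(), mapping}
          /* , ... */);
}
```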
/merge
This PR adds padding to the dataset (if necessary) so that any of its rows can be read with 128-bit vectorized loads. This change also enables handling an arbitrary number of input features (before this PR, each row had to be at least 64-bit aligned, which constrained the acceptable number of input features).
Fixes #1458.
With this change, it is sufficient to keep a single "load type" specialization for the search kernels, which should roughly halve the binary size (#1459).
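For reference, the padding in question amounts to rounding each row up to the next 16-byte (128-bit) boundary. A minimal sketch of how the padded leading dimension can be computed; this is an illustrative helper, not the exact code in this PR:

```cpp
#include <cstddef>

// Smallest row length (in elements of T) >= n_cols whose byte size is a
// multiple of 16 bytes, so that every row starts 128-bit aligned
// (assumes sizeof(T) divides 16, which holds for the usual 1/2/4/8-byte types).
template <typename T>
constexpr std::size_t padded_ld(std::size_t n_cols)
{
  constexpr std::size_t vec_bytes = 16;  // 128 bits
  const std::size_t row_bytes     = n_cols * sizeof(T);
  return (row_bytes + vec_bytes - 1) / vec_bytes * vec_bytes / sizeof(T);
}

// Example: padded_ld<float>(33) == 36  (132 bytes rounds up to 144 bytes = 36 floats).
```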