Add ANN refinement method #1038
Overall, looks good and easy to read to me!
I have some doubts about trying to adapt the ivf-flat though. Since ivf-flat is wired to accept the interleaved data, we end up reading the vectors twice, which does not sound very efficient. Could a naive segmented distance kernel followed by a select_topk invocation be faster and even less code? Perhaps, we could explore this in a follow-up.
CMake changes LGTM
Thanks Artem for the review, I have updated the PR!
> Since ivf-flat is wired to accept the interleaved data, we end up reading the vectors twice, which does not sound very efficient.
That is right. Actually, index building is the time-consuming part because it needs to gather the neighbor vectors based on their indices, so we do not have a good access pattern. Streaming over the already gathered data is much faster.
> Could a naive segmented distance kernel followed by a select_topk invocation be faster and even less code?
Yes, we shall explore that in follow-up work.
Thanks for the updates, LGTM!
LGTM
raft::neighbors::ivf_flat::index<data_t, idx_t> refinement_index(
  handle, metric, n_queries, false, dim);

raft::spatial::knn::ivf_flat::detail::fill_refinement_index(handle,
Just a little note: I'm hoping to remove the raft::spatial::knn namespace altogether in 23.02 and officially move all the detail headers over to raft::neighbors. Hopefully we can get all the deprecated files removed as well.
@gpucibot merge
This PR implements refinement for approximate nearest neighbor search.
Refinement is a post-processing step for ANN search: it follows an ANN search that returned `k0` neighbor candidates and selects `k` out of these candidates. The selection is made by calculating exact distances from the original dataset. Refinement can increase accuracy. It is useful for ANN methods that quantize the dataset and therefore lose accuracy during distance calculation (e.g. IVF-PQ).
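
The refinement step described above can be illustrated with a small host-side sketch. This is not the code in this PR and not the RAFT API: `refine_host`, its parameters, the row-major flattened layout, and the choice of squared L2 as the metric are all assumptions made for illustration. For each query it recomputes exact distances to its `k0` candidates from the original dataset and keeps the `k` closest.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical CPU reference (assumed name and signature, not part of RAFT).
// All matrices are row-major and flattened:
//   dataset:    n_rows    x dim  (original, unquantized vectors)
//   queries:    n_queries x dim
//   candidates: n_queries x k0   (row indices returned by a prior ANN search)
// Returns n_queries x k indices, ordered by increasing exact squared L2 distance.
// Assumes k <= k0 and that every candidate index is a valid dataset row.
std::vector<std::int64_t> refine_host(const std::vector<float>& dataset,
                                      const std::vector<float>& queries,
                                      const std::vector<std::int64_t>& candidates,
                                      std::size_t dim,
                                      std::size_t k0,
                                      std::size_t k)
{
  const std::size_t n_queries = queries.size() / dim;
  std::vector<std::int64_t> refined(n_queries * k);
  std::vector<std::pair<float, std::int64_t>> scored(k0);

  for (std::size_t q = 0; q < n_queries; ++q) {
    // Exact distance from query q to each of its k0 candidates.
    for (std::size_t j = 0; j < k0; ++j) {
      const std::int64_t id = candidates[q * k0 + j];
      const std::size_t row = static_cast<std::size_t>(id) * dim;
      float dist = 0.0f;
      for (std::size_t d = 0; d < dim; ++d) {
        const float diff = queries[q * dim + d] - dataset[row + d];
        dist += diff * diff;
      }
      scored[j] = {dist, id};
    }
    // Keep the k smallest (distance, index) pairs in sorted order.
    std::partial_sort(scored.begin(), scored.begin() + k, scored.end());
    for (std::size_t j = 0; j < k; ++j) {
      refined[q * k + j] = scored[j].second;
    }
  }
  return refined;
}
```

The candidate rows are read from `dataset` by index, which is the scattered access pattern mentioned in the review discussion; a GPU implementation would need to handle that gather efficiently.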