Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ANN refinement method #1038

Merged
merged 5 commits into from
Nov 23, 2022
Merged

Conversation

tfeher
Copy link
Contributor

@tfeher tfeher commented Nov 22, 2022

This PR implements refinement for approximate nearest neighbor search.

Refinement is a post processing step for ANN search, it follows an ANN search that returned k0 neighbor candidates,
and select k out of these candidates. The selection by calculating exact distances from the original dataset.

Refinement can increase accuracy. It is useful for ANN methods that quantize the dataset and therefore loose accuracy during distance calculation (e.g. IVF-PQ).

@tfeher tfeher requested review from a team as code owners November 22, 2022 00:18
@tfeher tfeher requested a review from achirkin November 22, 2022 00:18
@tfeher tfeher added feature request New feature or request non-breaking Non-breaking change and removed cpp CMake labels Nov 22, 2022
@sean-frye
Copy link
Member

rerun tests

1 similar comment
@jjacobelli
Copy link
Contributor

rerun tests

@jjacobelli
Copy link
Contributor

rerun tests

2 similar comments
@jjacobelli
Copy link
Contributor

rerun tests

@jjacobelli
Copy link
Contributor

rerun tests

Copy link
Contributor

@achirkin achirkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks good and easy to read to me!

I have some doubts about trying to adapt the ivf-flat though. Since ivf-flat is wired to accept the interleaved data, we end up reading the vectors twice, which does not sound very efficient. Could a naive segmented distance kernel followed by a select_topk invocation be faster and even less code? Perhaps, we could explore this in a follow-up.

cpp/include/raft/neighbors/refine.cuh Outdated Show resolved Hide resolved
cpp/include/raft/neighbors/refine.cuh Show resolved Hide resolved
cpp/include/raft/neighbors/detail/refine.cuh Outdated Show resolved Hide resolved
cpp/include/raft/spatial/knn/detail/ivf_flat_build.cuh Outdated Show resolved Hide resolved
Copy link
Contributor

@robertmaynard robertmaynard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CMake changes LGTM

Copy link
Contributor Author

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Artem for the review, I have updated the PR!

Since ivf-flat is wired to accept the interleaved data, we end up reading the vectors twice, which does not sound very efficient.

That is right. Actually the index building is the time consuming part because it need to gather the neighbor vectors based on their indices, so we do not have a good access pattern. Streaming over the already gathered data much faster.

Could a naive segmented distance kernel followed by a select_topk invocation be faster and even less code?

Yes, we shall explore that in a follow up work.

cpp/include/raft/neighbors/detail/refine.cuh Outdated Show resolved Hide resolved
cpp/include/raft/neighbors/refine.cuh Outdated Show resolved Hide resolved
cpp/include/raft/spatial/knn/detail/ivf_flat_build.cuh Outdated Show resolved Hide resolved
cpp/include/raft/neighbors/refine.cuh Show resolved Hide resolved
Copy link
Contributor

@achirkin achirkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates, LGTM!

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

raft::neighbors::ivf_flat::index<data_t, idx_t> refinement_index(
handle, metric, n_queries, false, dim);

raft::spatial::knn::ivf_flat::detail::fill_refinement_index(handle,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a little note: I'm hoping to remove the raft::spatial::knn namespace altogether in 23.02 and officially move all the detail headers over to raft::neighbors. Hopefully we can get all the deprecated files removed as well.

@cjnolet
Copy link
Member

cjnolet commented Nov 23, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 8d742fd into rapidsai:branch-22.12 Nov 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake cpp feature request New feature or request non-breaking Non-breaking change
Projects
Development

Successfully merging this pull request may close these issues.

6 participants