Introduce result collector for HNSW #3161
Force-pushed from b22c5ba to 44d162e.
Thanks for the PR. It would indeed be useful to have a

    struct ResultCollector {
        float threshold; // result will be added if distance < threshold
        // returns whether the result was added and the threshold was updated
        virtual bool add_result(idx_t id, float dis) = 0;
    };

It can be provided to the Index's search function for a given query (like [...]). The advantage of having an abstract class for [...]

There are already ResultCollector objects in IndexIVFPQ and IndexIVFPQFastScan. The ResultCollector in this PR is too limited in scope to be useful, since it does not enable range search support or calling externally.
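To make the proposal above concrete, here is a minimal sketch of how one abstract `ResultCollector` could serve both k-NN and range search. Everything beyond the two proposed members (`threshold`, `add_result`) is an assumption for illustration: `KnnCollector`, `RangeCollector`, the local `idx_t` alias, and the reading that `add_result` returns `true` whenever the result was kept. None of this is existing Faiss code.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

using idx_t = int64_t; // stand-in for faiss::idx_t (assumption)

// Interface as proposed in the comment above, plus a virtual destructor.
struct ResultCollector {
    float threshold = std::numeric_limits<float>::max();
    virtual bool add_result(idx_t id, float dis) = 0;
    virtual ~ResultCollector() = default;
};

// Hypothetical k-NN collector: keeps the k smallest distances and
// tightens `threshold` to the current k-th best distance.
struct KnnCollector : ResultCollector {
    size_t k;
    std::priority_queue<std::pair<float, idx_t>> heap; // max-heap on distance

    explicit KnnCollector(size_t k) : k(k) {}

    bool add_result(idx_t id, float dis) override {
        if (k == 0 || dis >= threshold) {
            return false;
        }
        heap.emplace(dis, id);
        if (heap.size() > k) {
            heap.pop(); // drop the current worst result
        }
        if (heap.size() == k) {
            threshold = heap.top().first; // worst distance still kept
        }
        return true;
    }
};

// Hypothetical range collector: `threshold` is the fixed search radius,
// and every result strictly inside it is kept.
struct RangeCollector : ResultCollector {
    std::vector<std::pair<float, idx_t>> results;

    explicit RangeCollector(float radius) {
        threshold = radius;
    }

    bool add_result(idx_t id, float dis) override {
        if (dis >= threshold) {
            return false;
        }
        results.emplace_back(dis, id);
        return true;
    }
};
```

Because the caller only sees `threshold` and `add_result`, a graph traversal could in principle prune on `dis < threshold` without knowing which kind of search is running.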
Thanks for the comment. I get the point that it does not enable range search. However, what do you mean by "calling externally"? Are you saying that we should compare the distance before calling the virtual function to avoid possible performance degradation? Also, there are already two methods in the Index class: search and range_search. With your proposal of supporting ResultCollector for both k-NN and range search, a new method covering both would have to be introduced. I assume you are okay with that?
I don't think the ResultCollector could replace the search and range_search functions, because these take batches of queries, while the ResultCollector is for lower-level calls and is not thread-safe.
Another option is to pass a ResultCollector factory through HNSWSearchParameter, as this PR does, and make its interface more generic so that both k-NN search and range search can be supported transparently.
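A rough sketch of the factory idea, under stated assumptions: `ResultCollectorFactory`, `SearchParams`, `CountingCollector`, and `make_collectors` are hypothetical names invented here, not the PR's or Faiss's actual types. The point it illustrates is that a batched search could ask the factory for one collector per query, so that no collector has to be shared across threads.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <memory>
#include <vector>

using idx_t = int64_t; // stand-in for faiss::idx_t (assumption)

struct ResultCollector {
    float threshold = std::numeric_limits<float>::max();
    virtual bool add_result(idx_t id, float dis) = 0;
    virtual ~ResultCollector() = default;
};

// Hypothetical factory: creates a fresh collector for each query.
struct ResultCollectorFactory {
    virtual std::unique_ptr<ResultCollector> create() const = 0;
    virtual ~ResultCollectorFactory() = default;
};

// Hypothetical stand-in for the HNSW search parameters object.
struct SearchParams {
    const ResultCollectorFactory* collector_factory = nullptr;
};

// Trivial collector used only to exercise the factory.
struct CountingCollector : ResultCollector {
    size_t n_added = 0;

    bool add_result(idx_t /*id*/, float dis) override {
        if (dis >= threshold) {
            return false;
        }
        ++n_added;
        return true;
    }
};

struct CountingFactory : ResultCollectorFactory {
    std::unique_ptr<ResultCollector> create() const override {
        return std::make_unique<CountingCollector>();
    }
};

// What a batched search might do up front: one collector per query.
// Returns an empty vector when no factory was supplied.
std::vector<std::unique_ptr<ResultCollector>> make_collectors(
        const SearchParams& params,
        size_t nq) {
    std::vector<std::unique_ptr<ResultCollector>> collectors;
    if (params.collector_factory == nullptr) {
        return collectors;
    }
    collectors.reserve(nq);
    for (size_t q = 0; q < nq; ++q) {
        collectors.push_back(params.collector_factory->create());
    }
    return collectors;
}
```

Whether the collector is k-NN, range, or something custom is then decided entirely by the factory the caller passes in.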
Please hold off on implementing anything. We are currently discussing how to refactor the code with more [...]
Good to know that a discussion about ResultHandler is under way. I hope the new plan also includes the deduplication use case.
Introduce result collector for HNSW
The result collector lets the caller implement their own logic for collecting results. One example is deduplicating results based on group id. #3087
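As an illustration of the deduplication use case, the following sketch keeps only the closest result per group. It is not the PR's implementation: `GroupDedupCollector`, the `group_of` lookup table, and `top_k` are hypothetical, and the sketch deliberately leaves `threshold` untouched rather than tightening it, since doing that correctly with per-group replacement needs more bookkeeping.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <unordered_map>
#include <utility>
#include <vector>

using idx_t = int64_t; // stand-in for faiss::idx_t (assumption)

struct ResultCollector {
    float threshold = std::numeric_limits<float>::max();
    virtual bool add_result(idx_t id, float dis) = 0;
    virtual ~ResultCollector() = default;
};

// Hypothetical collector that keeps the single best result per group.
// `group_of[id]` maps a vector id to its group id; the table must
// outlive the collector.
struct GroupDedupCollector : ResultCollector {
    const std::vector<idx_t>& group_of;
    std::unordered_map<idx_t, std::pair<float, idx_t>> best; // group -> (dis, id)

    explicit GroupDedupCollector(const std::vector<idx_t>& group_of)
            : group_of(group_of) {}

    bool add_result(idx_t id, float dis) override {
        if (dis >= threshold) {
            return false;
        }
        const idx_t group = group_of[static_cast<size_t>(id)];
        auto it = best.find(group);
        if (it != best.end() && it->second.first <= dis) {
            return false; // this group already has an equal or closer result
        }
        best[group] = {dis, id};
        return true;
    }

    // The k closest deduplicated results, sorted by increasing distance.
    std::vector<std::pair<float, idx_t>> top_k(size_t k) const {
        std::vector<std::pair<float, idx_t>> out;
        out.reserve(best.size());
        for (const auto& entry : best) {
            out.push_back(entry.second);
        }
        std::sort(out.begin(), out.end());
        if (out.size() > k) {
            out.resize(k);
        }
        return out;
    }
};
```

With this shape, the search loop stays unaware of groups; it only calls `add_result`, and the caller reads the deduplicated results from `top_k` afterwards.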