[BUG] Inaccuracy in IVF-Flat Search Results with Large Number of Queries #1756
Comments
Thanks for the report. Apparently, we do test this use case in raft. I've tried adding a dozen other parameter combinations, but I consistently get a recall of 1.0; that is, I haven't been able to reproduce this in raft. So far, the only hypothesis that comes to mind is that it may be a concurrency issue. Maybe when raft runs more than one batch iteration, the submitted GPU work piles up and does not finish before the results are submitted for evaluation? Do you set the CUDA stream in raft::resources (raft_handle) to be the same as the stream faiss uses under the hood? Or do you synchronize between the two streams, or synchronize the device?
Thanks for pointing out the large query batch test.
This set of inputs is the same as the one being run in FAISS.
Thanks for the very helpful reproducer! The PR is ready.
Fix the cluster probes (coarse_index) not being advanced when batching. Thanks @tarang-jain for the precise reproducer.
Closes: #1756
Authors:
- Artem M. Chirkin (https://github.com/achirkin)
Approvers:
- Corey J. Nolet (https://github.com/cjnolet)
URL: #1764
Describe the bug
IVF-Flat Search gives inconsistent results with different batch sizes when the number of queries is very large. This is a blocker in the FAISS IVF-Flat integration. cc @achirkin @cjnolet @tfeher
Steps/Code to reproduce bug
Context: With the faiss integration work under way, the following test from faiss fails: LargeBatch
This test runs 100,000 search queries on an IVF-Flat index and compares the resulting indices and distances with a FAISS CPU IVF-Flat Index.
Expected behavior
I tried modifying the batch size by changing this line to
const uint32_t max_queries = std::min<uint32_t>(n_queries, 10000);
and now the test passes. I tried other values less than 32768 and came to the conclusion that whenever the max_queries defined here is greater than kMaxGridY, the results are incorrect, and whenever it is less than kMaxGridY, the FAISS test passes. In other words, whenever the kernel here runs more than once, the test fails.
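To illustrate why the results only break when the kernel runs more than once, here is a minimal CPU-only sketch. It is not the raft code: `search_chunk`, `search_all`, `max_chunk`, and the `advance_probes` flag are hypothetical names made up for this example, and the "kernel" simply copies each query's first probe id. The point is that every per-query input (here, the probe list) has to be offset together with the outputs on each chunk; otherwise the second and later chunks silently reuse the probes of the first chunk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one kernel launch: for each query in the chunk,
// write the first probe id of that query to the output.
void search_chunk(const uint32_t* probes, uint32_t n_probes, uint32_t chunk_len, uint32_t* out)
{
  for (uint32_t i = 0; i < chunk_len; ++i) {
    out[i] = probes[static_cast<std::size_t>(i) * n_probes];
  }
}

// Process n_queries in chunks of at most max_chunk queries.
// advance_probes == false mimics the kind of bug described in this issue:
// the output pointer moves forward, but the probe pointer does not.
std::vector<uint32_t> search_all(const std::vector<uint32_t>& probes,
                                 uint32_t n_queries,
                                 uint32_t n_probes,
                                 uint32_t max_chunk,
                                 bool advance_probes)
{
  assert(max_chunk > 0);
  std::vector<uint32_t> out(n_queries);
  for (uint32_t offset = 0; offset < n_queries; offset += max_chunk) {
    const uint32_t chunk_len = std::min(max_chunk, n_queries - offset);
    // Each query owns n_probes consecutive entries, so the probe pointer
    // must advance by offset * n_probes for the current chunk.
    const std::size_t probe_offset =
      advance_probes ? static_cast<std::size_t>(offset) * n_probes : 0;
    search_chunk(probes.data() + probe_offset, n_probes, chunk_len, out.data() + offset);
  }
  return out;
}
```

With `advance_probes == false`, a run that fits in a single chunk still returns correct results, which matches the observation that lowering the batch size hides the problem.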