-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle degenerate case where all HNSW search candidates are filtered #11787
Comments
This test is really testing a pathological case ... when the vectors are all the same everything is equidistant from everything else and "nearest neighbor" ceases to really even mean anything. I'm not sure we should actually have this test other than to verify that there is no crash. Maybe I'm misunderstanding, but what it the test really asserting? |
Thanks for digging into this! I added this test to exercise the tie-breaking logic. But now I think it wasn't a good idea -- HNSW is known to exhibit very poor performance when vectors are duplicated. And this test takes it to an extreme! It's not really a scenario we support well. Maybe we could just remove this test. It wasn't critical, and I could always follow-up with a better way to test tie-breaking. |
We could keep the test if we did this: #11790 which would cause fallback to a full scan in this kind of case. It seems like a reasonable fallback to me |
Description
This test failure reproduces every time. What seems to happen is that we search with a filter that retains > 50% of documents yet we hit an unlucky condition where the graph is not fully connected and every candidate node we visit gets filtered, so we end up with 0 results. It's kind of a degenerate case that is pretty unlikely to arise in a real graph, yet it seems we ought to have some kind of fallback to exact search for this case.
./gradlew :lucene:core:test --tests "org.apache.lucene.search.TestKnnVectorQuery.testFilterWithSameScore" -Ptests.jvms=1 -Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=C5E04AD69C13E006 -Ptests.gui=true -Ptests.file.encoding=ISO-8859-1
Version and environment details
No response
The text was updated successfully, but these errors were encountered: