[BUG] Sporadic empty inner hits on nested kNN search #466
Comments
After playing around with this problem more, it seems that, for a single query, all the documents retrieved without inner hits form groups of consecutively indexed documents. No idea why. For example:
(I set each ID to correspond to the order in which the document was indexed.)
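One way to check the "consecutively indexed" observation is to collapse the IDs of the affected hits into runs. This is only a sketch: it assumes the IDs are integers assigned in indexing order, as described above, and `consecutive_runs` is a hypothetical helper name, not part of the original script.

```python
from typing import Iterable, List, Tuple


def consecutive_runs(doc_ids: Iterable) -> List[Tuple[int, int]]:
    """Collapse integer-like IDs into sorted (first, last) runs of consecutive values."""
    runs: List[Tuple[int, int]] = []
    for doc_id in sorted({int(i) for i in doc_ids}):
        if runs and doc_id == runs[-1][1] + 1:
            # Extend the current run by one ID.
            runs[-1] = (runs[-1][0], doc_id)
        else:
            # Start a new run at this ID.
            runs.append((doc_id, doc_id))
    return runs
```

If the affected IDs for one query collapse into a few long runs rather than many single-ID runs, that would support the pattern described here.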
Hi @BM25-enthusiast, taking a look at this.
Does this belong in https://github.com/opensearch-project/k-NN?
Thank you! Let me know if I can help!
@BM25-enthusiast I tried to write some code to reproduce it locally: https://gist.github.com/jmazanec15/c7bbcc4ecd3a17f58d18b33737e769bf. I got a different error:
^ This seems like it might also be an issue; I'm going to investigate it. Would you be able to provide your repro script as well so that I can compare?
Hi @jmazanec15, here is the code I used: https://gist.github.com/BM25-enthusiast/56b00d98926942db56f8ad31830e81c8. Let me know if there's anything else I can help with.
@BM25-enthusiast The issue I was running into in #466 (comment) seems to be a problem with a Lucene component, BitSetConjunctionDISI. I created an issue here: https://issues.apache.org/jira/browse/LUCENE-10674. When I ran your script, I got the same error as above. I'm going to try to build a local fix for Lucene and validate whether it resolves the issue.
Interesting. I wonder why I didn't get that issue. What OpenSearch version and OS are you using, @jmazanec15? I didn't hit this issue on macOS with OpenSearch 2.1.0 (Docker). I also ran similar scripts in AWS SageMaker without encountering it.
I was just checking this repo out at main and running:
After running it with the #502 fix, I don't get the error anymore with my script, and I think it also works with yours (I ran it a couple of times):
I think #502 indirectly fixes it. I think the underlying issue is in Lucene. I created a fix for it in Lucene here: apache/lucene#1068. It's still in review. I'm going to run a few more tests to see if it works correctly when there are multiple nested objects.
I modified my script to add multiple nested objects. I think #502 will fix it in most cases. Here is the script.
I did notice that when there are more nested objects, the old version is less likely to fail.
Describe the bug
Hi there,
There seems to be a bug within nested approximate kNN search. The problem is that, occasionally, documents will be returned without any inner hits. How is this possible?
To Reproduce
Index over 10k documents.
Perform a nested approximate-kNN query. You may have to perform a lot of queries with different vectors before this issue is encountered (500+ even):
This is an example search result where the issue is encountered. Notice that the last two hits have empty inner hits:
This is more prevalent with datasets larger than 10000 docs. It seems like this issue is more prevalent:
I have a small script which generates lots of random documents, indexes them and then searches. Happy to share it if it would help solve the issue.
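For readers without the script, here is a rough sketch of what a nested approximate-kNN request body with inner hits might look like. It is not the query from the original report: the function name `build_nested_knn_query` and the field names passed to it are hypothetical, and the only assumption about the API is the usual shape of a `knn` clause wrapped in a `nested` query with `inner_hits` enabled.

```python
from typing import Any, Dict, List


def build_nested_knn_query(
    nested_path: str,
    vector_field: str,
    query_vector: List[float],
    k: int = 10,
) -> Dict[str, Any]:
    """Build a search body that runs an approximate kNN clause inside a nested query.

    nested_path and vector_field are placeholders for whatever the index mapping uses.
    """
    return {
        "size": k,
        "query": {
            "nested": {
                "path": nested_path,
                "query": {
                    "knn": {
                        # Nested fields are addressed by their full dotted path.
                        f"{nested_path}.{vector_field}": {
                            "vector": query_vector,
                            "k": k,
                        }
                    }
                },
                # Ask for the matching nested objects alongside each parent hit.
                "inner_hits": {},
            }
        },
    }
```

Sending this body repeatedly with different random query vectors would correspond to the "500+ queries" step above.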
Expected behavior
No returned hits should have empty inner hits.
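The expectation above can be turned into a small check over a search response. This is a sketch only, assuming the response has already been parsed into a Python dict with the standard `hits.hits[*].inner_hits.<name>.hits.hits` layout; `find_hits_without_inner_hits` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def find_hits_without_inner_hits(response: Dict[str, Any]) -> List[Any]:
    """Return the _id of every top-level hit whose inner hits are missing or empty."""
    missing: List[Any] = []
    for hit in response.get("hits", {}).get("hits", []):
        inner_blocks = hit.get("inner_hits", {})
        # A hit is fine if at least one inner_hits block contains a nested hit.
        has_inner = any(
            block.get("hits", {}).get("hits") for block in inner_blocks.values()
        )
        if not has_inner:
            missing.append(hit.get("_id"))
    return missing
```

An empty return value means every returned hit carried at least one inner hit; any IDs in the list are instances of the behaviour reported here.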
Host/Environment (please complete the following information):