Background
I am running a 3-node Elasticsearch cluster on AWS servers. Each index contains between 100,000 and 1 million documents, with potential for further growth.
Bug
I am encountering an issue where identical queries return different sets of documents on repeated executions, leading to inconsistent results for end-users. This inconsistency is negatively impacting the user experience.
Configuration:
I am currently using Cosine LSH for dense vector search with the following mapping:
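For readers unfamiliar with the plugin, a Cosine LSH mapping and query in Elastiknn generally take the shape sketched below, written as Python dicts that could be passed to an Elasticsearch client. This is an illustration only, not the reporter's mapping: the field name `my_vec` and the values of `dims`, `L`, `k`, `candidates`, and `size` are placeholders, and the key names follow my understanding of the Elastiknn API documentation.

```python
# Hypothetical values throughout: the field name, dims, L, k, candidates and
# size are placeholders, NOT the settings used in this issue.
lsh_mapping = {
    "properties": {
        "my_vec": {
            "type": "elastiknn_dense_float_vector",
            "elastiknn": {
                "dims": 4,               # vector dimensionality (placeholder)
                "model": "lsh",
                "similarity": "cosine",
                "L": 99,                 # number of hash tables (placeholder)
                "k": 1,                  # hash functions per table (placeholder)
            },
        }
    }
}


def lsh_query(vector, candidates=50, size=10):
    """Build an approximate (LSH) nearest-neighbors search body."""
    return {
        "size": size,
        "query": {
            "elastiknn_nearest_neighbors": {
                "field": "my_vec",
                "vec": {"values": list(vector)},
                "model": "lsh",
                "similarity": "cosine",
                # Number of approximate candidates that get re-scored exactly.
                "candidates": candidates,
            }
        },
    }


body = lsh_query([0.1, 0.2, 0.3, 0.4])
```

The `candidates` value is the main knob in such a query: it controls how many approximate matches are re-scored with the exact similarity, so it trades latency against recall.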
Observed Behavior:
The results for the same query vary with each attempt, making the responses unpredictable.
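To put a number on "vary with each attempt", the same query can be run several times and the returned document IDs compared. The sketch below assumes a hypothetical zero-argument callable `run_query` that executes the search and returns the hit IDs in ranked order; how that callable talks to the cluster is left out on purpose.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two ID collections (1.0 means identical sets)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def consistency_report(run_query, repeats=10):
    """Run the same query `repeats` times and summarize how much results differ.

    `run_query` is a hypothetical callable returning a list of document IDs.
    """
    runs = [list(run_query()) for _ in range(repeats)]
    overlaps = [jaccard(x, y) for x, y in combinations(runs, 2)]
    return {
        # Same documents in every run, regardless of order.
        "identical_sets": all(set(r) == set(runs[0]) for r in runs),
        # Same documents in the same order in every run.
        "identical_order": all(r == runs[0] for r in runs),
        # Worst pairwise overlap between any two runs.
        "min_overlap": min(overlaps) if overlaps else 1.0,
    }
```

Separating `identical_sets` from `identical_order` helps distinguish two different symptoms: documents dropping in and out of the result set versus the same documents being re-ranked, for example because of score ties.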
Investigation and Benchmarking:
Switching to the exact kNN approach (as documented in the ElastiKNN plugin) resolves the inconsistency but results in increased latency—approximately double that of the Cosine LSH method.
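An average response time of this kind can be collected with a small client-side harness. The sketch below is one possible way to do it, not the method used for the numbers in this issue; `run_query` is again a hypothetical zero-argument callable that executes one search, and the `repeats` and `warmup` defaults are arbitrary.

```python
import time
from statistics import mean


def avg_latency_ms(run_query, repeats=50, warmup=5):
    """Return the mean wall-clock time of `run_query` in milliseconds."""
    # Warm-up calls are executed but not timed, so caches are populated first.
    for _ in range(warmup):
        run_query()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)
```

Timing on the client includes network and serialization overhead. The `took` field in the Elasticsearch search response reports server-side time and may be a better basis when comparing the LSH and exact models.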
Latency Comparison:
The benchmarks below were run on an index with 1 shard, 1 replica, and ~15k documents.
| Query | Avg Response Time (Cosine LSH) | Avg Response Time (Exact kNN) |
| --- | --- | --- |
| What is mutual fund? | 10.97 ms | 20.38 ms |
| How can I invest in NPS? | 10.29 ms | 18.70 ms |
| Advantages of mutual funds? | 8.24 ms | 19.58 ms |
| How to open savings account? | 10.27 ms | 19.41 ms |
| What are debt funds? | 10.96 ms | 18.22 ms |
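As a quick check of the "approximately double" claim, the slowdown can be computed directly from the five benchmark rows. The variable names below are illustrative; the numbers are the ones reported in the latency comparison.

```python
from statistics import mean

# (Cosine LSH ms, Exact kNN ms) per query, as reported in the benchmark.
benchmarks = {
    "What is mutual fund?": (10.97, 20.38),
    "How can I invest in NPS?": (10.29, 18.70),
    "Advantages of mutual funds?": (8.24, 19.58),
    "How to open savings account?": (10.27, 19.41),
    "What are debt funds?": (10.96, 18.22),
}

# Per-query slowdown of exact kNN relative to Cosine LSH.
ratios = {query: exact / lsh for query, (lsh, exact) in benchmarks.items()}

mean_lsh = mean(lsh for lsh, _ in benchmarks.values())
mean_exact = mean(exact for _, exact in benchmarks.values())
overall_ratio = mean_exact / mean_lsh

for query, ratio in ratios.items():
    print(f"{query}: {ratio:.2f}x")
print(f"mean LSH {mean_lsh:.2f} ms, mean exact {mean_exact:.2f} ms, "
      f"overall {overall_ratio:.2f}x")
```

The means come out to about 10.15 ms for Cosine LSH and 19.26 ms for exact kNN, an overall slowdown of roughly 1.9x, with per-query ratios ranging from about 1.66x to 2.38x.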
Request for Recommendations:
Given the large indices and the need for low latency, how can I optimize the Cosine LSH setup to ensure consistent results while maintaining performance? Are there any adjustments or alternative configurations you would recommend? I would be happy to provide more details if needed. @alexklibisz
Elastiknn Version
7.17.7
Platform
AWS servers
Steps to reproduce
No response
Additional info
No response
akhil-bot changed the title from "Inconsistent Results with Cosine LSH in Elasticsearch kNN Plugin on Large Indices" to "Inconsistent Results with Cosine LSH KNN Algorithm on Large Indices" on Nov 28, 2024
This is a very old version. I recommend looking through the commit history to see if anything related to this has been fixed: 7.17.7.0...8.15.3.0
I also recommend just trying with the latest version. If we're able to fix this, it's going to come out on the next version. I'm not going to backport it to a version that's > 2 years old.