Inconsistent Results with Cosine LSH KNN Algorithm on Large Indices #746

akhil-bot · 2024-11-28T13:10:58Z

Support guidelines

I've read the support guidelines

Background

I am running a 3-node Elasticsearch cluster on AWS servers. Each index contains between 100,000 and 1 million documents, with potential for further growth.

Bug

I am encountering an issue where identical queries return different sets of documents on repeated executions, leading to inconsistent results for end-users. This inconsistency is negatively impacting the user experience.

Configuration:
I am currently using Cosine LSH for dense vector search with the following mapping:

"chunkVector_1024": {
  "type": "elastiknn_dense_float_vector",
  "elastiknn": {
    "model": "lsh",
    "similarity": "angular",
    "dims": 1024,
    "L": 99,
    "k": 1
  }
}
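
For readers unfamiliar with the `L` and `k` parameters, here is a minimal sketch of random-hyperplane (cosine) LSH. It assumes `L` is the number of hash tables and `k` is the number of sign bits combined into each hash value. The function names and the seeded Gaussian hyperplanes are illustrative assumptions, not the plugin's actual implementation.

```python
import random


def make_hyperplanes(dims, L, k, seed=0):
    """Draw L tables of k random Gaussian hyperplanes each (illustrative only)."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dims)] for _ in range(k)]
        for _ in range(L)
    ]


def hash_vector(vec, planes):
    """Return one integer hash per table; each hash packs k sign bits."""
    hashes = []
    for table in planes:
        bits = 0
        for plane in table:
            dot = sum(v * p for v, p in zip(vec, plane))
            # One bit per hyperplane: which side of the plane the vector falls on.
            bits = (bits << 1) | (1 if dot > 0 else 0)
        hashes.append(bits)
    return hashes


def matching_tables(h1, h2):
    """Count the tables in which two vectors received the same hash."""
    return sum(1 for a, b in zip(h1, h2) if a == b)


# Small dims keeps the example fast; the mapping above uses dims=1024.
planes = make_hyperplanes(dims=8, L=99, k=1)
query = [0.3, -1.2, 0.7, 0.1, -0.4, 2.0, -0.9, 0.5]
query_hashes = hash_vector(query, planes)
opposite_hashes = hash_vector([-x for x in query], planes)
```

With `k = 1`, every hash is a single bit, so the match count between a query and a document is an integer from 0 to 99 and many documents can tie on it. If candidates are selected by match count (an assumption about the plugin, not confirmed in this thread), ties at the `candidates` cutoff are one plausible place for run-to-run differences to appear.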

Query:

{
  "elastiknn_nearest_neighbors": {
    "field": "chunkVector_1024",
    "vec": {"values": {{vector_values}}},
    "model": "lsh",
    "similarity": "angular",
    "candidates": 100
  }
}

Observed Behavior:
The results for the same query vary with each attempt, making the responses unpredictable.
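
One way to quantify the variation described above is to collect the returned document IDs from several repeated runs of the same query and compare them. The sketch below is a hypothetical helper, not part of Elastiknn or the Elasticsearch client. It assumes each run is available as a list of the `_id` values from the search hits, and it needs at least two runs.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap between two collections of document IDs."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def consistency_report(runs):
    """Summarize how much the result sets of repeated runs agree.

    `runs` is a list (length >= 2) of ID lists, one per query execution.
    """
    scores = [jaccard(a, b) for a, b in combinations(runs, 2)]
    # IDs that appear in every single run.
    stable = set(runs[0]).intersection(*map(set, runs[1:]))
    return {
        "min_jaccard": min(scores),
        "mean_jaccard": sum(scores) / len(scores),
        "always_returned": sorted(stable),
    }


# Made-up IDs purely for illustration; real runs would come from search responses.
example_runs = [
    ["doc1", "doc2", "doc3"],
    ["doc1", "doc2", "doc4"],
    ["doc1", "doc2", "doc3"],
]
report = consistency_report(example_runs)
```

A `min_jaccard` of 1.0 would mean every run returned the same set of documents. Lower values indicate how far the worst pair of runs diverged.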

Investigation and Benchmarking:
Switching to the exact kNN approach (as documented in the Elastiknn plugin) resolves the inconsistency but roughly doubles latency compared with the Cosine LSH method.

Latency Comparison:
The benchmarks below were run on an index with 1 shard and 1 replica, containing ~15k documents.

| Query | Avg Response Time (Cosine LSH) | Avg Response Time (Exact kNN) |
| --- | --- | --- |
| What is mutual fund? | 10.97 ms | 20.38 ms |
| How can I invest in NPS? | 10.29 ms | 18.70 ms |
| Advantages of mutual funds? | 8.24 ms | 19.58 ms |
| How to open savings account? | 10.27 ms | 19.41 ms |
| What are debt funds? | 10.96 ms | 18.22 ms |
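
As a quick check of the "approximately double" figure, the averages in the table above can be compared directly. This is a small sketch using only the five reported measurements. The variable names are illustrative.

```python
# (Cosine LSH ms, Exact kNN ms) per query, copied from the table above.
latencies_ms = {
    "What is mutual fund?": (10.97, 20.38),
    "How can I invest in NPS?": (10.29, 18.70),
    "Advantages of mutual funds?": (8.24, 19.58),
    "How to open savings account?": (10.27, 19.41),
    "What are debt funds?": (10.96, 18.22),
}

lsh_mean = sum(lsh for lsh, _ in latencies_ms.values()) / len(latencies_ms)
exact_mean = sum(exact for _, exact in latencies_ms.values()) / len(latencies_ms)
ratio = exact_mean / lsh_mean

print(f"LSH mean:   {lsh_mean:.2f} ms")
print(f"Exact mean: {exact_mean:.2f} ms")
print(f"Exact / LSH: {ratio:.2f}x")
```

Across these five queries the exact search averages about 19.26 ms against about 10.15 ms for LSH, a ratio of roughly 1.9x.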

Request for Recommendations:
Given the large indices and the need for low latency, how can I optimize the Cosine LSH setup to ensure consistent results while maintaining performance? Are there any adjustments or alternative configurations you would recommend? I would be happy to provide more details if needed.
@alexklibisz

Elastiknn Version

7.17.7

Platform

AWS servers

Steps to reproduce

No response

Additional info

No response

alexklibisz · 2024-12-02T17:04:48Z

7.17.7

This is a very old version. I recommend looking through the commit history to see if anything related to this has been fixed: 7.17.7.0...8.15.3.0

I also recommend just trying with the latest version. If we're able to fix this, it's going to come out on the next version. I'm not going to backport it to a version that's > 2 years old.

akhil-bot added the bug label (Something isn't working) Nov 28, 2024

akhil-bot changed the title ~~Inconsistent Results with Cosine LSH in Elasticsearch kNN Plugin on Large Indices~~ Inconsistent Results with Cosine LSH KNN Algorithm on Large Indices Nov 28, 2024
