Inconsistent Results with Cosine LSH KNN Algorithm on Large Indices #746

Open
akhil-bot opened this issue Nov 28, 2024 · 1 comment
Labels
bug Something isn't working



akhil-bot commented Nov 28, 2024

Background

I am running a 3-node Elasticsearch cluster on AWS servers. Each index contains between 100,000 and 1 million documents and may grow further.

Bug

I am encountering an issue where identical queries return different sets of documents on repeated executions, so end users see inconsistent results. This inconsistency is hurting the user experience.

Configuration:
I am currently using Cosine LSH for dense vector search with the following mapping:

"chunkVector_1024": {
  "type": "elastiknn_dense_float_vector",
  "elastiknn": {
    "model": "lsh",
    "similarity": "angular",
    "dims": 1024,
    "L": 99,
    "k": 1
  }
}
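For context on the `L` and `k` values in this mapping: in Elastiknn's angular LSH model, `L` is the number of hash tables and `k` is the number of hyperplane hashes combined into each table's hash. The Python sketch below is a generic random-hyperplane LSH, not the plugin's implementation, and the helper names `make_hyperplanes` and `angular_lsh_hashes` are hypothetical. It illustrates that with `k = 1` each of the 99 tables can produce only two distinct hash values, i.e. each table splits the vector space in half.

```python
import random


def make_hyperplanes(dims, L, k, seed=0):
    """Hypothetical helper: draw L tables of k random Gaussian hyperplanes each."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dims)] for _ in range(k)]
        for _ in range(L)
    ]


def angular_lsh_hashes(vec, planes):
    """Hypothetical helper: one hash per table, built from the k sign bits of <plane, vec>."""
    hashes = []
    for table in planes:
        h = 0
        for bit, plane in enumerate(table):
            dot = sum(p * x for p, x in zip(plane, vec))
            if dot >= 0:
                h |= 1 << bit  # set this bit when the vector lies on the plane's positive side
        hashes.append(h)
    return hashes


# Same shape as the mapping above: dims=1024, L=99, k=1.
planes = make_hyperplanes(dims=1024, L=99, k=1)
rng = random.Random(42)
query = [rng.gauss(0.0, 1.0) for _ in range(1024)]
hashes = angular_lsh_hashes(query, planes)
print(len(hashes), sorted(set(hashes)))  # 99 tables, each hash is 0 or 1
```

For random hyperplanes, two vectors at angle θ get the same sign bit with probability 1 − θ/π, so with `k = 1` a vector unrelated to the query (roughly 90 degrees away) still matches it in about half of the tables on average. Larger `k` gives each table `2**k` possible values and finer buckets, at the cost of fewer matches per table.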

Query:

{
  "elastiknn_nearest_neighbors": {
    "field": "chunkVector_1024",
    "vec": {"values": {{vector_values}}},
    "model": "lsh",
    "similarity": "angular",
    "candidates": 100
  }
}
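Roughly, as we read the Elastiknn docs, an LSH query scores documents by how many hashes they share with the query vector, keeps the `candidates` best (100 here), and re-scores those with the exact similarity. The sketch below is a simplified, hypothetical model of that candidate step, not the plugin's code; in particular, how Elastiknn breaks ties between documents with equal match counts is an assumption. It shows that when documents tie at the cutoff, the order in which they are visited alone can change the candidate set, and that breaking ties on the document id removes that dependence.

```python
def top_candidates(query_hashes, doc_hashes, candidates, deterministic=False):
    """Keep the `candidates` docs that share the most per-table hashes with the query.

    doc_hashes maps a doc id to its list of L per-table hashes (hypothetical input format).
    """
    # Approximate score: number of tables where the doc's hash equals the query's hash.
    counts = {
        doc_id: sum(q == d for q, d in zip(query_hashes, hashes))
        for doc_id, hashes in doc_hashes.items()
    }
    if deterministic:
        # Break ties on the doc id, so equal counts always resolve the same way.
        ranked = sorted(counts, key=lambda doc_id: (-counts[doc_id], doc_id))
    else:
        # Stable sort: equal counts keep whatever order the docs were visited in.
        ranked = sorted(counts, key=lambda doc_id: counts[doc_id], reverse=True)
    return ranked[:candidates]


query_hashes = [0, 1, 1]
visit_order_1 = {"a": [0, 1, 1], "b": [0, 1, 0], "c": [0, 0, 1], "d": [1, 0, 0]}
visit_order_2 = {"a": [0, 1, 1], "c": [0, 0, 1], "b": [0, 1, 0], "d": [1, 0, 0]}

print(top_candidates(query_hashes, visit_order_1, 2))  # ['a', 'b']
print(top_candidates(query_hashes, visit_order_2, 2))  # ['a', 'c']
print(top_candidates(query_hashes, visit_order_2, 2, deterministic=True))  # ['a', 'b']
```

In this toy model, documents `b` and `c` tie at the cutoff. Exact re-scoring only reorders documents that made it into the candidate set, so it cannot bring back one that was cut.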

Observed Behavior:
The results for the same query vary with each attempt, making the responses unpredictable.
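To put a number on how much the results vary, one option is to run the same query several times, collect the returned document ids, and compare the runs pairwise. A minimal Python sketch, assuming the ids have already been fetched (the ids below are made up and the Elasticsearch call is not shown); `jaccard` and `min_pairwise_overlap` are hypothetical helper names.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two result lists as sets of doc ids: 1.0 means identical sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def min_pairwise_overlap(runs):
    """Worst-case Jaccard overlap between any two runs of the same query (needs >= 2 runs)."""
    return min(jaccard(x, y) for x, y in combinations(runs, 2))


# Made-up doc ids from three executions of the same query.
runs = [["d1", "d2", "d3"], ["d1", "d2", "d3"], ["d1", "d2", "d4"]]
print(min_pairwise_overlap(runs))  # 0.5
```

A value of 1.0 means every run returned the same set of documents; anything lower shows how far apart the two most different runs are. Because it compares sets, a change in ranking order alone would not show up.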

Investigation and Benchmarking:
Switching to the exact kNN query (as documented for the Elastiknn plugin) resolves the inconsistency, but roughly doubles the latency compared with Cosine LSH.
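One plausible reason exact kNN is consistent is that every document is scored, so there is no approximate cutoff where ties can fall either way; the price is that the work grows linearly with the number of documents. The brute-force cosine top-k below is a plain-Python illustration of that idea, not Elastiknn's exact query. `exact_top_k` is a hypothetical name, and sorting ties by document id is an illustrative choice that makes the output independent of input order.

```python
import math


def cosine(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def exact_top_k(query, docs, k):
    """Score every doc, then sort by descending similarity with ties broken by doc id."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in docs.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [doc_id for _, doc_id in scored[:k]]


docs = {"x": [1.0, 0.0], "y": [0.0, 1.0], "z": [1.0, 1.0], "w": [2.0, 0.0]}
print(exact_top_k([1.0, 0.0], docs, 3))  # ['w', 'x', 'z'] ("w" and "x" tie at 1.0)
```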

Latency Comparison:
The benchmarks below were run on an index with 1 shard and 1 replica, containing ~15k documents.

| Query | Avg response time, Cosine LSH (ms) | Avg response time, exact kNN (ms) |
|---|---|---|
| What is mutual fund? | 10.97 | 20.38 |
| How can I invest in NPS? | 10.29 | 18.70 |
| Advantages of mutual funds? | 8.24 | 19.58 |
| How to open savings account? | 10.27 | 19.41 |
| What are debt funds? | 10.96 | 18.22 |
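As a quick check of the "approximately double" figure, the five rows can be averaged and compared. Plain Python, with the values copied from the table:

```python
# Average response times (ms) copied from the benchmark table.
lsh_ms = [10.97, 10.29, 8.24, 10.27, 10.96]
exact_ms = [20.38, 18.70, 19.58, 19.41, 18.22]

mean_lsh = sum(lsh_ms) / len(lsh_ms)
mean_exact = sum(exact_ms) / len(exact_ms)
ratio = mean_exact / mean_lsh

print(f"Cosine LSH: {mean_lsh:.2f} ms, exact kNN: {mean_exact:.2f} ms, ratio: {ratio:.2f}x")
```

This works out to roughly 10.1 ms for Cosine LSH versus 19.3 ms for exact kNN, a ratio of about 1.9.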

Request for Recommendations:
Given the large indices and the need for low latency, how can I optimize the Cosine LSH setup to ensure consistent results while maintaining performance? Are there any adjustments or alternative configurations you would recommend? I would be happy to provide more details if needed.
@alexklibisz

Elastiknn Version

7.17.7

Platform

AWS servers

Steps to reproduce

No response

Additional info

No response

akhil-bot added the bug (Something isn't working) label on Nov 28, 2024
akhil-bot changed the title from "Inconsistent Results with Cosine LSH in Elasticsearch kNN Plugin on Large Indices" to "Inconsistent Results with Cosine LSH KNN Algorithm on Large Indices" on Nov 28, 2024
alexklibisz (Owner) commented:

7.17.7

This is a very old version. I recommend looking through the commit history to see if anything related to this has been fixed: 7.17.7.0...8.15.3.0

I also recommend just trying the latest version. If we're able to fix this, the fix will come out in the next version. I'm not going to backport it to a version that's more than 2 years old.
