-
Notifications
You must be signed in to change notification settings - Fork 331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix issue with elasticsearch extension to respect limit and minReleva… #935
base: main
Are you sure you want to change the base?
Conversation
…nce parameters when calling GetSimilarListAsync API
@microsoft-github-policy-service agree |
@@ -218,15 +218,20 @@ public async Task<string> UpsertAsync( | |||
Embedding embedding = await this._embeddingGenerator.GenerateEmbeddingAsync(text, cancellationToken).ConfigureAwait(false); | |||
var coll = embedding.Data.ToArray(); | |||
|
|||
//adjust min score for cosine similarity | |||
//https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html#knn-similarity-search | |||
float adjustedMinSimilarityScore = (float)(2 * minRelevance - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This formula transforms from score to cosine similarity, however internally ES works with score.
I believe this should be the inverse, e.g. float adjustedMinSimilarityScore = (float)((minRelevance + 1) / 2)
By the way, the formula (2 * minRelevance - 1)
should be used in line 253, replacing hit.Score
with the actual cosine similarity. The value returned is incorrect and doesn't match the similarity returned by other memory connectors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html#knn-similarity-search
default similarity metric is cosine and I didn't see something defines another similarity metric in the plugin code.
From my tests I found this score adjustment for cosine returns correct results.
please let me know if I'm missing something here or if you have another proposed tests to include..
A similarity value. This value determines the similarity metric used to score documents based on similarity between the query and document vector. For a list of available metrics, see the [similarity](https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html#dense-vector-similarity) parameter documentation. **The similarity setting defaults to cosine**.
var resp = await this._client.SearchAsync<ElasticsearchMemoryRecord>(s => | ||
s.Index(index) | ||
s.Index(index).Size(limit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
qd.k(limit)
already sets a limit, isn't this redundant? should we remove the limit
param in qd.k(limit)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kNN search adds k matching documents to the search. So, if you set k=20, then it finds 20 matches. Then size takes the top scoring documents from these and returns them.
From my tests in real dataset. it was always returning 10 regardless of the k(limit) value. I think this is the default value of size parameter
extensions/Elasticsearch/Elasticsearch.FunctionalTests/Additional/KernelMemoryTests.cs
Outdated
Show resolved
Hide resolved
extensions/Elasticsearch/Elasticsearch.FunctionalTests/Additional/KernelMemoryTests.cs
Outdated
Show resolved
Hide resolved
extensions/Elasticsearch/Elasticsearch.FunctionalTests/Additional/TestsHelper.cs
Outdated
Show resolved
Hide resolved
extensions/Elasticsearch/Elasticsearch.FunctionalTests/Additional/KernelMemoryTests.cs
Outdated
Show resolved
Hide resolved
extensions/Elasticsearch/Elasticsearch.FunctionalTests/Additional/KernelMemoryTests.cs
Outdated
Show resolved
Hide resolved
…nal/KernelMemoryTests.cs
…nal/KernelMemoryTests.cs
…nal/TestsHelper.cs
…nce parameters when calling GetSimilarListAsync API
Motivation and Context (Why the change? What's the scenario?)
Fixing issue #934
High level description (Approach, Design)
using ElasticSearch Client .Size(limit) clause during query.
using .Similarity(min score) to fix minRelevance (after adjusting the score value based on cosine similarity as per https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html#knn-similarity-search)