This will be merged as part of odfe 1.13.0.
docs/knn/index.md
Outdated
## Use Similarity methods in painless scripting

Sometimes users would like to go beyond Elasticsearch’s built-in features for scoring and might want to customize the search scores in more complex ways.
Elasticsearch provides script_score, the ability to provide custom scores for returned documents.
Elasticsearch provides script_score, the ability to provide custom scores for desired documents?
Ack
docs/knn/index.md
Outdated
}

```
Since l2Squared function is a distance function, unlike cosine function, we need to reverse the output.
How about we add this line instead:
"The lesser the distance the more the relevance of the document to the query vector. In order to bring the lesser distances documents to the top of the scores, we invert the distance from l2Squared function"
Ack
LGTM!
@ashwinkumar12345 @aetter Can you please review? This feature will be released as part of odfe 1.13.0. Thanks.
Overall looks really good, but could use some light cleanup.
docs/knn/index.md
Outdated
@@ -246,3 +246,98 @@ All parameters are required.
The standard KNN query and custom scoring option perform differently. Test using a representative set of documents to see if the search results and latencies match your expectations.

Custom scoring works best if the initial filter reduces the number of documents to no more than 20,000. Increasing shard count can improve latencies, but be sure to keep shard size within [the recommended guidelines](../elasticsearch/#primary-and-replica-shards).


## Use Similarity methods in painless scripting
Use similarity functions in Painless scripts
From the Painless docs, it looks like they are functions rather than methods. We just want to choose the right term and be consistent.
docs/knn/index.md
Outdated

### Cosine Similarity
This function calculates the measure of cosine similarity between a given query vector and document vectors.
Optionally accepts normQueryVector, to avoid repeated calculation of normalization for query vector for every filtered documents.
Optionally accepts `normQueryVector` to avoid...
Can you be more specific about why you'd add this parameter and maybe some sample values/outcomes?
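One way to illustrate why the parameter helps is a minimal Python sketch rather than Painless. It assumes `normQueryVector` is the Euclidean magnitude of the query vector, which the quoted text does not spell out; `magnitude` and `cosine_similarity` are hypothetical helpers, not the plugin's implementation.

```python
import math


def magnitude(vec):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def cosine_similarity(query, doc, norm_query=None):
    """Cosine similarity; norm_query plays the role of normQueryVector (assumed)."""
    if norm_query is None:
        # Without the optional argument, the query norm is recomputed per document.
        norm_query = magnitude(query)
    dot = sum(q * d for q, d in zip(query, doc))
    return dot / (norm_query * magnitude(doc))


query = [3.0, 4.0]
docs = [[3.0, 4.0], [4.0, -3.0], [-3.0, -4.0]]

# The query norm is constant across documents, so compute it once and reuse it.
norm_query = magnitude(query)
scores = [cosine_similarity(query, d, norm_query) for d in docs]
```

Here the three documents point in the same, an orthogonal, and the opposite direction relative to the query, so the similarities come out as 1.0, 0.0, and -1.0. Passing the precomputed norm gives the same result as omitting it; it only saves one norm calculation per filtered document.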
docs/knn/index.md
Outdated
}
}
```
The above script adds 1.0 to the cosine similarity to keep score positive.
Ditto here. What's the range of values, and why is the 1.0 necessary? An example or two would likely help.
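As a possible answer to the range question: cosine similarity lies in [-1, 1], so adding 1.0 maps it onto [0, 2], and the script never returns a negative score. A small Python sketch of that shift follows; `script_score` here is a hypothetical stand-in for the `1.0 + cosineSimilarity(...)` expression, not a real API.

```python
def script_score(cosine):
    """Stand-in for the script expression `1.0 + cosineSimilarity(...)`."""
    if not -1.0 <= cosine <= 1.0:
        raise ValueError("cosine similarity must lie in [-1, 1]")
    return 1.0 + cosine


# Sample cosine values at the extremes and the midpoint of the range.
samples = {"same direction": 1.0, "orthogonal": 0.0, "opposite": -1.0}
shifted = {label: script_score(c) for label, c in samples.items()}
```

A vector pointing the same way as the query scores 2.0, an orthogonal one 1.0, and an opposite one 0.0, so the ordering is preserved while every score stays at or above zero.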
docs/knn/index.md
Outdated
Also, when a document vector matches the query vector, we needed to add 1 in the denominator to avoid divide by zero error.

####Constraints
1. If a document’s knn vector field has different dimensions from the query, an error(IllegalArgumentException) will be thrown.
- If a document’s `knn_vector` field has different dimensions than the query, the function throws an IllegalArgumentException.
- If a vector field doesn't have a value, the function throws an IllegalStateException.

You can avoid this situation by first checking if a document has a value for the field:
"source": "doc[params.field].size() == 0 ? 0 : 1 / (1 + l2Squared(params.query_value, doc[params.field]))",
Since scores can only be positive, this script ranks documents with vector fields higher than those without.
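The guarded script can be sketched in Python to show the resulting ranking. This is an illustration under assumptions, not the plugin's code: `l2_squared` and `score` are hypothetical helpers, an empty list stands in for a document with no vector value, and `ValueError` stands in for the `IllegalArgumentException` on mismatched dimensions.

```python
def l2_squared(query, doc):
    """Squared Euclidean distance; raises on mismatched dimensions."""
    if len(query) != len(doc):
        raise ValueError("query and document vectors differ in dimension")
    return sum((q - d) ** 2 for q, d in zip(query, doc))


def score(query, doc_vector):
    """Mirrors `size() == 0 ? 0 : 1 / (1 + l2Squared(...))` from the script."""
    if len(doc_vector) == 0:
        return 0.0  # document has no vector value
    # The +1 keeps the denominator nonzero when the vectors are identical.
    return 1.0 / (1.0 + l2_squared(query, doc_vector))


query = [1.0, 2.0]
docs = {"exact": [1.0, 2.0], "near": [1.0, 3.0], "far": [4.0, 6.0], "missing": []}

# Higher score first: smaller distances rise to the top, missing vectors sink.
ranked = sorted(docs, key=lambda name: score(query, docs[name]), reverse=True)
```

An exact match has distance 0 and scores 1.0, a distance of 1 scores 0.5, and a distance of 25 scores 1/26, so the inversion puts the closest documents first and the document without a vector last.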
Ack
Define specialized api that are allowed to be used in scripting.
Add resolution for constraint to avoid error.
93014c2 to 274e0b3
274e0b3 to 56aad37
56aad37 to 3fbd358
@aetter can you take another look at this? Thanks.
Still a few minor formatting concerns, but we'll just tweak it post-merge. LGTM.
Define specialized api that are allowed to be used in scripting.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.