This repository has been archived by the owner on Aug 16, 2022. It is now read-only.

Added section for painless scripting #381

Closed
wants to merge 4 commits into from

Conversation

VijayanB
Contributor

Define the specialized APIs that are allowed in scripting.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@VijayanB
Contributor Author

This will be merged as part of odfe 1.13.0.
Corresponding feature PR: opendistro-for-elasticsearch/k-NN#281

## Use Similarity methods in painless scripting

Sometimes users would like to go beyond Elasticsearch’s built-in features for scoring and might want to customize the search scores in more complex ways.
Elasticsearch provides script_score, the ability to provide custom scores for returned documents.

Elasticsearch provides script_score, the ability to provide custom scores for desired documents. ?
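
For reference while reviewing, here is a minimal sketch of the request shape that script_score takes, built as a Python dictionary. The field name `my_vector`, the query vector, and the `match_all` inner query are placeholders chosen for illustration and do not come from this PR; the script source reuses the `l2Squared` expression quoted later in this review.

```python
import json

# Hypothetical field name and query vector, chosen only for illustration.
FIELD = "my_vector"
QUERY_VECTOR = [1.0, 2.0]


def build_script_score_query(source, field=FIELD, query_value=QUERY_VECTOR):
    """Return a script_score request body that scores documents with `source`."""
    return {
        "query": {
            "script_score": {
                # Placeholder inner query; a real request would normally filter here.
                "query": {"match_all": {}},
                "script": {
                    "source": source,
                    "params": {"field": field, "query_value": query_value},
                },
            }
        }
    }


body = build_script_score_query(
    "1 / (1 + l2Squared(params.query_value, doc[params.field]))"
)
print(json.dumps(body, indent=2))
```

The inner `query` selects the documents to score, and the script then computes a custom score for each of them.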

Contributor Author

Ack

}

```
Since l2Squared function is a distance function, unlike cosine function, we need to reverse the output.

How about we add this line instead

"The lesser the distance the more the relevance of the document to the query vector. In order to bring the lesser distances documents to the top of the scores, we invert the distance from l2Squared function"

Contributor Author

Ack

@vamshin vamshin left a comment

LGTM!

@VijayanB
Contributor Author

VijayanB commented Jan 4, 2021

@ashwinkumar12345 @aetter Can you please review? This feature will be released as part of odfe 1.13.0. Thanks.

@aetter added the "upcoming release" label (Don't merge until the version or feature is available) on Jan 4, 2021
Contributor

@aetter aetter left a comment

Overall looks really good, but could use some light cleanup.

@@ -246,3 +246,98 @@ All parameters are required.
The standard KNN query and custom scoring option perform differently. Test using a representative set of documents to see if the search results and latencies match your expectations.

Custom scoring works best if the initial filter reduces the number of documents to no more than 20,000. Increasing shard count can improve latencies, but be sure to keep shard size within [the recommended guidelines](../elasticsearch/#primary-and-replica-shards).


## Use Similarity methods in painless scripting
Contributor

Use similarity functions in Painless scripts

From the Painless docs, it looks like they are functions rather than methods. We just want to choose the right term and be consistent.


### Cosine Similarity
This function calculates the measure of cosine similarity between a given query vector and document vectors.
Optionally accepts normQueryVector, to avoid repeated calculation of normalization for query vector for every filtered documents.
Contributor

Optionally accepts normQueryVector to avoid...

Can you be more specific about why you'd add this parameter and maybe some sample values/outcomes?
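
One possible way to illustrate the point, assuming that normQueryVector denotes the Euclidean norm (magnitude) of the query vector, which this PR does not spell out. The sketch below is plain Python that models the arithmetic only; `norm` and `cosine_similarity` are hypothetical helpers, not the plugin's functions.

```python
import math


def norm(vector):
    """Euclidean norm (magnitude) of a vector."""
    return math.sqrt(sum(x * x for x in vector))


def cosine_similarity(query, doc, query_norm=None):
    """Cosine similarity; reuses a precomputed query norm when one is supplied."""
    if query_norm is None:
        query_norm = norm(query)  # otherwise recomputed on every call
    dot = sum(q * d for q, d in zip(query, doc))
    return dot / (query_norm * norm(doc))


query = [3.0, 4.0]
docs = [[3.0, 4.0], [4.0, 3.0], [-3.0, -4.0]]

# The query vector is the same for every document, so its norm is computed once.
query_norm = norm(query)
scores = [cosine_similarity(query, doc, query_norm) for doc in docs]
print(scores)
```

Passing the precomputed norm does not change the scores. It only removes one square-root computation per filtered document.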

}
}
```
The above script adds 1.0 to the cosine similarity to keep score positive.
Contributor

Ditto here. What's the range of values, and why is the 1.0 necessary? An example or two would likely help.
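
As a starting point for such an example: cosine similarity ranges from -1 (opposite direction) to 1 (same direction), so adding 1.0 shifts the score into the range 0 to 2. The Python sketch below models only the arithmetic; the vectors are made up for illustration and `cosine_similarity` is a hypothetical helper, not the plugin's function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query = [1.0, 0.0]
examples = {
    "same direction": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
    "opposite direction": [-1.0, 0.0],
}

for label, doc in examples.items():
    raw = cosine_similarity(query, doc)
    # Adding 1.0 maps the raw range [-1, 1] onto [0, 2].
    print(f"{label}: raw={raw:+.1f}, shifted={1.0 + raw:.1f}")
```

The raw similarities are 1, 0, and -1, and the shifted scores are 2, 1, and 0, so no document receives a negative score.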

Also, when a document vector exactly matches the query vector, the distance is 0, so the script adds 1 to the denominator to avoid a divide-by-zero error.
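
A minimal Python sketch of this inversion, modeling the arithmetic only. The vectors and the helper names `l2_squared` and `score` are hypothetical and chosen for illustration.

```python
def l2_squared(a, b):
    """Squared Euclidean distance between vectors a and b."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def score(query, doc):
    """Invert the distance so that closer documents get higher scores."""
    # The 1 in the denominator keeps an exact match (distance 0) well defined.
    return 1.0 / (1.0 + l2_squared(query, doc))


query = [1.0, 2.0]
docs = {"exact": [1.0, 2.0], "near": [1.0, 3.0], "far": [4.0, 6.0]}

ranked = sorted(docs, key=lambda name: score(query, docs[name]), reverse=True)
print(ranked)
```

The squared distances here are 0, 1, and 25, which give scores of 1, 1/2, and 1/26, so the exact match ranks first and the farthest vector ranks last.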

#### Constraints
1. If a document’s knn vector field has different dimensions from the query, an error(IllegalArgumentException) will be thrown.
Contributor

  1. If a document’s knn_vector field has different dimensions than the query, the function throws an IllegalArgumentException.

  2. If a vector field doesn't have a value, the function throws an IllegalStateException.

    You can avoid this situation by first checking if a document has a value for the field:

    "source": "doc[params.field].size() == 0 ? 0 : 1 / (1 + l2Squared(params.query_value, doc[params.field]))",
    

    Since scores can only be positive, this script ranks documents with vector fields higher than those without.
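
To make the two constraints concrete, here is a Python sketch that mirrors the guard in the suggested script. It models the logic only: `ValueError` stands in for the IllegalArgumentException, and `l2_squared` and `guarded_score` are hypothetical names, not plugin functions.

```python
def l2_squared(a, b):
    """Squared Euclidean distance; rejects vectors of different dimensions."""
    if len(a) != len(b):
        # Stand-in for the IllegalArgumentException described in constraint 1.
        raise ValueError("query and document vectors have different dimensions")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def guarded_score(query, doc_vector):
    """Return 0 for a missing vector, else 1 / (1 + squared L2 distance)."""
    if not doc_vector:  # mirrors doc[params.field].size() == 0
        return 0.0
    return 1.0 / (1.0 + l2_squared(query, doc_vector))


query = [1.0, 2.0]
print(guarded_score(query, []))          # document without a vector value
print(guarded_score(query, [1.0, 2.0]))  # document with a matching vector
```

Because 1 / (1 + distance) is always greater than 0, any document that has a vector outranks a document that does not.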

Contributor Author

Ack

Define specialized api that are allowed to be used in scripting.
Add resolution for constraint to avoid error.
@VijayanB
Contributor Author

@aetter can you take another look at this? Thanks.

Contributor

@aetter aetter left a comment

Still a few minor formatting concerns, but we'll just tweak it post-merge. LGTM.
