Update the signature of vector script functions. #48604

Merged
merged 6 commits into from
Oct 29, 2019

Conversation

jtibshirani
Contributor

@jtibshirani jtibshirani commented Oct 28, 2019

Previously the functions accepted a doc values reference, whereas they now
accept the name of the vector field. Here's an example of how a vector function
was called before and after the change.

Before: cosineSimilarity(params.query_vector, doc['field'])
After:  cosineSimilarity(params.query_vector, 'field')

This seems more intuitive, since we don't allow direct access to vector doc
values and the meaning of doc['field'] is unclear.

The PR makes the following changes (broken into distinct commits):

  • Add new function signatures of the form function(params.query_vector, 'field')
    and deprecate the old ones. Because Painless doesn't allow two methods with
    the same name and number of arguments, the functions accept a generic Object
    and choose the behavior through an instanceof check.
  • Refactor the class bindings so that the document field is passed to the
    constructor instead of the instance method. This allows us to avoid retrieving
    the vector doc values on every function invocation, which gives a tiny speed-up
    in benchmarks.
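
The two changes above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `DocValuesRef`, `VectorStore`, `DotProduct`, and the warning text are hypothetical stand-ins, and vectors are modeled as plain `float[]` arrays rather than encoded doc values. Length validation is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the old doc['field'] argument. */
final class DocValuesRef {
    final String fieldName;

    DocValuesRef(String fieldName) {
        this.fieldName = fieldName;
    }
}

/** Hypothetical store: field name -> one vector per document. */
final class VectorStore {
    private final Map<String, float[][]> columns;
    final List<String> warnings = new ArrayList<>();
    int columnLookups = 0; // counts how often a field was resolved by name
    int currentDoc = 0;    // document currently being scored

    VectorStore(Map<String, float[][]> columns) {
        this.columns = columns;
    }

    float[][] column(String field) {
        columnLookups++;
        float[][] column = columns.get(field);
        if (column == null) {
            throw new IllegalArgumentException("No vector field named [" + field + "]");
        }
        return column;
    }
}

/** One instance per script; the field is bound once in the constructor. */
final class DotProduct {
    private final VectorStore store;
    private final float[] queryVector;
    private final float[][] column;

    DotProduct(VectorStore store, float[] queryVector, Object field) {
        this.store = store;
        this.queryVector = queryVector;
        if (field instanceof String) {
            // New signature: the field name is passed directly.
            this.column = store.column((String) field);
        } else if (field instanceof DocValuesRef) {
            // Old signature: still accepted, but flagged as deprecated (hypothetical message).
            store.warnings.add("Passing doc['field'] is deprecated; pass the field name instead");
            this.column = store.column(((DocValuesRef) field).fieldName);
        } else {
            throw new IllegalArgumentException("Unsupported field argument: " + field);
        }
    }

    /** Called once per document; no lookup by field name happens here. */
    double dotProduct() {
        float[] docVector = column[store.currentDoc];
        double sum = 0;
        for (int i = 0; i < queryVector.length; i++) {
            sum += queryVector[i] * docVector[i];
        }
        return sum;
    }
}
```

In this sketch the field is resolved by name exactly once per script instance, however many documents are scored, which mirrors the motivation given for moving the field argument into the constructor.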

Note that this PR adds new signatures for the sparse vector functions too, even
though sparse vectors are deprecated. Keeping dense and sparse vectors symmetric
seemed simplest to understand, for both us and users.

@jtibshirani jtibshirani added the :Search/Search and >deprecation labels Oct 28, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Search)

@jtibshirani jtibshirani force-pushed the deprecate-vector-functions branch 2 times, most recently from 3510636 to 7a05972 Compare October 29, 2019 00:10
@jtibshirani jtibshirani force-pushed the deprecate-vector-functions branch from 6330119 to c6f3f6d Compare October 29, 2019 00:17
@jtibshirani jtibshirani marked this pull request as ready for review October 29, 2019 00:17
Member

@rjernst rjernst left a comment

LGTM, one small ask in the tests

e = expectThrows(IllegalArgumentException.class, l2norm2::l2norm);
assertThat(e.getMessage(), containsString("query vector has a different number of dimensions [2] than the document vectors [5]"));

assertWarnings(ScoreScriptUtils.DEPRECATION_MESSAGE);
Member

Can we split these tests into separate test functions? Each should assert that a warning is emitted, rather than relying on this one check at the end, which only guarantees that one of them produced a deprecation warning.

Contributor Author

For this PR, I made sure to assertWarnings after every function invocation. I plan to refactor this test in a follow-up PR (as it's a reasonably big change).
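
A rough sketch of why checking after every invocation is stronger than a single check at the end. This does not use the Elasticsearch test framework: `WarningRecorder`, `DeprecatedFunctions`, and the message text are hypothetical, and `l2normMissingWarning` models a function that wrongly forgets to emit the warning.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical warning recorder; not the Elasticsearch test framework. */
final class WarningRecorder {
    private final List<String> warnings = new ArrayList<>();

    void warn(String message) {
        warnings.add(message);
    }

    /** Asserts that the expected warning was recorded since the last check, then resets. */
    void assertAndClear(String expected) {
        if (!warnings.contains(expected)) {
            throw new AssertionError("Expected warning [" + expected + "] but got " + warnings);
        }
        warnings.clear();
    }
}

final class DeprecatedFunctions {
    // Placeholder text; the real message lives in ScoreScriptUtils.DEPRECATION_MESSAGE.
    static final String DEPRECATION_MESSAGE = "hypothetical deprecation message";

    /** Deprecated function that correctly emits the warning. */
    static double l1norm(WarningRecorder recorder, float[] query, float[] doc) {
        recorder.warn(DEPRECATION_MESSAGE);
        double sum = 0;
        for (int i = 0; i < query.length; i++) {
            sum += Math.abs(query[i] - doc[i]);
        }
        return sum;
    }

    /** Deprecated function that (incorrectly) never emits the warning. */
    static double l2normMissingWarning(WarningRecorder recorder, float[] query, float[] doc) {
        double sum = 0;
        for (int i = 0; i < query.length; i++) {
            double diff = query[i] - doc[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

With a single check at the end, the warning from `l1norm` would mask the missing one from `l2normMissingWarning`. Calling `assertAndClear` after each invocation fails on the second function.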

Contributor Author

I opened a follow-up PR here: #48662

@@ -74,20 +105,21 @@ public void validateDocVector(BytesRef vector) {
throw new IllegalArgumentException("The query vector has a different number of dimensions [" +
queryVector.length + "] than the document vectors [" + vectorLength + "].");
}
return vector;
Contributor

Not related to this PR, but I think we can do the vectorLength validation on lines 103-108 just for the first doc, then set a flag validated = true and skip this validation for subsequent docs.

Member

Isn't this something that needs to be checked for each doc? Do we guarantee at indexing time that all vector fields have the same length?

Contributor Author

@rjernst yes, we guarantee at indexing time that all vector fields have the same length. Specifically, there is a mapping parameter dims, and all vector fields must match that number of dimensions.

Earlier I ran benchmarks to measure the cost of length validation, and it was very small (couldn't detect it in the results). So I'm not sure it's worth adding an extra guard + check here for performance. But if the scripting framework had a way to perform a one-time validation, it could make this code a bit cleaner.
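
For reference, the one-time validation suggested in this thread might look roughly like the sketch below. It is an illustration only: `QueryVectorValidator` is a hypothetical class, vectors are modeled as `float[]` rather than encoded `BytesRef` values, and it relies on the stated guarantee that every document vector in a field has the mapped number of dimensions. Given that the measured cost of per-document validation was negligible, the extra flag may not be worth adding.

```java
/** Hypothetical validator that checks the vector length only for the first document. */
final class QueryVectorValidator {
    private final int queryDims;
    private boolean validated = false;
    int checksPerformed = 0; // exposed only to make the behavior observable

    QueryVectorValidator(int queryDims) {
        this.queryDims = queryDims;
    }

    void validate(float[] docVector) {
        if (validated) {
            return; // all vectors in the field share the same dims, so one check suffices
        }
        checksPerformed++;
        if (docVector.length != queryDims) {
            throw new IllegalArgumentException("The query vector has a different number of dimensions ["
                + queryDims + "] than the document vectors [" + docVector.length + "].");
        }
        validated = true;
    }
}
```

The flag is set only after a successful check, so a mismatched query keeps failing on every document instead of being silently accepted after the first error.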

Contributor

@mayya-sharipova mayya-sharipova left a comment

@jtibshirani Thanks Julie. I assume this PR targets 7.6. For 8.0 we want to remove the outdated versions of vector script functions and keep only functions with signatures function(org.elasticsearch.script.ScoreScript, List, String)?

@jtibshirani
Contributor Author

> I assume this PR targets 7.6. For 8.0 we want to remove the outdated versions of vector script functions and keep only functions with signatures function(org.elasticsearch.script.ScoreScript, List, String)?

That's right. The way I typically do deprecations is to merge the deprecation into master, backport to 7.x, then follow up on master with the removal.

@jtibshirani jtibshirani merged commit f863dd1 into elastic:master Oct 29, 2019
@jtibshirani jtibshirani deleted the deprecate-vector-functions branch October 29, 2019 20:26
@jtibshirani
Contributor Author

Thanks @rjernst and @mayya-sharipova for the reviews.

jtibshirani added a commit that referenced this pull request Oct 31, 2019
Follow up to #48604. This PR removes the deprecated vector function signatures
of the form `cosineSimilarity(query, doc['field'])`.