Update the signature of vector script functions. (#48653)
Previously the functions accepted a doc values reference, whereas they now
accept the name of the vector field. Here's an example of how a vector function
was called before and after the change.

```
Before: cosineSimilarity(params.query_vector, doc['field'])
After:  cosineSimilarity(params.query_vector, 'field')
```

This seems more intuitive, since we don't allow direct access to vector doc
values and the meaning of `doc['field']` is unclear.

The PR makes the following changes (broken into distinct commits):
* Add new function signatures of the form `function(params.query_vector,
'field')` and deprecate the old ones. Because Painless doesn't allow two
methods with the same name and number of arguments, we allow a generic `Object`
to be passed in to the function and decide on the behavior through an
`instanceof` check.
* Refactor the class bindings so that the document field is passed to the
constructor instead of the instance method. This allows us to avoid retrieving
the vector doc values on every function invocation, which gives a tiny speed-up
in benchmarks.
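
The two changes above can be illustrated with a minimal, self-contained sketch.
It is not the code from this PR: `VectorFunctionSketch`, `DenseVectorValues`,
the plain `Map` used as the doc lookup, and the `deprecatedForm` flag are all
hypothetical stand-ins chosen for illustration. The sketch only shows the shape
of the idea: the constructor takes a generic `Object`, picks the behavior with
an `instanceof` check, and resolves the vector values once so the instance
method does not look them up again.

```java
import java.util.List;
import java.util.Map;

class VectorFunctionSketch {

    /** Hypothetical stand-in for a document's dense vector doc values. */
    static final class DenseVectorValues {
        final double[] vector;

        DenseVectorValues(double[] vector) {
            this.vector = vector;
        }
    }

    /** Sketch of a class binding: the field argument is handled once, in the constructor. */
    static final class DotProduct {
        private final DenseVectorValues docValues;
        final boolean deprecatedForm; // hypothetical flag, only here to make the branch visible

        DotProduct(Map<String, DenseVectorValues> doc, Object field) {
            if (field instanceof String) {
                // New form: function(query, 'field'), resolved by field name.
                DenseVectorValues values = doc.get((String) field);
                if (values == null) {
                    throw new IllegalArgumentException("No vector field named [" + field + "]");
                }
                this.docValues = values;
                this.deprecatedForm = false;
            } else if (field instanceof DenseVectorValues) {
                // Old form: function(query, doc['field']). A real implementation
                // would presumably emit a deprecation warning at this point.
                this.docValues = (DenseVectorValues) field;
                this.deprecatedForm = true;
            } else {
                throw new IllegalArgumentException("Unsupported field argument: " + field);
            }
        }

        /** Uses the values cached by the constructor; no lookup happens per call. */
        double dotProduct(List<? extends Number> queryVector) {
            double[] docVector = docValues.vector;
            if (queryVector.size() != docVector.length) {
                throw new IllegalArgumentException("Query and document vectors differ in length");
            }
            double sum = 0.0;
            for (int i = 0; i < docVector.length; i++) {
                sum += queryVector.get(i).doubleValue() * docVector[i];
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        Map<String, DenseVectorValues> doc =
            Map.of("field", new DenseVectorValues(new double[] {1.0, 2.0, 3.0}));
        List<Integer> query = List.of(4, 5, 6);

        DotProduct byName = new DotProduct(doc, "field");             // new form
        DotProduct byValues = new DotProduct(doc, doc.get("field"));  // deprecated form

        System.out.println(byName.dotProduct(query) + " deprecated=" + byName.deprecatedForm);
        System.out.println(byValues.dotProduct(query) + " deprecated=" + byValues.deprecatedForm);
    }
}
```

Both call forms reach the same computation; only the constructor branch differs,
which is where a deprecation warning for the old form would naturally live.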

Note that this PR adds new signatures for the sparse vector functions too, even
though sparse vectors are deprecated. Keeping everything symmetric between dense
and sparse vectors seemed simplest for both us and users to understand.
jtibshirani authored Oct 29, 2019
1 parent 25724c5 commit 89c6575
Showing 13 changed files with 376 additions and 143 deletions.
@@ -10,8 +10,8 @@ The following specialized API is available in the Score context.
 ==== Static Methods
 The following methods are directly callable without a class/instance qualifier. Note parameters denoted by a (*) are treated as read-only values.

-* double cosineSimilarity(List *, VectorScriptDocValues.DenseVectorScriptDocValues)
-* double cosineSimilaritySparse(Map *, VectorScriptDocValues.SparseVectorScriptDocValues)
+* double cosineSimilarity(List *, String)
+* double cosineSimilaritySparse(Map *, String)
 * double decayDateExp(String *, String *, String *, double *, JodaCompatibleZonedDateTime)
 * double decayDateGauss(String *, String *, String *, double *, JodaCompatibleZonedDateTime)
 * double decayDateLinear(String *, String *, String *, double *, JodaCompatibleZonedDateTime)
@@ -21,8 +21,8 @@ The following methods are directly callable without a class/instance qualifier.
 * double decayNumericExp(double *, double *, double *, double *, double)
 * double decayNumericGauss(double *, double *, double *, double *, double)
 * double decayNumericLinear(double *, double *, double *, double *, double)
-* double dotProduct(List, VectorScriptDocValues.DenseVectorScriptDocValues)
-* double dotProductSparse(Map *, VectorScriptDocValues.SparseVectorScriptDocValues)
+* double dotProduct(List, String)
+* double dotProductSparse(Map *, String)
 * double randomScore(int *)
 * double randomScore(int *, String *)
 * double saturation(double, double)
7 changes: 7 additions & 0 deletions docs/reference/migration/migrate_7_6.asciidoc
@@ -29,3 +29,10 @@ We have not seen much interest in this experimental field type, and don't see
 a clear use case as it's currently designed. If you have feedback or
 suggestions around sparse vector functionality, please let us know through
 GitHub or the 'discuss' forums.
+
+[discrete]
+==== Update to vector function signatures
+The vector functions of the form `function(query, doc['field'])` are
+deprecated, and the form `function(query, 'field')` should be used instead.
+For example, `cosineSimilarity(query, doc['field'])` is replaced by
+`cosineSimilarity(query, 'field')`.
18 changes: 9 additions & 9 deletions docs/reference/vectors/vector-functions.asciidoc
@@ -68,7 +68,7 @@ GET my_index/_search
 }
 },
 "script": {
-"source": "cosineSimilarity(params.query_vector, doc['my_dense_vector']) + 1.0", <2>
+"source": "cosineSimilarity(params.query_vector, 'my_dense_vector') + 1.0", <2>
 "params": {
 "query_vector": [4, 3.4, -0.2] <3>
 }
@@ -105,7 +105,7 @@ GET my_index/_search
 },
 "script": {
 "source": """
-double value = dotProduct(params.query_vector, doc['my_dense_vector']);
+double value = dotProduct(params.query_vector, 'my_dense_vector');
 return sigmoid(1, Math.E, -value); <1>
 """,
 "params": {
@@ -139,7 +139,7 @@ GET my_index/_search
 }
 },
 "script": {
-"source": "1 / (1 + l1norm(params.queryVector, doc['my_dense_vector']))", <1>
+"source": "1 / (1 + l1norm(params.queryVector, 'my_dense_vector'))", <1>
 "params": {
 "queryVector": [4, 3.4, -0.2]
 }
@@ -178,7 +178,7 @@ GET my_index/_search
 }
 },
 "script": {
-"source": "1 / (1 + l2norm(params.queryVector, doc['my_dense_vector']))",
+"source": "1 / (1 + l2norm(params.queryVector, 'my_dense_vector'))",
 "params": {
 "queryVector": [4, 3.4, -0.2]
 }
@@ -196,7 +196,7 @@ You can check if a document has a value for the field `my_vector` by

 [source,js]
 --------------------------------------------------
-"source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.queryVector, doc['my_vector'])"
+"source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.queryVector, 'my_vector')"
 --------------------------------------------------
 // NOTCONSOLE

@@ -262,7 +262,7 @@ GET my_sparse_index/_search
 }
 },
 "script": {
-"source": "cosineSimilaritySparse(params.query_vector, doc['my_sparse_vector']) + 1.0",
+"source": "cosineSimilaritySparse(params.query_vector, 'my_sparse_vector') + 1.0",
 "params": {
 "query_vector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
 }
@@ -294,7 +294,7 @@ GET my_sparse_index/_search
 },
 "script": {
 "source": """
-double value = dotProductSparse(params.query_vector, doc['my_sparse_vector']);
+double value = dotProductSparse(params.query_vector, 'my_sparse_vector');
 return sigmoid(1, Math.E, -value);
 """,
 "params": {
@@ -327,7 +327,7 @@ GET my_sparse_index/_search
 }
 },
 "script": {
-"source": "1 / (1 + l1normSparse(params.queryVector, doc['my_sparse_vector']))",
+"source": "1 / (1 + l1normSparse(params.queryVector, 'my_sparse_vector'))",
 "params": {
 "queryVector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
 }
@@ -358,7 +358,7 @@ GET my_sparse_index/_search
 }
 },
 "script": {
-"source": "1 / (1 + l2normSparse(params.queryVector, doc['my_sparse_vector']))",
+"source": "1 / (1 + l2normSparse(params.queryVector, 'my_sparse_vector'))",
 "params": {
 "queryVector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
 }
@@ -119,7 +119,7 @@ public Map<String, Object> getParams() {
 }

 /** The doc lookup for the Lucene segment this script was created for. */
-public final Map<String, ScriptDocValues<?>> getDoc() {
+public Map<String, ScriptDocValues<?>> getDoc() {
 return leafLookup.doc();
 }

@@ -1,6 +1,6 @@
 setup:
 - skip:
-features: headers
+features: [headers, warnings]
 version: " - 7.2.99"
 reason: "dense_vector dims parameter was added from 7.3"

@@ -52,7 +52,7 @@ setup:
 script_score:
 query: {match_all: {} }
 script:
-source: "dotProduct(params.query_vector, doc['my_dense_vector'])"
+source: "dotProduct(params.query_vector, 'my_dense_vector')"
 params:
 query_vector: [0.5, 111.3, -13.0, 14.8, -156.0]

@@ -82,7 +82,7 @@ setup:
 script_score:
 query: {match_all: {} }
 script:
-source: "cosineSimilarity(params.query_vector, doc['my_dense_vector'])"
+source: "cosineSimilarity(params.query_vector, 'my_dense_vector')"
 params:
 query_vector: [0.5, 111.3, -13.0, 14.8, -156.0]

@@ -99,3 +99,26 @@ setup:
 - match: {hits.hits.2._id: "1"}
 - gte: {hits.hits.2._score: 0.78}
 - lte: {hits.hits.2._score: 0.791}
+
+---
+"Deprecated function signature":
+- do:
+headers:
+Content-Type: application/json
+warnings:
+- The vector functions of the form function(query, doc['field']) are deprecated, and the form function(query, 'field') should be used instead. For example, cosineSimilarity(query, doc['field']) is replaced by cosineSimilarity(query, 'field').
+search:
+rest_total_hits_as_int: true
+body:
+query:
+script_score:
+query: {match_all: {} }
+script:
+source: "cosineSimilarity(params.query_vector, doc['my_dense_vector'])"
+params:
+query_vector: [0.5, 111.3, -13.0, 14.8, -156.0]
+
+- match: {hits.total: 3}
+- match: {hits.hits.0._id: "3"}
+- match: {hits.hits.1._id: "2"}
+- match: {hits.hits.2._id: "1"}
@@ -53,7 +53,7 @@ setup:
 script_score:
 query: {match_all: {} }
 script:
-source: "l1norm(params.query_vector, doc['my_dense_vector'])"
+source: "l1norm(params.query_vector, 'my_dense_vector')"
 params:
 query_vector: [0.5, 111.3, -13.0, 14.8, -156.0]

@@ -83,7 +83,7 @@ setup:
 script_score:
 query: {match_all: {} }
 script:
-source: "l2norm(params.query_vector, doc['my_dense_vector'])"
+source: "l2norm(params.query_vector, 'my_dense_vector')"
 params:
 query_vector: [0.5, 111.3, -13.0, 14.8, -156.0]

@@ -62,7 +62,7 @@ setup:
 script_score:
 query: {match_all: {} }
 script:
-source: "cosineSimilarity(params.query_vector, doc['my_dense_vector'])"
+source: "cosineSimilarity(params.query_vector, 'my_dense_vector')"
 params:
 query_vector: [10, 10, 10]

@@ -81,7 +81,7 @@ setup:
 script_score:
 query: {match_all: {} }
 script:
-source: "cosineSimilarity(params.query_vector, doc['my_dense_vector'])"
+source: "cosineSimilarity(params.query_vector, 'my_dense_vector')"
 params:
 query_vector: [10.0, 10.0, 10.0]

@@ -111,7 +111,7 @@ setup:
 script_score:
 query: {match_all: {} }
 script:
-source: "cosineSimilarity(params.query_vector, doc['my_dense_vector'])"
+source: "cosineSimilarity(params.query_vector, 'my_dense_vector')"
 params:
 query_vector: [1, 2, 3, 4]
 - match: { error.root_cause.0.type: "script_exception" }
@@ -125,7 +125,7 @@ setup:
 script_score:
 query: {match_all: {} }
 script:
-source: "dotProduct(params.query_vector, doc['my_dense_vector'])"
+source: "dotProduct(params.query_vector, 'my_dense_vector')"
 params:
 query_vector: [1, 2, 3, 4]
 - match: { error.root_cause.0.type: "script_exception" }
@@ -161,7 +161,7 @@ setup:
 script_score:
 query: {match_all: {} }
 script:
-source: "cosineSimilarity(params.query_vector, doc['my_dense_vector'])"
+source: "cosineSimilarity(params.query_vector, 'my_dense_vector')"
 params:
 query_vector: [10.0, 10.0, 10.0]
 - match: { error.root_cause.0.type: "script_exception" }
@@ -177,7 +177,7 @@ setup:
 script_score:
 query: {match_all: {} }
 script:
-source: "doc['my_dense_vector'].size() == 0 ? 0 : cosineSimilarity(params.query_vector, doc['my_dense_vector'])"
+source: "doc['my_dense_vector'].size() == 0 ? 0 : cosineSimilarity(params.query_vector, 'my_dense_vector')"
 params:
 query_vector: [10.0, 10.0, 10.0]

@@ -208,7 +208,7 @@ setup:
 script_score:
 query: {match_all: {} }
 script:
-source: "dotProductSparse(params.query_vector, doc['my_dense_vector'])"
+source: "dotProductSparse(params.query_vector, 'my_dense_vector')"
 params:
 query_vector: {"2": 0.5, "10" : 111.3, "3": 44}
 - match: { error.root_cause.0.type: "script_exception" }
@@ -55,7 +55,7 @@ setup:
 script_score:
 query: {match_all: {} }
 script:
-source: "dotProductSparse(params.query_vector, doc['my_sparse_vector'])"
+source: "dotProductSparse(params.query_vector, 'my_sparse_vector')"
 params:
 query_vector: {"2": 0.5, "10" : 111.3, "50": -13.0, "113": 14.8, "4545": -156.0}

@@ -87,7 +87,7 @@ setup:
 script_score:
 query: {match_all: {} }
 script:
-source: "cosineSimilaritySparse(params.query_vector, doc['my_sparse_vector'])"
+source: "cosineSimilaritySparse(params.query_vector, 'my_sparse_vector')"
 params:
 query_vector: {"2": -0.5, "10" : 111.3, "50": -13.0, "113": 14.8, "4545": -156.0}

@@ -104,3 +104,27 @@ setup:
 - match: {hits.hits.2._id: "1"}
 - gte: {hits.hits.2._score: 0.78}
 - lte: {hits.hits.2._score: 0.791}
+
+---
+"Deprecated function signature":
+- do:
+headers:
+Content-Type: application/json
+warnings:
+- The [sparse_vector] field type is deprecated and will be removed in 8.0.
+- The vector functions of the form function(query, doc['field']) are deprecated, and the form function(query, 'field') should be used instead. For example, cosineSimilarity(query, doc['field']) is replaced by cosineSimilarity(query, 'field').
+search:
+rest_total_hits_as_int: true
+body:
+query:
+script_score:
+query: {match_all: {} }
+script:
+source: "cosineSimilaritySparse(params.query_vector, doc['my_sparse_vector'])"
+params:
+query_vector: {"2": -0.5, "10" : 111.3, "50": -13.0, "113": 14.8, "4545": -156.0}
+
+- match: {hits.total: 3}
+- match: {hits.hits.0._id: "3"}
+- match: {hits.hits.1._id: "2"}
+- match: {hits.hits.2._id: "1"}
@@ -55,7 +55,7 @@ setup:
 script_score:
 query: {match_all: {} }
 script:
-source: "l1normSparse(params.query_vector, doc['my_sparse_vector'])"
+source: "l1normSparse(params.query_vector, 'my_sparse_vector')"
 params:
 query_vector: {"2": 0.5, "10" : 111.3, "50": -13.0, "113": 14.8, "4545": -156.0}

@@ -88,7 +88,7 @@ setup:
 script_score:
 query: {match_all: {} }
 script:
-source: "l2normSparse(params.query_vector, doc['my_sparse_vector'])"
+source: "l2normSparse(params.query_vector, 'my_sparse_vector')"
 params:
 query_vector: {"2": 0.5, "10" : 111.3, "50": -13.0, "113": 14.8, "4545": -156.0}
