-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add access to dense_vector values #71313
Add access to dense_vector values #71313
Conversation
Allow direct access to a dense_vector' values in script through the following functions: - getVectorValue – returns a vector's value as an array of floats - getVectorMagnitude – returns a vector's magnitude Closes elastic#51964
Pinging @elastic/es-search (Team:Search) |
...lugin/vectors/src/main/java/org/elasticsearch/xpack/vectors/mapper/VectorEncoderDecoder.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a helpful addition! I left a few high-level comments to start.
One overall comment: the original issue mentions exposing an iterator, whereas we return a decoded array here. I don't have major concerns with this, but wanted to highlight the difference.
...ugin/src/yamlRestTest/resources/rest-api-spec/test/vectors/20_dense_vector_special_cases.yml
Outdated
Show resolved
Hide resolved
...lugin/vectors/src/main/java/org/elasticsearch/xpack/vectors/mapper/VectorEncoderDecoder.java
Show resolved
Hide resolved
|
||
- `float[] getVectorValue()` – returns a vector's value as an array of floats | ||
|
||
- `float getVectorMagnitude()` – returns a vector's magnitude (available for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we implement a slow version of getVectorMagnitude
for vectors before 7.5? This seems easy and would make the API simpler. (Then we also might be able to use DenseVectorScriptDocValues#getVectorMagnitude
to remove some logic inside cosineSimilarity
that requires direct access to the index version!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jtibshirani Thanks for the feedback. I was also thinking the same – to implement a slower version of getVectorMagnitude
, but this would require decoding the whole vector. If a user is already using vectorValue
in their script and decoding a vector, using magnitude
would mean decoding this vector the second time. So it would be faster for a user to implement magnitude
function themselves since they would already have the decoded vector available. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me it's the right trade-off for a simple API. First, it will only be slow for vectors indexed before 7.5, which was before they were even GA. It also seems okay that it's slow, users can easily work around it. Maybe we could just write a short note in the docs about the pre-7.5 behavior so that users are aware.
-decodeVectorMagnitude go back to Buffer -correct documentation to use doc access methods for vector values and magnitude instead of get functions - rename getVectorMagnitude to getMagnitude - remove unnecessary yml tests
@jtibshirani Thanks for your feedback. I've tried to address your comments in 388b1f8.
Indeed, we don't return an iterator here. Thanks for highlighting this difference. But considering that the primary use-case for vectors is to process its all elements, and I've never encountered the need for iterators, I guess this should be fine to return all vector elements at once. |
- '{}' | ||
|
||
# check getVectorValue() API | ||
- do: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test coverage looks good! Since unit tests are generally easier to work with than REST tests, I wondered if there was a way perform some of the same checks as unit tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jtibshirani Thanks for the feedback. I could not find any good examples of doing unit tests with script doc values. Unit tests that we have they mock scripts contexts and mock what script returns, which kind of defeats the purpose of testing what getVectorValue()
and getMagnitude()
returns.
I am happy to redesign tests as unit tests, if you know any examples I can follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like there are some simple cases where we just want to check getVectorValue
and getMagnitude
return the right value or error appropriately. These could be covered in a test like DenseVectorScriptDocValuesTests
. A similar test would be ScriptDocValuesGeoPointsTests
.
|
||
- `float[] getVectorValue()` – returns a vector's value as an array of floats | ||
|
||
- `float getVectorMagnitude()` – returns a vector's magnitude (available for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me it's the right trade-off for a simple API. First, it will only be slow for vectors indexed before 7.5, which was before they were even GA. It also seems okay that it's slow, users can easily work around it. Maybe we could just write a short note in the docs about the pre-7.5 behavior so that users are aware.
...ugin/src/yamlRestTest/resources/rest-api-spec/test/vectors/30_dense_vector_script_access.yml
Outdated
Show resolved
Hide resolved
@jtibshirani Thanks for another round of review. I've tried to address them in 51f509b |
34d961b
to
51f509b
Compare
x-pack/plugin/vectors/src/main/java/org/elasticsearch/xpack/vectors/query/ScoreScriptUtils.java
Show resolved
Hide resolved
- '{}' | ||
|
||
# check getVectorValue() API | ||
- do: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like there are some simple cases where we just want to check getVectorValue
and getMagnitude
return the right value or error appropriately. These could be covered in a test like DenseVectorScriptDocValuesTests
. A similar test would be ScriptDocValuesGeoPointsTests
.
@jtibshirani Thanks for another round of review. I've tried to address your feedback in a66f755 About I still kept |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
@elasticmachine test this please |
@elasticmachine test this please |
@elasticmachine run elasticsearch-ci/bwc |
@elasticmachine run elasticsearch-ci/2 |
1 similar comment
@elasticmachine run elasticsearch-ci/2 |
@elasticmachine update branch |
Allow direct access to a dense_vector' values in script through the following functions: - getVectorValue – returns a vector's value as an array of floats - getMagnitude – returns a vector's magnitude Closes elastic#51964 Backport for elastic#71313
Allow direct access to a dense_vector' values in script
through the following functions:
Closes #51964