Add access to dense_vector values #71313
Could we implement a slow version of `getVectorMagnitude` for vectors before 7.5? This seems easy and would make the API simpler. (Then we also might be able to use `DenseVectorScriptDocValues#getVectorMagnitude` to remove some logic inside `cosineSimilarity` that requires direct access to the index version!)
@jtibshirani Thanks for the feedback. I was thinking the same thing, to implement a slower version of `getVectorMagnitude`, but this would require decoding the whole vector. If a user is already using `vectorValue` in their script and decoding the vector, using `magnitude` would mean decoding this vector a second time. So it would be faster for the user to implement a `magnitude` function themselves, since they would already have the decoded vector available. WDYT?
To me it's the right trade-off for a simple API. First, it will only be slow for vectors indexed before 7.5, which was before they were even GA. It also seems okay that it's slow, since users can easily work around it. Maybe we could just write a short note in the docs about the pre-7.5 behavior so that users are aware.
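To make the trade-off concrete, here is a minimal Java sketch of a magnitude accessor with a slow fallback. It is not the Elasticsearch implementation: the class name `MagnitudeSketch`, the byte layout (big-endian floats with an optional trailing magnitude float), and the `hasStoredMagnitude` flag that stands in for the index version check are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;

class MagnitudeSketch {

    // Hypothetical layout for this sketch: one big-endian float per dimension,
    // optionally followed by one extra float holding the precomputed magnitude.
    static byte[] encode(float[] vector, boolean storeMagnitude) {
        int size = vector.length * Float.BYTES + (storeMagnitude ? Float.BYTES : 0);
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (float x : vector) {
            buf.putFloat(x);
        }
        if (storeMagnitude) {
            buf.putFloat(computeMagnitude(vector));
        }
        return buf.array();
    }

    // Decodes every dimension; this is the expensive step the thread refers to.
    static float[] decodeVector(byte[] encoded, int dims) {
        ByteBuffer buf = ByteBuffer.wrap(encoded);
        float[] vector = new float[dims];
        for (int i = 0; i < dims; i++) {
            vector[i] = buf.getFloat();
        }
        return vector;
    }

    static float computeMagnitude(float[] vector) {
        double sumOfSquares = 0.0;
        for (float x : vector) {
            sumOfSquares += (double) x * x;
        }
        return (float) Math.sqrt(sumOfSquares);
    }

    // Fast path: read the stored trailing float.
    // Slow path (assumed to model older vectors): decode the whole vector and compute.
    static float getMagnitude(byte[] encoded, int dims, boolean hasStoredMagnitude) {
        if (hasStoredMagnitude) {
            return ByteBuffer.wrap(encoded).getFloat(dims * Float.BYTES);
        }
        return computeMagnitude(decodeVector(encoded, dims));
    }

    public static void main(String[] args) {
        float[] v = {3f, 4f};
        System.out.println(getMagnitude(encode(v, true), v.length, true));
        System.out.println(getMagnitude(encode(v, false), v.length, false));
    }
}
```

With this shape the caller sees a single accessor either way, and only the fallback pays for a full decode. A script that has already decoded the vector can avoid the second decode by computing the magnitude from its own array.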
This test coverage looks good! Since unit tests are generally easier to work with than REST tests, I wondered if there was a way to perform some of the same checks as unit tests?
@jtibshirani Thanks for the feedback. I could not find any good examples of unit tests for script doc values. The unit tests we have mock the script contexts and mock what the script returns, which kind of defeats the purpose of testing what `getVectorValue()` and `getMagnitude()` return. I am happy to redesign the tests as unit tests if you know of any examples I can follow.
It seems like there are some simple cases where we just want to check that `getVectorValue` and `getMagnitude` return the right value or error appropriately. These could be covered in a test like `DenseVectorScriptDocValuesTests`. A similar test would be `ScriptDocValuesGeoPointsTests`.
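A test of that kind could look roughly like the following self-contained sketch. Everything here is hypothetical: `VectorDocValuesSketch` is a stand-in for the real doc values class, the error message is invented, and the plain `check` helper replaces the project's test framework, which an actual `DenseVectorScriptDocValuesTests` would use instead.

```java
import java.util.Arrays;

// Hypothetical stand-in for the doc values class under test.
class VectorDocValuesSketch {
    private final float[] vector; // null models a document with no value for the field

    VectorDocValuesSketch(float[] vector) {
        this.vector = vector;
    }

    float[] getVectorValue() {
        if (vector == null) {
            throw new IllegalArgumentException("document has no value for the vector field");
        }
        return vector.clone();
    }

    float getMagnitude() {
        double sumOfSquares = 0.0;
        for (float x : getVectorValue()) {
            sumOfSquares += (double) x * x;
        }
        return (float) Math.sqrt(sumOfSquares);
    }
}

class VectorDocValuesSketchTests {
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    // The accessors return the indexed values without any script mocking.
    static void testReturnsValues() {
        VectorDocValuesSketch docValues = new VectorDocValuesSketch(new float[] {3f, 4f});
        check(Arrays.equals(docValues.getVectorValue(), new float[] {3f, 4f}), "vector value");
        check(Math.abs(docValues.getMagnitude() - 5f) < 1e-6f, "magnitude");
    }

    // A missing value surfaces as an error from both accessors.
    static void testMissingValueErrors() {
        VectorDocValuesSketch empty = new VectorDocValuesSketch(null);
        int errors = 0;
        try {
            empty.getVectorValue();
        } catch (IllegalArgumentException e) {
            errors++;
        }
        try {
            empty.getMagnitude();
        } catch (IllegalArgumentException e) {
            errors++;
        }
        check(errors == 2, "both accessors should throw on a missing value");
    }

    public static void main(String[] args) {
        testReturnsValues();
        testMissingValueErrors();
        System.out.println("all checks passed");
    }
}
```

The point of this shape is that the doc values object is constructed directly and queried, so the assertions exercise the accessors themselves instead of a mocked script result.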