Add access to dense_vector values #71313

Merged
mayya-sharipova merged 13 commits into elastic:master from dense-vector-iterator
Apr 19, 2021

mayya-sharipova
Contributor

Allow direct access to a dense_vector's values in scripts
through the following functions:

  • getVectorValue – returns a vector's value as an array of floats
  • getVectorMagnitude – returns a vector's magnitude

Closes #51964
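
As a rough illustration of what the two accessors return, the sketch below decodes a vector from bytes and computes its magnitude. The class name, the method names, and the encoding (consecutive 4-byte big-endian floats) are assumptions made for this example; this is not the Elasticsearch implementation.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: names and byte layout are illustrative assumptions.
final class DenseVectorSketch {

    // Encode floats as consecutive 4-byte values (ByteBuffer is big-endian by default).
    static byte[] encode(float[] values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Float.BYTES);
        for (float v : values) {
            buffer.putFloat(v);
        }
        return buffer.array();
    }

    // Decode `dims` floats from the start of the encoded bytes.
    static float[] decodeVectorValue(byte[] encoded, int dims) {
        ByteBuffer buffer = ByteBuffer.wrap(encoded);
        float[] values = new float[dims];
        for (int i = 0; i < dims; i++) {
            values[i] = buffer.getFloat();
        }
        return values;
    }

    // Euclidean (L2) magnitude, accumulated in double to limit rounding error.
    static float magnitude(float[] values) {
        double sumOfSquares = 0.0;
        for (float v : values) {
            sumOfSquares += (double) v * v;
        }
        return (float) Math.sqrt(sumOfSquares);
    }
}
```

For example, encoding `{3f, 4f}`, decoding it with `dims = 2`, and passing the result to `magnitude` gives back the same two values and a magnitude of 5.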

@mayya-sharipova mayya-sharipova added >enhancement :Search Relevance/Ranking Scoring, rescoring, rank evaluation. labels Apr 5, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Apr 5, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

Contributor

@jtibshirani jtibshirani left a comment

This seems like a helpful addition! I left a few high-level comments to start.

One overall comment: the original issue mentions exposing an iterator, whereas we return a decoded array here. I don't have major concerns with this, but wanted to highlight the difference.


- `float[] getVectorValue()` – returns a vector's value as an array of floats

- `float getVectorMagnitude()` – returns a vector's magnitude (available for
Contributor

Could we implement a slow version of getVectorMagnitude for vectors before 7.5? This seems easy and would make the API simpler. (Then we also might be able to use DenseVectorScriptDocValues#getVectorMagnitude to remove some logic inside cosineSimilarity that requires direct access to the index version!)

Contributor Author

@mayya-sharipova mayya-sharipova Apr 8, 2021

@jtibshirani Thanks for the feedback. I was thinking the same: implement a slower version of getVectorMagnitude. But this would require decoding the whole vector. If a user is already using vectorValue in their script and decoding the vector, calling magnitude would mean decoding the vector a second time. So it would be faster for users to implement the magnitude function themselves, since they would already have the decoded vector available. WDYT?

Contributor

To me it's the right trade-off for a simple API. First, it will only be slow for vectors indexed before 7.5, which was before they were even GA. It also seems okay that it's slow, since users can easily work around it. Maybe we could just write a short note in the docs about the pre-7.5 behavior so that users are aware.
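
The trade-off discussed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: it assumes newer vectors carry a precomputed magnitude as a trailing 4-byte float while older ones do not, and the class name and the `hasStoredMagnitude` flag are hypothetical stand-ins for the index-version check.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a fast path plus a slow fallback for the magnitude.
final class MagnitudeFallbackSketch {

    static float getMagnitude(byte[] encoded, boolean hasStoredMagnitude) {
        ByteBuffer buffer = ByteBuffer.wrap(encoded);
        if (hasStoredMagnitude) {
            // Fast path (assumed layout): the last 4 bytes hold the magnitude.
            return buffer.getFloat(encoded.length - Float.BYTES);
        }
        // Slow path: decode every element and compute the L2 norm.
        double sumOfSquares = 0.0;
        while (buffer.remaining() >= Float.BYTES) {
            float v = buffer.getFloat();
            sumOfSquares += (double) v * v;
        }
        return (float) Math.sqrt(sumOfSquares);
    }
}
```

Callers see a single method either way; only vectors without a stored magnitude pay the cost of a full decode.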

docs/reference/vectors/vector-functions.asciidoc (outdated, resolved)
- decodeVectorMagnitude: go back to Buffer
- correct documentation to use doc access methods for vector values and magnitude instead of get functions
- rename getVectorMagnitude to getMagnitude
- remove unnecessary yml tests
@mayya-sharipova
Contributor Author

mayya-sharipova commented Apr 8, 2021

@jtibshirani Thanks for your feedback. I've tried to address your comments in 388b1f8.

> One overall comment: the original issue mentions exposing an iterator, whereas we return a decoded array here. I don't have major concerns with this, but wanted to highlight the difference.

Indeed, we don't return an iterator here. Thanks for highlighting this difference. But considering that the primary use case for vectors is to process all of their elements, and that I've never encountered a need for iterators, I think it should be fine to return all vector elements at once.

- '{}'

# check getVectorValue() API
- do:
Contributor

This test coverage looks good! Since unit tests are generally easier to work with than REST tests, I wondered if there was a way to perform some of the same checks as unit tests?

Contributor Author

@jtibshirani Thanks for the feedback. I could not find any good examples of unit tests with script doc values. The unit tests that we have mock script contexts and mock what the script returns, which kind of defeats the purpose of testing what getVectorValue() and getMagnitude() return.

I am happy to redesign tests as unit tests, if you know any examples I can follow.

Contributor

It seems like there are some simple cases where we just want to check getVectorValue and getMagnitude return the right value or error appropriately. These could be covered in a test like DenseVectorScriptDocValuesTests. A similar test would be ScriptDocValuesGeoPointsTests.
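
A minimal sketch of the kind of checks described above, written against a hypothetical stand-in class rather than the real DenseVectorScriptDocValues (its constructor and test fixtures are not shown in this conversation). The class name, the exception type, and the error message are assumptions.

```java
import java.util.Arrays;

// Hypothetical stand-in for script doc values over a single document's vector.
final class VectorDocValuesSketch {
    private final float[] vector; // null models a document with no vector value

    VectorDocValuesSketch(float[] vector) {
        this.vector = vector;
    }

    float[] getVectorValue() {
        if (vector == null) {
            throw new IllegalArgumentException("document has no value for the vector field");
        }
        return vector.clone();
    }

    float getMagnitude() {
        double sumOfSquares = 0.0;
        for (float v : getVectorValue()) {
            sumOfSquares += (double) v * v;
        }
        return (float) Math.sqrt(sumOfSquares);
    }

    // Plain-Java checks: right values when present, an error when missing.
    static void runChecks() {
        VectorDocValuesSketch present = new VectorDocValuesSketch(new float[] {3f, 4f});
        if (!Arrays.equals(present.getVectorValue(), new float[] {3f, 4f})) {
            throw new AssertionError("unexpected vector value");
        }
        if (Math.abs(present.getMagnitude() - 5f) > 1e-6f) {
            throw new AssertionError("unexpected magnitude");
        }
        boolean threw = false;
        try {
            new VectorDocValuesSketch(null).getVectorValue();
        } catch (IllegalArgumentException e) {
            threw = true;
        }
        if (!threw) {
            throw new AssertionError("expected an error for a missing value");
        }
    }
}
```

Because nothing is mocked, the checks exercise the accessors themselves, which is the concern raised earlier in this thread.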



@mayya-sharipova
Contributor Author

mayya-sharipova commented Apr 14, 2021

@jtibshirani Thanks for another round of review. I've tried to address them in 51f509b


@mayya-sharipova
Contributor Author

mayya-sharipova commented Apr 14, 2021

@jtibshirani Thanks for another round of review. I've tried to address your feedback in a66f755

About DenseVectorScriptDocValuesTests, that's great feedback.
I've added this test class to test the basic functionality of DenseVectorScriptDocValues.

I still kept 30_dense_vector_script_access.yml for a more involved test case that uses an array of query vectors to test the original problem reported by a user in #70437
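
To illustrate why direct access to the decoded values helps with several query vectors, here is a hedged sketch that scores one document vector against an array of query vectors. The details of #70437 are not reproduced in this conversation, so the scoring rule (maximum dot product) and all names are assumptions.

```java
// Hypothetical scoring helper: compares one decoded vector to many query vectors.
final class MultiQueryScoreSketch {

    static float dotProduct(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector dimensions differ: " + a.length + " vs " + b.length);
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return (float) sum;
    }

    // The document vector is decoded once and reused for every query vector.
    static float maxDotProduct(float[] docVector, float[][] queryVectors) {
        if (queryVectors.length == 0) {
            throw new IllegalArgumentException("at least one query vector is required");
        }
        float best = Float.NEGATIVE_INFINITY;
        for (float[] query : queryVectors) {
            best = Math.max(best, dotProduct(docVector, query));
        }
        return best;
    }
}
```

With a document vector `{1f, 2f}` and query vectors `{1f, 0f}`, `{0f, 1f}`, and `{1f, 1f}`, the dot products are 1, 2, and 3, so `maxDotProduct` returns 3.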

Contributor

@jtibshirani jtibshirani left a comment

Looks good to me.

@mayya-sharipova
Contributor Author

mayya-sharipova commented Apr 15, 2021

@elasticmachine test this please

@mayya-sharipova
Contributor Author

@elasticmachine test this please

@mayya-sharipova
Contributor Author

@elasticmachine run elasticsearch-ci/bwc

@mayya-sharipova
Contributor Author

@elasticmachine run elasticsearch-ci/2

@mayya-sharipova
Contributor Author

@elasticmachine update branch

@mayya-sharipova mayya-sharipova merged commit 853e68d into elastic:master Apr 19, 2021
@mayya-sharipova mayya-sharipova deleted the dense-vector-iterator branch April 19, 2021 12:02
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request Apr 19, 2021
Allow direct access to a dense_vector's values in script
through the following functions:

- getVectorValue – returns a vector's value as an array of floats
- getMagnitude – returns a vector's magnitude

Closes elastic#51964
Backport for elastic#71313
mayya-sharipova added a commit that referenced this pull request Apr 19, 2021
Allow direct access to a dense_vector's values in script
through the following functions:

- getVectorValue – returns a vector's value as an array of floats
- getMagnitude – returns a vector's magnitude

Closes #51964
Backport for #71313
@jtibshirani jtibshirani added :Search Relevance/Vectors Vector search and removed :Search Relevance/Ranking Scoring, rescoring, rank evaluation. labels Jul 21, 2022
Successfully merging this pull request may close these issues.

Expose dense vector iterator in painless
4 participants