Support script score when doc value is disabled #1573
Currently, if we disable the doc value and use script score, the following exception will be thrown:
This is a great idea. There is a GitHub issue created for the same, and there are some challenges around it. Ref: #1087 (comment)
@bugmakerrrrrr can you fix the gh actions?
Yes, we should do this if possible. @jmazanec15 what are your thoughts?
src/main/java/org/opensearch/knn/index/KNNVectorDVLeafFieldData.java
I took a very high-level look at the PR, and the logic seems pretty neat. But please remove all sysout statements and add docs on the public functions and the new interfaces that were created. Please also fix the gh actions.
Right, #1087 calls this out. I think it makes sense to implement our own KnnVectorsFormat that uses the correct FlatVectorsFormat. Edit: I meant at some point, not necessarily right now.
Yes, I agree this can be disabled - no sense in storing both.
This looks pretty good to me. I think we will need to add a few test cases.
On this, I think we should do some BWC testing to see whether it causes any issue if we suddenly stop sending doc values for an existing index. @bugmakerrrrrr can we add BWC tests for the same?
@navneet1v Of course. But in this PR, I will focus on supporting script score, and open another PR to disable doc value.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1573     +/-    ##
============================================
+ Coverage     84.96%   84.97%   +0.01%
- Complexity     1366     1374       +8
============================================
  Files           172      172
  Lines          5566     5599      +33
  Branches        546      553       +7
============================================
+ Hits           4729     4758      +29
- Misses          605      607       +2
- Partials        232      234       +2
View full report in Codecov by Sentry.
Overall code looks good to me.
src/main/java/org/opensearch/knn/index/KNNVectorDVLeafFieldData.java
src/test/java/org/opensearch/knn/plugin/script/KNNScriptScoringIT.java
LGTM
Signed-off-by: panguixin <[email protected]>
Signed-off-by: panguixin <[email protected]>
Signed-off-by: panguixin <[email protected]>
543c456 to b110b5f
Signed-off-by: panguixin <[email protected]>
Looks good to me. Thanks for fixing the tests.
@jmazanec15 we need approval from your side on the PR too.
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-1573-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 771c4b54a74b7c4406c71a8bf758378329cfe4d5
# Push it to GitHub
git push --set-upstream origin backport/backport-1573-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x
Then, create a pull request where the
@bugmakerrrrrr can you manually raise the backport PR for this change?
@navneet1v Sure
) * support script score when doc value is disabled Signed-off-by: panguixin <[email protected]> * add test Signed-off-by: panguixin <[email protected]> * apply review comments Signed-off-by: panguixin <[email protected]> * fix test Signed-off-by: panguixin <[email protected]> --------- Signed-off-by: panguixin <[email protected]> (cherry picked from commit 771c4b5)
) (opensearch-project#1587) * support script score when doc value is disabled Signed-off-by: panguixin <[email protected]> * add test Signed-off-by: panguixin <[email protected]> * apply review comments Signed-off-by: panguixin <[email protected]> * fix test Signed-off-by: panguixin <[email protected]> --------- Signed-off-by: panguixin <[email protected]> (cherry picked from commit 771c4b5)
Description
Today, script score is not supported when the lucene engine is used and doc value is disabled, because the k-NN plugin relies on doc values to execute exact queries with script score. However, we can use ByteVectorValues/FloatVectorValues to execute this type of query instead.
In addition to , I think that we can disable doc value entirely when using the lucene engine, since vector values can take over the role of doc values. Further, we could use FlatVectorsFormat to store vector values and implement our own native index functionality on top of it, which would avoid extra de/serialization and unify the script doc value interface. Any thoughts?
Btw, this PR does not include the relevant tests yet.
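The fallback described above can be illustrated with a small, self-contained sketch. This is not the code in this PR and it does not call any Lucene or k-NN plugin API: VectorReader, fromMap and select are hypothetical stand-ins for "a per-segment source of vectors", and the 1 / (1 + squared L2 distance) score is only an assumed example of an exact script-score computation. The point is that the scoring logic only needs some reader that returns the stored vector per document, so vector values can serve when doc values are absent.

```java
import java.util.Map;

/** Hypothetical sketch; none of these names are Lucene or k-NN plugin APIs. */
public class VectorSourceSketch {

    /** Stand-in for a per-segment reader of one vector field. */
    interface VectorReader {
        /** Returns the vector stored for docId, or null if the document has none. */
        float[] vector(int docId);
    }

    /** Builds an in-memory reader; real readers would decode doc values or vector values. */
    static VectorReader fromMap(Map<Integer, float[]> vectors) {
        return docId -> vectors.get(docId);
    }

    /** Prefers doc values when present, otherwise falls back to vector values. */
    static VectorReader select(VectorReader docValues, VectorReader vectorValues) {
        if (docValues != null) {
            return docValues;
        }
        if (vectorValues != null) {
            return vectorValues;
        }
        throw new IllegalStateException("field has neither doc values nor vector values");
    }

    /** Squared Euclidean distance between two vectors of equal dimension. */
    static float l2Squared(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Assumed example score: closer vectors score higher, identical vectors score 1. */
    static float score(float[] query, float[] stored) {
        return 1f / (1f + l2Squared(query, stored));
    }

    public static void main(String[] args) {
        // Doc values are "disabled" here (null), so the vector values reader is used.
        VectorReader vectorValues = fromMap(Map.of(0, new float[] {1f, 2f}, 1, new float[] {4f, 6f}));
        VectorReader reader = select(null, vectorValues);
        float[] query = {1f, 2f};
        for (int docId = 0; docId < 2; docId++) {
            System.out.println(docId + " -> " + score(query, reader.vector(docId)));
        }
    }
}
```

Keeping the choice of reader in one place (select) means the scoring code stays the same whichever storage backs the field, which is the kind of unification the description hints at.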
Issues Resolved
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.