Add Support for Lucene Byte Sized Vector #971

Conversation

naveentatikonda
Member

Description

This PR adds indexing, querying, and docValues support for Lucene byte-sized vectors, along with tests that validate it. Users can index vectors as byte-sized vectors (which in theory saves 75% of memory compared to float vectors) by setting the optional data_type field to byte when creating the index. Because this PR does not add support for quantization techniques, users are expected to index vectors whose values fall within the byte range [-128, 127] and have no decimal component. For now, this feature is supported only for the lucene engine.
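The input constraint and the 75% figure described above can be illustrated with a minimal sketch. This is not the plugin's actual code: the class `ByteVectorSketch` and the methods `toByteVector` and `rawSavingsPercent` are hypothetical names introduced here for illustration. It assumes a float occupies 4 bytes and a byte 1 byte, and it counts only raw vector storage, not any index overhead.

```java
import java.util.Arrays;

// Illustrative sketch only; all names here are hypothetical and not part of the k-NN plugin.
public class ByteVectorSketch {

    // Converts a vector to bytes, rejecting any value that has a decimal
    // component or falls outside the signed byte range [-128, 127].
    public static byte[] toByteVector(float[] values) {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            float v = values[i];
            if (Float.isNaN(v) || Float.isInfinite(v)) {
                throw new IllegalArgumentException("Value at index " + i + " is not a finite number");
            }
            if (v % 1 != 0) {
                throw new IllegalArgumentException("Value at index " + i + " has a decimal component: " + v);
            }
            if (v < Byte.MIN_VALUE || v > Byte.MAX_VALUE) {
                throw new IllegalArgumentException("Value at index " + i + " is outside [-128, 127]: " + v);
            }
            out[i] = (byte) v;
        }
        return out;
    }

    // Percentage of raw vector storage saved by storing 1 byte per dimension
    // instead of a 4-byte float per dimension.
    public static double rawSavingsPercent(int dimension) {
        long floatBytes = (long) dimension * Float.BYTES;
        long byteBytes = (long) dimension * Byte.BYTES;
        return 100.0 * (floatBytes - byteBytes) / floatBytes;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toByteVector(new float[] {-128f, 0f, 127f})));
        System.out.println(rawSavingsPercent(128));
    }
}
```

Because no quantization is applied, a value such as 1.5 or 200 is rejected in this sketch rather than rounded or clamped, which mirrors the expectation that callers supply values already in byte range.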

Issues Resolved

#812

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…#937)

* Add Indexing Support for Lucene Byte Sized Vector

Signed-off-by: Naveen Tatikonda <[email protected]>

* Add tests for Indexing

Signed-off-by: Naveen Tatikonda <[email protected]>

* Add CHANGELOG

Signed-off-by: Naveen Tatikonda <[email protected]>

* Address Review Comments

Signed-off-by: Naveen Tatikonda <[email protected]>

---------

Signed-off-by: Naveen Tatikonda <[email protected]>
)

* Add Querying Support to Lucene Byte Sized Vector

Signed-off-by: Naveen Tatikonda <[email protected]>

* Add CHANGELOG

Signed-off-by: Naveen Tatikonda <[email protected]>

* Address Review Comments

Signed-off-by: Naveen Tatikonda <[email protected]>

---------

Signed-off-by: Naveen Tatikonda <[email protected]>
Signed-off-by: Naveen Tatikonda <[email protected]>
@naveentatikonda naveentatikonda force-pushed the feature/lucene_byte_vector branch from 380dc47 to 2d04aba Compare July 12, 2023 00:46
@codecov

codecov bot commented Jul 12, 2023

Codecov Report

Merging #971 (2d04aba) into main (d55eccb) will decrease coverage by 0.20%.
The diff coverage is 86.66%.

@@             Coverage Diff              @@
##               main     #971      +/-   ##
============================================
- Coverage     85.33%   85.13%   -0.20%     
- Complexity     1117     1141      +24     
============================================
  Files           152      154       +2     
  Lines          4520     4683     +163     
  Branches        406      423      +17     
============================================
+ Hits           3857     3987     +130     
- Misses          480      504      +24     
- Partials        183      192       +9     
Impacted Files Coverage Δ
...opensearch/knn/index/mapper/LuceneFieldMapper.java 71.15% <64.00%> (-3.27%) ⬇️
...ain/java/org/opensearch/knn/index/VectorField.java 71.42% <66.66%> (-3.58%) ⬇️
...nsearch/knn/index/mapper/KNNVectorFieldMapper.java 82.93% <79.41%> (-0.88%) ⬇️
...rg/opensearch/knn/index/query/KNNQueryFactory.java 88.13% <85.00%> (-2.35%) ⬇️
...rch/knn/index/mapper/KNNVectorFieldMapperUtil.java 90.47% <90.47%> (ø)
...n/java/org/opensearch/knn/common/KNNConstants.java 93.75% <100.00%> (+0.41%) ⬆️
...opensearch/knn/index/KNNVectorDVLeafFieldData.java 75.00% <100.00%> (+2.27%) ⬆️
.../opensearch/knn/index/KNNVectorIndexFieldData.java 100.00% <100.00%> (ø)
...opensearch/knn/index/KNNVectorScriptDocValues.java 84.61% <100.00%> (-5.87%) ⬇️
.../java/org/opensearch/knn/index/VectorDataType.java 100.00% <100.00%> (ø)
... and 5 more

... and 2 files with indirect coverage changes

Member

@vamshin vamshin left a comment

LGTM! Thanks

@naveentatikonda naveentatikonda merged commit bf04854 into opensearch-project:main Jul 12, 2023
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-971-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 bf04854c483fdfd38663bdc490e1730c994bda6d
# Push it to GitHub
git push --set-upstream origin backport/backport-971-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-971-to-2.x.

@opensearch-trigger-bot
Contributor

The backport to 2.9 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.9 2.9
# Navigate to the new working tree
cd .worktrees/backport-2.9
# Create a new branch
git switch --create backport/backport-971-to-2.9
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 bf04854c483fdfd38663bdc490e1730c994bda6d
# Push it to GitHub
git push --set-upstream origin backport/backport-971-to-2.9
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.9

Then, create a pull request where the base branch is 2.9 and the compare/head branch is backport/backport-971-to-2.9.

naveentatikonda added a commit that referenced this pull request Jul 12, 2023
* Add Indexing Support for Lucene Byte Sized Vector (#937)

* Add Indexing Support for Lucene Byte Sized Vector

Signed-off-by: Naveen Tatikonda <[email protected]>

* Add tests for Indexing

Signed-off-by: Naveen Tatikonda <[email protected]>

* Add CHANGELOG

Signed-off-by: Naveen Tatikonda <[email protected]>

* Address Review Comments

Signed-off-by: Naveen Tatikonda <[email protected]>

---------

Signed-off-by: Naveen Tatikonda <[email protected]>

* Add Querying Support to Lucene Byte Sized Vector (#956)

* Add Querying Support to Lucene Byte Sized Vector

Signed-off-by: Naveen Tatikonda <[email protected]>

* Add CHANGELOG

Signed-off-by: Naveen Tatikonda <[email protected]>

* Address Review Comments

Signed-off-by: Naveen Tatikonda <[email protected]>

---------

Signed-off-by: Naveen Tatikonda <[email protected]>

* Add DocValues Support for Lucene Byte Sized Vector (#953)

Signed-off-by: Naveen Tatikonda <[email protected]>

* Update Release Notes

Signed-off-by: Naveen Tatikonda <[email protected]>

---------

Signed-off-by: Naveen Tatikonda <[email protected]>
(cherry picked from commit bf04854)
naveentatikonda added a commit that referenced this pull request Jul 12, 2023
* Add Indexing Support for Lucene Byte Sized Vector (#937)

* Add Indexing Support for Lucene Byte Sized Vector

Signed-off-by: Naveen Tatikonda <[email protected]>

* Add tests for Indexing

Signed-off-by: Naveen Tatikonda <[email protected]>

* Add CHANGELOG

Signed-off-by: Naveen Tatikonda <[email protected]>

* Address Review Comments

Signed-off-by: Naveen Tatikonda <[email protected]>

---------

Signed-off-by: Naveen Tatikonda <[email protected]>

* Add Querying Support to Lucene Byte Sized Vector (#956)

* Add Querying Support to Lucene Byte Sized Vector

Signed-off-by: Naveen Tatikonda <[email protected]>

* Add CHANGELOG

Signed-off-by: Naveen Tatikonda <[email protected]>

* Address Review Comments

Signed-off-by: Naveen Tatikonda <[email protected]>

---------

Signed-off-by: Naveen Tatikonda <[email protected]>

* Add DocValues Support for Lucene Byte Sized Vector (#953)

Signed-off-by: Naveen Tatikonda <[email protected]>

* Update Release Notes

Signed-off-by: Naveen Tatikonda <[email protected]>

---------

Signed-off-by: Naveen Tatikonda <[email protected]>
(cherry picked from commit bf04854)
Signed-off-by: Naveen Tatikonda <[email protected]>
Labels
backport 2.x, backport 2.9, Features (Introduces a new unit of functionality that satisfies a requirement), skip-changelog, v2.9.0