[WIP] Multi-Vector support for HNSW search #13525
Open: vigyasharma wants to merge 135 commits into apache:main from vigyasharma:tensors_pr
Changes from 47 commits
Commits
96e09b3 tensor partial (vigyasharma)
50f3789 docstring edit (vigyasharma)
0050b30 define a KnnFloatTensorField (vigyasharma)
b543668 clean up tensor similarity function (vigyasharma)
c58a3fb add to fieldIndo and indexing chain schema setup (vigyasharma)
b237cdb started creating new tensor format (vigyasharma)
4324151 modified Lucene99FlatVectorsWriter to write tensors (vigyasharma)
774e91d Tensor writer with changes to write tensorDataOffsets (vigyasharma)
d15cffc write byte tensor values (vigyasharma)
ada3db1 read tensor metadata and create FieldEntry (vigyasharma)
e42a389 Added OffHeapFloatTensorValues (vigyasharma)
9bc98bc support for OffHeapByteTensorValues with code to reuse the bytebuffer (vigyasharma)
5e11640 default flat tensor scorer impl (vigyasharma)
851a14b Tensor reader impl. done (vigyasharma)
7c39b76 Flat tensor writer impl. (vigyasharma)
318455a revert all tensor changes in FlatVectorsWriter (vigyasharma)
b24ae9a remove unused imports (vigyasharma)
a35062f modify hnswVectorsWriter to build graph with tensors (vigyasharma)
c94127d reuse hnswvectorsreader for tensors (vigyasharma)
ced69a3 add knnTensorsformat with an hnsw impl (vigyasharma)
b20859f plumb tensors into the indexing chain (vigyasharma)
a30e06e update lucene94FieldInfosFormat constructor (vigyasharma)
13bd4c5 update FieldInfos ctor calls and bw codecs with new params (vigyasharma)
834db2b get tensorReader from codec readers and segment readers (vigyasharma)
d214b1b minor bug fix with readers (vigyasharma)
4adefed add KnnFloatTensorQuery wrapper (vigyasharma)
b31e0be add ByteTensorQuery (vigyasharma)
208b041 gradle tidy (vigyasharma)
4d173e7 remove commented code (vigyasharma)
b5551fc tidy again (vigyasharma)
421b6d6 fix ctor bug (vigyasharma)
ee2e0d7 gradle check minus test files (vigyasharma)
f50e91a add no commit tags (vigyasharma)
edde79a make Tensor Similarity Fn a class with Agg enum (vigyasharma)
a2a91c4 changed tensor fields, fieldTypes and fieldInfos (vigyasharma)
0f7b1dd modify FlatTensor reader and writers with new FI format (vigyasharma)
825e267 modify HnswVectors reader and writer for new FieldInfo (vigyasharma)
42475cc update codecreader and tensor query classes (vigyasharma)
0316e02 fix FieldInfo in some files (vigyasharma)
0ccb40b update FieldInfo ctor (vigyasharma)
ebc2518 add hashcode to TSF (vigyasharma)
21261af gradle tidy (vigyasharma)
872874c remove nocommits for builds (vigyasharma)
d52868f linting, licenses, and compilation bugs (vigyasharma)
749a670 tidy (vigyasharma)
ece1728 add missing java docs (vigyasharma)
30bdb49 add no commits (vigyasharma)
b92d6eb start single format multivectors change (vigyasharma)
1579aea rename TensorSimilarity to MultiVectorSimilarity (vigyasharma)
4f5e494 Add interface to FlatVectorsScorer; del FlatTensorsScorer (vigyasharma)
f667e4d Use DefaultFlatMultiVectorScorer that extends DefFlatVecScorer (vigyasharma)
6f13b2e change OffHeapMVValues; change MVDataOffsetReader (vigyasharma)
429a106 change field values to Byte/FloatMultiVectorValue (vigyasharma)
104d802 changes to FlatVectorsWriter; del FlatTensorsWriter (vigyasharma)
5afa0d9 update FlatVectorsReader to handle multi-vectors (vigyasharma)
da32aae minor import bug (vigyasharma)
3fb3808 del FlatTensorReader and FlatTensorFormat (vigyasharma)
d0a0b65 Hnsw reader, writer and format changes (vigyasharma)
001afd7 remove multivector metadata from hnsw format; not needed (vigyasharma)
ca41e85 remove codec change; use static default var for agg fn (vigyasharma)
db97c0d change FieldInfo values; remove TensorQuery classes (vigyasharma)
7769e48 restore readers to main branch versions (vigyasharma)
1274a27 reword some occurrences of tensor in comments (vigyasharma)
05174ff indexing change and vectorValConsumer changes (vigyasharma)
cd08506 some comment changes; use DEF_AGG for MultiVecSimFnAgg (vigyasharma)
79450dd tidy (vigyasharma)
cd7e890 compile errors from FI ctor (vigyasharma)
0edb868 use ByteBuffer.clear instead of reset (vigyasharma)
19fe230 add logs for assert failures (vigyasharma)
bca27ff missed a ; (vigyasharma)
93c5077 fix bug around multi-vector condition (vigyasharma)
bc0ebc1 bug fix for non-multi-vector case (vigyasharma)
1d80b28 tidy (vigyasharma)
a33d533 move changes to new format files (vigyasharma)
8a098f2 tidy; remove unused imports (vigyasharma)
c591b67 remove multi-vector specific check (vigyasharma)
51ca4dc missing docstring (vigyasharma)
b5ada6f tensor partial (vigyasharma)
bc68c8f docstring edit (vigyasharma)
785c958 define a KnnFloatTensorField (vigyasharma)
d7197be clean up tensor similarity function (vigyasharma)
9c6d4f5 add to fieldIndo and indexing chain schema setup (vigyasharma)
9ce6977 started creating new tensor format (vigyasharma)
462e150 modified Lucene99FlatVectorsWriter to write tensors (vigyasharma)
b2f3fad Tensor writer with changes to write tensorDataOffsets (vigyasharma)
a88c4ba write byte tensor values (vigyasharma)
8a3ec4f read tensor metadata and create FieldEntry (vigyasharma)
9d6ffd5 Added OffHeapFloatTensorValues (vigyasharma)
cf4ca9c support for OffHeapByteTensorValues with code to reuse the bytebuffer (vigyasharma)
a9a345b default flat tensor scorer impl (vigyasharma)
555b055 Tensor reader impl. done (vigyasharma)
0abc881 Flat tensor writer impl. (vigyasharma)
eebf38e revert all tensor changes in FlatVectorsWriter (vigyasharma)
5334fdb remove unused imports (vigyasharma)
bc02b0b modify hnswVectorsWriter to build graph with tensors (vigyasharma)
1a9ed21 reuse hnswvectorsreader for tensors (vigyasharma)
f567095 add knnTensorsformat with an hnsw impl (vigyasharma)
059a8ff plumb tensors into the indexing chain (vigyasharma)
7b86353 update lucene94FieldInfosFormat constructor (vigyasharma)
1626195 update FieldInfos ctor calls and bw codecs with new params (vigyasharma)
27449f5 get tensorReader from codec readers and segment readers (vigyasharma)
34d4063 minor bug fix with readers (vigyasharma)
543565f add KnnFloatTensorQuery wrapper (vigyasharma)
9a0633b add ByteTensorQuery (vigyasharma)
90c42b2 gradle tidy (vigyasharma)
8c3b1c0 remove commented code (vigyasharma)
533c2bf tidy again (vigyasharma)
9b7cb0b fix ctor bug (vigyasharma)
c2ae83b gradle check minus test files (vigyasharma)
0ecda0f add no commit tags (vigyasharma)
24d231e make Tensor Similarity Fn a class with Agg enum (vigyasharma)
d08bc68 changed tensor fields, fieldTypes and fieldInfos (vigyasharma)
419cc9e modify FlatTensor reader and writers with new FI format (vigyasharma)
75ecb2c modify HnswVectors reader and writer for new FieldInfo (vigyasharma)
e6e6bdf update codecreader and tensor query classes (vigyasharma)
c0912d1 fix FieldInfo in some files (vigyasharma)
c24a99f update FieldInfo ctor (vigyasharma)
04636b9 add hashcode to TSF (vigyasharma)
4b2bad4 gradle tidy (vigyasharma)
a98035c remove nocommits for builds (vigyasharma)
56db6d0 linting, licenses, and compilation bugs (vigyasharma)
97d327d tidy (vigyasharma)
acdd158 add missing java docs (vigyasharma)
c80cbcf add no commits (vigyasharma)
6ca9e14 merge new changes (vigyasharma)
3365889 fix all merge conflicts (vigyasharma)
12f5b54 no commits (vigyasharma)
ffda63f remove isMultiVector member in Lucene99FlatMultiVectorsWriter.FieldWr… (cpoerschke)
b2b95cd remove FieldInfo.isMultiVector in favour of MultiVectorSimilarityFunc… (cpoerschke)
eacc63c in IndexingChain.FieldSchema fold setMultiVectors into setVectors (cpoerschke)
1570690 remove metadata check for fixed vector lengths. add debug logs (vigyasharma)
b1e5568 fix hnsw vector to read MV scorer supplier; fix loop and array copy i… (vigyasharma)
ce2face add Aggregate to FieldInfos format (vigyasharma)
2e52495 assert and logs on vector-data rw offsets (vigyasharma)
7aa9555 go to base vectorscorer for single valued use-case (vigyasharma)
it will help with churn, etc. and we prefer not having * imports.
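To illustrate the import preference in this comment, here is a minimal sketch of explicit imports in place of a wildcard. The class and method names are hypothetical and not taken from this PR; the commented-out `import java.util.*;` line shows the form the comment asks to avoid.

```java
// Avoided form: a wildcard import hides which classes the file actually uses.
// import java.util.*;

// Preferred form: one explicit import per class used in the file.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class, for illustration only.
class ExplicitImportsExample {

  /** Returns a sorted copy of the input list, leaving the input untouched. */
  static List<Integer> sortedCopy(List<Integer> values) {
    List<Integer> copy = new ArrayList<>(values);
    Collections.sort(copy);
    return copy;
  }

  public static void main(String[] args) {
    // List.of requires Java 9 or later.
    System.out.println(sortedCopy(List.of(3, 1, 2)));
  }
}
```

With explicit imports, adding or removing a dependency changes exactly one import line, which keeps later diffs small and makes unused imports easy to spot.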