[MOD-6738] IndexComputer #535
The computer is used to process blobs and calculate distances. It has a DistanceCalculator to call when calculating distance. The abstract index object expects an IndexComputer to be passed in the ctor.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #535 +/- ##
==========================================
- Coverage 97.16% 97.02% -0.14%
==========================================
Files 94 100 +6
Lines 4862 5307 +445
==========================================
+ Hits 4724 5149 +425
- Misses 138 158 +20
Call it from the batch iterator; the batch iterator stores the index dim.
Call indexComputer instead. Implement alignment in indexComputer::preprocessQuery and use it to align blobs before a query.
Introduce IndexComputerExtended, which can hold an array of preprocessors.
Implement preprocess, which preprocesses both query and storage blobs. Add addvectorctx and hnswaddvectorctx.
TieredHNSWIndex::insertVectorToHNSW is now responsible for preprocessing the blob according to the HNSW preprocessor. HNSW: introduce indexvector(blob, label, state), which inserts a stored vector into the graph. Tiered index: preprocessing for a query is done according to the frontend index first. Factory: HNSW: added an is_normalized arg to decide whether we need IP or cosine.
computerExtended has flags indicating whether preprocessing is for query, for storage, or for both. The flags are updated in addPreprocessor.
…reprocess queries by the backend index using wrapper API
addPreprocessor returns -1 if it failed to add the preprocessor (due to lack of capacity). IndexComputerExtended preprocess* falls back to IndexComputerBasic if no preprocessors were added yet. Added tests to tst_common and test_tiered. vec_sim_index: removed using spaces::dist_func_t. factory: moved is_normalized to new_index instead of to the abstract params, so the index will be initialized with the user-set metric. Added namespace MemoryUtils for definitions related to memory allocations. Added UNUSED macro to vec_sim_common.
Use calcDistance instead of indexComputer->calcDistance. Move version to the end of the argument list of the HNSW serialized index ctor. Use auto in variable declarations when possible. Add an assert to allocate_force_aligned that alignment is not 0.
Use n_preprocessors as a template argument instead of dynamic allocation of the preprocessors.
Instead, the index holds the components separately: indexCalculator for distance calculations and PreprocessorsContainer to preprocess user data. The index ctor expects a struct of all the components needed to initialize the index. Currently the struct contains indexCalculator and PreprocessorsContainer.
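The two commit notes about a capacity-limited addPreprocessor and a compile-time n_preprocessors can be pictured with a minimal sketch. Everything here is an assumption made for illustration: FixedPreprocessors and DoublingPreprocessor are hypothetical names, the interface is a simplified stand-in for the PR's PreprocessorInterface, and returning the slot index on success is a guess (the commit message only states that failure returns -1).

```cpp
#include <array>
#include <cstddef>

// Simplified, hypothetical stand-in for the PR's preprocessor interface.
struct PreprocessorInterface {
    virtual ~PreprocessorInterface() = default;
    virtual void preprocess(float *blob, std::size_t dim) const = 0;
};

// Toy preprocessor (not part of the PR): doubles every component.
struct DoublingPreprocessor : PreprocessorInterface {
    void preprocess(float *blob, std::size_t dim) const override {
        for (std::size_t i = 0; i < dim; ++i) blob[i] *= 2.0f;
    }
};

// Capacity is a template argument, so the slots live in a std::array
// inside the object and no dynamic allocation is needed.
template <std::size_t n_preprocessors>
class FixedPreprocessors {
public:
    // Assumption: returns the slot index on success, -1 when all slots are taken.
    int addPreprocessor(const PreprocessorInterface *p) {
        if (count_ == n_preprocessors) return -1;
        slots_[count_] = p;
        return static_cast<int>(count_++);
    }

    // Applies every registered preprocessor in insertion order, in place.
    void preprocess(float *blob, std::size_t dim) const {
        for (std::size_t i = 0; i < count_; ++i) slots_[i]->preprocess(blob, dim);
    }

private:
    std::array<const PreprocessorInterface *, n_preprocessors> slots_{};
    std::size_t count_ = 0;
};
```

With FixedPreprocessors<1>, the first addPreprocessor call succeeds and a second one returns -1, which mirrors the "lack of capacity" failure described above.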
ef8e781 to 44a16c5
Add preprocessors to the tiered index that preprocess blobs if needed. This commit will be reverted because the tiered index's job is to manage the safe transfer of blobs between the frontend and backend indexes. Any operations on the data should be the responsibility of the index that stores it.
This reverts commit ee434a4.
Let's go!!
This PR introduces two new components of VecSimIndexAbstract: a preprocessor container and a calculator.
Preprocessor container
The PreprocessorsContainerAbstract API supports processing blobs for storage, for query (graph search), or both. It is also responsible for copying the original blob if it needs to be processed and wrapping it with the appropriate deleter.
PreprocessorsContainerAbstract preprocessing includes alignment of a query blob.
Multi preprocessors
MultiPreprocessorsContainer extends PreprocessorsContainerAbstract by holding an array of pointers to PreprocessorInterface objects, each responsible for a different processing step.
Currently, we only have CosinePreprocessor, which normalizes vectors if the index type is Cosine. Calling CosinePreprocessor::preprocess with blobs pointing to the same memory address results in a single normalization call, and the returned blobs will point to the same memory.
NOTE: in the tiered index, we assume that the vectors are processed before being inserted into the backend index, so the frontend index will be of type VecSimMetric_Cosine but internally doesn't hold a cosine preprocessor.
The processed blobs have a scope lifetime and will be released automatically. It is assumed that they are copied if their lifetime needs to be extended (for storage purposes, for example).
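To make the copy-and-deleter idea concrete, here is a minimal sketch of a scoped, possibly-copied blob. It is not the PR's code: ScopedBlob, MaybeFree and preprocessForQuery are hypothetical names, the blob is assumed to hold floats, and alignment handling is left out. The handle frees the copy when it goes out of scope and leaves the caller's memory alone when no processing was needed.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Hypothetical deleter: frees the blob only if the container copied it.
struct MaybeFree {
    bool owns;
    explicit MaybeFree(bool owns_ = false) : owns(owns_) {}
    void operator()(const float *p) const {
        if (owns) std::free(const_cast<float *>(p));
    }
};
using ScopedBlob = std::unique_ptr<const float, MaybeFree>;

// Returns a blob that stays valid until the returned handle is destroyed.
// normalize == false models an index without a cosine preprocessor:
// the original pointer is handed back and nothing is copied.
inline ScopedBlob preprocessForQuery(const float *original, std::size_t dim, bool normalize) {
    if (!normalize) return ScopedBlob(original, MaybeFree(false));

    float *copy = static_cast<float *>(std::malloc(dim * sizeof(float)));
    if (copy == nullptr) return ScopedBlob(nullptr, MaybeFree(false));
    std::memcpy(copy, original, dim * sizeof(float));

    // Normalize the copy to unit length; the caller's blob is left untouched.
    float norm = 0.0f;
    for (std::size_t i = 0; i < dim; ++i) norm += copy[i] * copy[i];
    norm = std::sqrt(norm);
    if (norm > 0.0f) {
        for (std::size_t i = 0; i < dim; ++i) copy[i] /= norm;
    }
    return ScopedBlob(copy, MaybeFree(true));
}
```

A caller that needs the processed vector beyond the current scope (for storage, say) would copy it out of the handle before the handle is destroyed, matching the assumption stated above.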
Distance Calculator
The distance calculator is defined according to the distance function signature.
It holds the distance function of the abstract index.
The distance calculation API of all Distance Calculator classes is: calc_dist(v1,v2,dim), but internally they will call the distance function according to the template signature.
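A rough sketch of that layering, under stated assumptions: DistanceCalculatorSketch, FunctionPointerCalculator and l2_sqr are hypothetical names, and the only signature modeled is a plain function pointer taking two blobs and a dimension. The point is that callers always use calc_dist(v1, v2, dim), while the concrete class decides how to invoke the stored function.

```cpp
#include <cstddef>

// Abstract calculator: every implementation exposes calc_dist(v1, v2, dim).
template <typename DistType>
class DistanceCalculatorSketch {
public:
    virtual ~DistanceCalculatorSketch() = default;
    virtual DistType calc_dist(const void *v1, const void *v2, std::size_t dim) const = 0;
};

// Implementation for one assumed signature:
// DistType (*)(const void *, const void *, size_t).
template <typename DistType>
class FunctionPointerCalculator : public DistanceCalculatorSketch<DistType> {
public:
    using DistFunc = DistType (*)(const void *, const void *, std::size_t);

    explicit FunctionPointerCalculator(DistFunc f) : func_(f) {}

    // Forwards to the stored distance function using its own signature.
    DistType calc_dist(const void *v1, const void *v2, std::size_t dim) const override {
        return func_(v1, v2, dim);
    }

private:
    DistFunc func_;
};

// Example distance function for the sketch: squared L2 over float vectors.
inline float l2_sqr(const void *a, const void *b, std::size_t dim) {
    const float *x = static_cast<const float *>(a);
    const float *y = static_cast<const float *>(b);
    float sum = 0.0f;
    for (std::size_t i = 0; i < dim; ++i) {
        const float d = x[i] - y[i];
        sum += d * d;
    }
    return sum;
}
```

A calculator for a distance function with a different signature would be another subclass with the same calc_dist entry point, so the index code that calls it would not need to change.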
Index API changes
VecSimIndexAbstract is responsible for preprocessing a blob before performing any operation.
The *Wrapper functions were removed.