Support KNN search for FAISS IVF indices #13258
This pull request was exported from Phabricator. Differential Revision: D67684898
Summary: The patch is the read-side counterpart of facebook#13197. It adds support for K-nearest-neighbor vector similarity searches to `FaissIVFIndex`. There are two main pieces to this:

1) `KNNIterator` is an `Iterator` implementation that is returned by `FaissIVFIndex` upon a call to `NewIterator`. `KNNIterator` treats its `Seek` target as a vector embedding and passes it to FAISS along with the number of neighbors requested `k` as well as the number of probes to use (i.e. the number of inverted lists to check). Applications can then use `Next` (and `Prev`) to iterate over the vectors in the result set. `KNNIterator` exposes the primary keys associated with the result vectors (see below for how this is done), while `value` and `columns` are empty. The iterator also supports a property `rocksdb.faiss.ivf.index.distance` that can be used to retrieve the distance/similarity metric for the current result vector.

2) `IteratorAdapter` takes a RocksDB secondary index iterator (see facebook#13257) and adapts it to the interface required by FAISS (`faiss::InvertedListsIterator`), enabling FAISS to read the inverted lists stored in RocksDB. Since FAISS only supports numerical vector ids of type `faiss::idx_t`, `IteratorAdapter` uses `KNNIterator` to assign ephemeral (per-query) ids to the inverted list items read during iteration, which are later mapped back to the original primary keys by `KNNIterator`.

Differential Revision: D67684898
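To make the ephemeral id scheme from the summary more concrete, here is a minimal sketch of the idea, not the code from the patch. `SketchId` and `EphemeralIdMap` are hypothetical names introduced only for illustration; `SketchId` stands in for `faiss::idx_t`, assumed here to be a 64-bit signed integer. The ids are simply positions in a per-query vector of primary keys.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for faiss::idx_t (assumed to be a 64-bit signed int).
using SketchId = std::int64_t;

// Hands out per-query numeric ids for primary keys and maps them back.
// Illustration only; this is not a class from the patch.
class EphemeralIdMap {
 public:
  // Called while an inverted list is being scanned: remember the primary key
  // and return the numeric id that would be reported to the search library.
  SketchId Add(std::string primary_key) {
    keys_.push_back(std::move(primary_key));
    return static_cast<SketchId>(keys_.size() - 1);
  }

  // Called when the result set is built: translate an id back into the
  // primary key. Returns nullptr for ids this query never handed out.
  const std::string* Lookup(SketchId id) const {
    if (id < 0 || static_cast<std::size_t>(id) >= keys_.size()) {
      return nullptr;
    }
    return &keys_[static_cast<std::size_t>(id)];
  }

 private:
  std::vector<std::string> keys_;  // position in the vector == ephemeral id
};
```

Because the map lives only as long as one query, the ids never need to be persisted or kept stable across searches, which is what makes them "ephemeral".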
include/rocksdb/options.h (outdated)
  // `similarity_search_probes` below.
  //
  // Default: none
  std::optional<size_t> similarity_search_neighbors;
I'm wondering whether the similarity search options could live in their own struct. `SecondaryIndex::NewIterator()` could take both `ReadOptions` and `SecondaryIndexOptions` (or a better name), and these two similarity search options could be part of the new options struct. I'm just a little more cautious, as we are adding more options to `ReadOptions` that will be used only for a very specific use case.
Yeah, I actually thought about that too - let me take a stab at it. btw, it would probably be sufficient for `NewIterator` to take just the secondary index read options (the "regular" read options would pertain to the underlying iterator).
Thanks! Updated
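For readers following this thread, a dedicated options struct along the lines discussed above might look roughly like the sketch below. The struct and function names are hypothetical; only the two field names come from this PR. The type of `similarity_search_probes` and the validation rules are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical sketch of a dedicated options struct for secondary index
// reads; it is not the definition from the patch.
struct SketchSecondaryIndexReadOptions {
  // Number of neighbors (k) to return. No default, so callers must set it.
  std::optional<std::size_t> similarity_search_neighbors;
  // Number of inverted lists to probe. Assumed here to be required as well.
  std::optional<std::size_t> similarity_search_probes;
};

// Returns an empty string if the options are usable, otherwise a short
// description of the first problem found. The rules are assumptions.
inline std::string ValidateSketchOptions(
    const SketchSecondaryIndexReadOptions& options) {
  if (!options.similarity_search_neighbors.has_value() ||
      *options.similarity_search_neighbors == 0) {
    return "similarity_search_neighbors must be set to a positive value";
  }
  if (!options.similarity_search_probes.has_value() ||
      *options.similarity_search_probes == 0) {
    return "similarity_search_probes must be set to a positive value";
  }
  return std::string();
}
```

Keeping these fields out of the general-purpose read options means callers that never touch a secondary index do not see them at all.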
    ++pos_;
  }

  void Prev() override {
`faiss::InvertedListsIterator` doesn't seem to have a `prev()` API. Just curious where we would use this.
This is `KNNIterator`, which is an implementation of `rocksdb::Iterator`, not `faiss::InvertedListsIterator`. `Prev` can be used to move back to a previous vector in the result set (see the unit test).
Ah okay. I thought the users would only access the InvertedList objects via FAISS (e.g. InvertedListScanner) and the underlying implementation uses this KNNIterator.
Reading through the unit test again helped. Thanks!
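As a rough illustration of the point made in this thread, moving forward and backward over an already computed result set can be sketched as a position into a vector. `SketchResult` and `SketchResultCursor` are hypothetical names, and the choice to invalidate the cursor when stepping back from the first result is an assumption, not necessarily what the patch does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result entry: a primary key plus its distance to the query.
struct SketchResult {
  std::string primary_key;
  float distance;
};

// Minimal bidirectional cursor over a materialized result set.
// Illustration only; this is not KNNIterator.
class SketchResultCursor {
 public:
  explicit SketchResultCursor(std::vector<SketchResult> results)
      : results_(std::move(results)), pos_(results_.size()) {}

  void SeekToFirst() { pos_ = 0; }
  bool Valid() const { return pos_ < results_.size(); }

  void Next() {
    assert(Valid());
    ++pos_;
  }

  void Prev() {
    assert(Valid());
    // Stepping back from the first result invalidates the cursor (assumption).
    pos_ = (pos_ == 0) ? results_.size() : pos_ - 1;
  }

  const std::string& key() const {
    assert(Valid());
    return results_[pos_].primary_key;
  }

  float distance() const {
    assert(Valid());
    return results_[pos_].distance;
  }

 private:
  std::vector<SketchResult> results_;
  std::size_t pos_;  // results_.size() means "not positioned"
};
```

A caller would position the cursor with `SeekToFirst()` and then loop while `Valid()` holds, reading `key()` and `distance()` for each result.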
const Status status = secondary_index_it_->status();
if (!status.ok()) {
  throw std::runtime_error(status.ToString());
I wonder why we throw for errors here (and below). Wouldn't it be sufficient if `is_available()` returns false after `Update()` fails?
We have to differentiate between "out of items" and "some kind of error happened", similarly to RocksDB iterators. We wouldn't want to silently ignore I/O errors, for example, and produce an incorrect result. This is similar to how we expect RocksDB users to check an iterator's `status()` after `Valid()` returns false; it is possible that the iterator got invalidated because of an error (and not because we have exhausted all the elements).
For a bit more context, `IteratorAdapter`'s code executes in the context of FAISS, and FAISS uses exceptions to communicate errors. However, we eventually do catch all exceptions coming from FAISS and turn them into error statuses (regardless of whether the exception originated from FAISS itself or from our class).
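The pattern described in this thread (throw inside code that runs under the search library, then convert back to a status at the boundary) can be sketched as follows. `SketchStatus`, `CheckUnderlyingIterator`, and `RunSearch` are hypothetical names used only for illustration; `SketchStatus` is a stand-in and not `rocksdb::Status`.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical, minimal stand-in for a status object.
struct SketchStatus {
  bool ok = true;
  std::string message;
};

// Stand-in for adapter code that runs inside the search library: the
// library's interface offers no status return, so an error is thrown.
inline void CheckUnderlyingIterator(const SketchStatus& status) {
  if (!status.ok) {
    throw std::runtime_error(status.message);
  }
}

// Stand-in for the outer layer: run the search and turn any exception,
// whatever its origin, into an error status for the caller.
template <typename Search>
SketchStatus RunSearch(Search&& search) {
  try {
    search();
    return SketchStatus{};
  } catch (const std::exception& e) {
    return SketchStatus{false, e.what()};
  } catch (...) {
    return SketchStatus{false, "unknown exception"};
  }
}
```

With this shape, an I/O error in the underlying iterator surfaces to the caller as a failed status instead of looking like an exhausted inverted list.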
LGTM. Thank you!
std::unique_ptr<Iterator> underlying_it(db->NewIterator(ReadOptions(), cfh2));

SecondaryIndexReadOptions read_options;
read_options.similarity_search_neighbors = 8;
nit: can be a follow up. One more case for a different number of neighbors would be nice
    return Slice();
  }

  Status GetProperty(std::string prop_name, std::string* prop) override {
Not strictly related to this diff, but curious. I am not sure why `prop_name` was passed as a string copy instead of a `const &` when this was added 9 years ago. If the user checks the distance for every single embedding, this can be an unnecessary overhead, right? I'm wondering if we want to introduce a better way of keeping iterator properties in the next major release.
Yeah, this goes way back, so I don't have any context about why the interface is this way but it is something we can fix in a major release (even though the cost of the copy is probably negligible).
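For anyone unfamiliar with the distinction being discussed, the small sketch below shows that a by-value `std::string` parameter is a separate object from the caller's string (so it has to be copied or moved into place), while a `const &` parameter refers to the caller's object directly. The two free functions are hypothetical and exist only to make the difference observable; they are not part of the `Iterator` interface.

```cpp
#include <cassert>
#include <string>

// By value: the parameter is the callee's own string object, constructed
// from the caller's argument (a copy when an lvalue is passed).
inline bool SameObjectWhenPassedByValue(std::string prop_name,
                                        const std::string* callers_string) {
  return &prop_name == callers_string;
}

// By const reference: the parameter is an alias for the caller's string,
// so no new string object is created.
inline bool SameObjectWhenPassedByConstRef(const std::string& prop_name,
                                           const std::string* callers_string) {
  return &prop_name == callers_string;
}
```

Calling both with the same lvalue, for example a string holding `rocksdb.faiss.ivf.index.distance`, the by-value variant reports a different object and the `const &` variant reports the same one.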
This pull request has been merged in c2de783.