-
Notifications
You must be signed in to change notification settings - Fork 6.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Write-side support for FAISS IVF indices #13197
Conversation
This pull request was exported from Phabricator. Differential Revision: D66907065 |
Hm, seems like the unity build will need some work the very least. |
Summary: The patch adds initial support for backing FAISS's inverted file based indices with data stored in RocksDB. It introduces a `SecondaryIndex` implementation called `FaissIVFIndex` which takes ownership of a `faiss::IndexIVF` object. During indexing, `FaissIVFIndex` treats the original value of the specified primary column as an embedding vector, and passes it to the provided FAISS index object to perform quantization. It replaces the original embedding vector with the result of the coarse quantizer (i.e. the inverted list id), and puts the result of the fine quantizer (if any) into the secondary index value. Note that this patch is only one half of the equation; it provides a way of storing FAISS inverted lists in RocksDB but there is currently no retrieval/search support (this will be a follow-up change). Also, the integration currently works only with our internal Buck build. I plan to add support for `cmake` / `make` based builds similarly to how we handle Folly. Differential Revision: D66907065
b1325ab
to
c397567
Compare
This pull request was exported from Phabricator. Differential Revision: D66907065 |
Summary: The patch adds initial support for backing FAISS's inverted file based indices with data stored in RocksDB. It introduces a `SecondaryIndex` implementation called `FaissIVFIndex` which takes ownership of a `faiss::IndexIVF` object. During indexing, `FaissIVFIndex` treats the original value of the specified primary column as an embedding vector, and passes it to the provided FAISS index object to perform quantization. It replaces the original embedding vector with the result of the coarse quantizer (i.e. the inverted list id), and puts the result of the fine quantizer (if any) into the secondary index value. Note that this patch is only one half of the equation; it provides a way of storing FAISS inverted lists in RocksDB but there is currently no retrieval/search support (this will be a follow-up change). Also, the integration currently works only with our internal Buck build. I plan to add support for `cmake` / `make` based builds similarly to how we handle Folly. Differential Revision: D66907065
c397567
to
5f026d2
Compare
This pull request was exported from Phabricator. Differential Revision: D66907065 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the PR!
This pull request has been merged in b339d08. |
Summary: The patch is the read-side counterpart of facebook#13197 . It adds support for K-nearest-neighbor vector similarity searches to `FaissIVFIndex`. There are two main pieces to this: 1) `KNNIterator` is an `Iterator` implementation that is returned by `FaissIVFIndex` upon a call to `NewIterator`. `KNNIterator` treats its `Seek` target as a vector embedding and passes it to FAISS along with the number of neighbors requested `k` as well as the number of probes to use (i.e. the number of inverted lists to check). Applications can then use `Next` (and `Prev`) to iterate over the the vectors in the result set. `KNNIterator` exposes the primary keys associated with the result vectors (see below how this is done), while `value` and `columns` are empty. The iterator also supports a property `rocksdb.faiss.ivf.index.distance` that can be used to retrieve the distance/similarity metric for the current result vector. 2) `IteratorAdapter` takes a RocksDB secondary index iterator (see facebook#13257) and adapts it to the interface required by FAISS (`faiss::InvertedListsIterator`), enabling FAISS to read the inverted lists stored in RocksDB. Since FAISS only supports numerical vector ids of type `faiss::idx_t`, `IteratorAdapter` uses `KNNIterator` to assign ephemeral (per-query) ids to the inverted list items read during iteration, which are later mapped back to the original primary keys by `KNNIterator`. Differential Revision: D67684898
Summary: The patch is the read-side counterpart of facebook#13197 . It adds support for K-nearest-neighbor vector similarity searches to `FaissIVFIndex`. There are two main pieces to this: 1) `KNNIterator` is an `Iterator` implementation that is returned by `FaissIVFIndex` upon a call to `NewIterator`. `KNNIterator` treats its `Seek` target as a vector embedding and passes it to FAISS along with the number of neighbors requested `k` as well as the number of probes to use (i.e. the number of inverted lists to check). Applications can then use `Next` (and `Prev`) to iterate over the the vectors in the result set. `KNNIterator` exposes the primary keys associated with the result vectors (see below how this is done), while `value` and `columns` are empty. The iterator also supports a property `rocksdb.faiss.ivf.index.distance` that can be used to retrieve the distance/similarity metric for the current result vector. 2) `IteratorAdapter` takes a RocksDB secondary index iterator (see facebook#13257) and adapts it to the interface required by FAISS (`faiss::InvertedListsIterator`), enabling FAISS to read the inverted lists stored in RocksDB. Since FAISS only supports numerical vector ids of type `faiss::idx_t`, `IteratorAdapter` uses `KNNIterator` to assign ephemeral (per-query) ids to the inverted list items read during iteration, which are later mapped back to the original primary keys by `KNNIterator`. Differential Revision: D67684898
Summary: The patch is the read-side counterpart of facebook#13197 . It adds support for K-nearest-neighbor vector similarity searches to `FaissIVFIndex`. There are two main pieces to this: 1) `KNNIterator` is an `Iterator` implementation that is returned by `FaissIVFIndex` upon a call to `NewIterator`. `KNNIterator` treats its `Seek` target as a vector embedding and passes it to FAISS along with the number of neighbors requested `k` as well as the number of probes to use (i.e. the number of inverted lists to check). Applications can then use `Next` (and `Prev`) to iterate over the the vectors in the result set. `KNNIterator` exposes the primary keys associated with the result vectors (see below how this is done), while `value` and `columns` are empty. The iterator also supports a property `rocksdb.faiss.ivf.index.distance` that can be used to retrieve the distance/similarity metric for the current result vector. 2) `IteratorAdapter` takes a RocksDB secondary index iterator (see facebook#13257) and adapts it to the interface required by FAISS (`faiss::InvertedListsIterator`), enabling FAISS to read the inverted lists stored in RocksDB. Since FAISS only supports numerical vector ids of type `faiss::idx_t`, `IteratorAdapter` uses `KNNIterator` to assign ephemeral (per-query) ids to the inverted list items read during iteration, which are later mapped back to the original primary keys by `KNNIterator`. Differential Revision: D67684898
Summary: The patch is the read-side counterpart of facebook#13197 . It adds support for K-nearest-neighbor vector similarity searches to `FaissIVFIndex`. There are two main pieces to this: 1) `KNNIterator` is an `Iterator` implementation that is returned by `FaissIVFIndex` upon a call to `NewIterator`. `KNNIterator` treats its `Seek` target as a vector embedding and passes it to FAISS along with the number of neighbors requested `k` as well as the number of probes to use (i.e. the number of inverted lists to check). Applications can then use `Next` (and `Prev`) to iterate over the the vectors in the result set. `KNNIterator` exposes the primary keys associated with the result vectors (see below how this is done), while `value` and `columns` are empty. The iterator also supports a property `rocksdb.faiss.ivf.index.distance` that can be used to retrieve the distance/similarity metric for the current result vector. 2) `IteratorAdapter` takes a RocksDB secondary index iterator (see facebook#13257) and adapts it to the interface required by FAISS (`faiss::InvertedListsIterator`), enabling FAISS to read the inverted lists stored in RocksDB. Since FAISS only supports numerical vector ids of type `faiss::idx_t`, `IteratorAdapter` uses `KNNIterator` to assign ephemeral (per-query) ids to the inverted list items read during iteration, which are later mapped back to the original primary keys by `KNNIterator`. Differential Revision: D67684898
Summary: The patch is the read-side counterpart of facebook#13197 . It adds support for K-nearest-neighbor vector similarity searches to `FaissIVFIndex`. There are two main pieces to this: 1) `KNNIterator` is an `Iterator` implementation that is returned by `FaissIVFIndex` upon a call to `NewIterator`. `KNNIterator` treats its `Seek` target as a vector embedding and passes it to FAISS along with the number of neighbors requested `k` as well as the number of probes to use (i.e. the number of inverted lists to check). Applications can then use `Next` (and `Prev`) to iterate over the the vectors in the result set. `KNNIterator` exposes the primary keys associated with the result vectors (see below how this is done), while `value` and `columns` are empty. The iterator also supports a property `rocksdb.faiss.ivf.index.distance` that can be used to retrieve the distance/similarity metric for the current result vector. 2) `IteratorAdapter` takes a RocksDB secondary index iterator (see facebook#13257) and adapts it to the interface required by FAISS (`faiss::InvertedListsIterator`), enabling FAISS to read the inverted lists stored in RocksDB. Since FAISS only supports numerical vector ids of type `faiss::idx_t`, `IteratorAdapter` uses `KNNIterator` to assign ephemeral (per-query) ids to the inverted list items read during iteration, which are later mapped back to the original primary keys by `KNNIterator`. Differential Revision: D67684898
Summary: The patch is the read-side counterpart of #13197 . It adds support for K-nearest-neighbor vector similarity searches to `FaissIVFIndex`. There are two main pieces to this: 1) `KNNIterator` is an `Iterator` implementation that is returned by `FaissIVFIndex` upon a call to `NewIterator`. `KNNIterator` treats its `Seek` target as a vector embedding and passes it to FAISS along with the number of neighbors requested `k` as well as the number of probes to use (i.e. the number of inverted lists to check). Applications can then use `Next` (and `Prev`) to iterate over the the vectors in the result set. `KNNIterator` exposes the primary keys associated with the result vectors (see below how this is done), while `value` and `columns` are empty. The iterator also supports a property `rocksdb.faiss.ivf.index.distance` that can be used to retrieve the distance/similarity metric for the current result vector. 2) `IteratorAdapter` takes a RocksDB secondary index iterator (see #13257) and adapts it to the interface required by FAISS (`faiss::InvertedListsIterator`), enabling FAISS to read the inverted lists stored in RocksDB. Since FAISS only supports numerical vector ids of type `faiss::idx_t`, `IteratorAdapter` uses `KNNIterator` to assign ephemeral (per-query) ids to the inverted list items read during iteration, which are later mapped back to the original primary keys by `KNNIterator`. Pull Request resolved: #13258 Reviewed By: jaykorean Differential Revision: D67684898 fbshipit-source-id: 5b5c4c438deb86b35d5d45262ce290caee083bca
Summary: The patch adds initial support for backing FAISS's inverted file based indices with data stored in RocksDB. It introduces a
SecondaryIndex
implementation calledFaissIVFIndex
which takes ownership of afaiss::IndexIVF
object. During indexing,FaissIVFIndex
treats the original value of the specified primary column as an embedding vector, and passes it to the provided FAISS index object to perform quantization. It replaces the original embedding vector with the result of the coarse quantizer (i.e. the inverted list id), and puts the result of the fine quantizer (if any) into the secondary index value. Note that this patch is only one half of the equation; it provides a way of storing FAISS inverted lists in RocksDB but there is currently no retrieval/search support (this will be a follow-up change). Also, the integration currently works only with our internal Buck build. I plan to add support forcmake
/make
based builds similarly to how we handle Folly.Differential Revision: D66907065