Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write-side support for FAISS IVF indices #13197

Closed
wants to merge 1 commit into from

Conversation

ltamasi
Copy link
Contributor

@ltamasi ltamasi commented Dec 6, 2024

Summary: The patch adds initial support for backing FAISS's inverted file based indices with data stored in RocksDB. It introduces a SecondaryIndex implementation called FaissIVFIndex which takes ownership of a faiss::IndexIVF object. During indexing, FaissIVFIndex treats the original value of the specified primary column as an embedding vector, and passes it to the provided FAISS index object to perform quantization. It replaces the original embedding vector with the result of the coarse quantizer (i.e. the inverted list id), and puts the result of the fine quantizer (if any) into the secondary index value. Note that this patch is only one half of the equation; it provides a way of storing FAISS inverted lists in RocksDB but there is currently no retrieval/search support (this will be a follow-up change). Also, the integration currently works only with our internal Buck build. I plan to add support for cmake / make based builds similarly to how we handle Folly.

Differential Revision: D66907065

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D66907065

@ltamasi
Copy link
Contributor Author

ltamasi commented Dec 7, 2024

I plan to add support for cmake / make based builds similarly to how we handle Folly.

Hm, seems like the unity build will need some work the very least.

ltamasi added a commit to ltamasi/rocksdb that referenced this pull request Dec 7, 2024
Summary:

The patch adds initial support for backing FAISS's inverted file based indices with data stored in RocksDB. It introduces a `SecondaryIndex` implementation called `FaissIVFIndex` which takes ownership of a `faiss::IndexIVF` object. During indexing, `FaissIVFIndex` treats the original value of the specified primary column as an embedding vector, and passes it to the provided FAISS index object to perform quantization. It replaces the original embedding vector with the result of the coarse quantizer (i.e. the inverted list id), and puts the result of the fine quantizer (if any) into the secondary index value. Note that this patch is only one half of the equation; it provides a way of storing FAISS inverted lists in RocksDB but there is currently no retrieval/search support (this will be a follow-up change). Also, the integration currently works only with our internal Buck build. I plan to add support for `cmake` / `make` based builds similarly to how we handle Folly.

Differential Revision: D66907065
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D66907065

Summary:

The patch adds initial support for backing FAISS's inverted file based indices with data stored in RocksDB. It introduces a `SecondaryIndex` implementation called `FaissIVFIndex` which takes ownership of a `faiss::IndexIVF` object. During indexing, `FaissIVFIndex` treats the original value of the specified primary column as an embedding vector, and passes it to the provided FAISS index object to perform quantization. It replaces the original embedding vector with the result of the coarse quantizer (i.e. the inverted list id), and puts the result of the fine quantizer (if any) into the secondary index value. Note that this patch is only one half of the equation; it provides a way of storing FAISS inverted lists in RocksDB but there is currently no retrieval/search support (this will be a follow-up change). Also, the integration currently works only with our internal Buck build. I plan to add support for `cmake` / `make` based builds similarly to how we handle Folly.

Differential Revision: D66907065
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D66907065

Copy link
Contributor

@jowlyzhang jowlyzhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the PR!

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in b339d08.

ltamasi added a commit to ltamasi/rocksdb that referenced this pull request Dec 31, 2024
Summary:
The patch is the read-side counterpart of facebook#13197 . It adds support for K-nearest-neighbor vector similarity searches to `FaissIVFIndex`. There are two main pieces to this:

1) `KNNIterator` is an `Iterator` implementation that is returned by `FaissIVFIndex` upon a call to `NewIterator`. `KNNIterator` treats its `Seek` target as a vector embedding and passes it to FAISS along with the number of neighbors requested `k` as well as the number of probes to use (i.e. the number of inverted lists to check). Applications can then use `Next` (and `Prev`) to iterate over the the vectors in the result set. `KNNIterator` exposes the primary keys associated with the result vectors (see below how this is done), while `value` and `columns` are empty. The iterator also supports a property `rocksdb.faiss.ivf.index.distance` that can be used to retrieve the distance/similarity metric for the current result vector.
2) `IteratorAdapter` takes a RocksDB secondary index iterator (see facebook#13257) and adapts it to the interface required by FAISS (`faiss::InvertedListsIterator`), enabling FAISS to read the inverted lists stored in RocksDB. Since FAISS only supports numerical vector ids of type `faiss::idx_t`, `IteratorAdapter` uses `KNNIterator` to assign ephemeral (per-query) ids to the inverted list items read during iteration, which are later mapped back to the original primary keys by `KNNIterator`.


Differential Revision: D67684898
ltamasi added a commit to ltamasi/rocksdb that referenced this pull request Jan 2, 2025
Summary:
The patch is the read-side counterpart of facebook#13197 . It adds support for K-nearest-neighbor vector similarity searches to `FaissIVFIndex`. There are two main pieces to this:

1) `KNNIterator` is an `Iterator` implementation that is returned by `FaissIVFIndex` upon a call to `NewIterator`. `KNNIterator` treats its `Seek` target as a vector embedding and passes it to FAISS along with the number of neighbors requested `k` as well as the number of probes to use (i.e. the number of inverted lists to check). Applications can then use `Next` (and `Prev`) to iterate over the the vectors in the result set. `KNNIterator` exposes the primary keys associated with the result vectors (see below how this is done), while `value` and `columns` are empty. The iterator also supports a property `rocksdb.faiss.ivf.index.distance` that can be used to retrieve the distance/similarity metric for the current result vector.
2) `IteratorAdapter` takes a RocksDB secondary index iterator (see facebook#13257) and adapts it to the interface required by FAISS (`faiss::InvertedListsIterator`), enabling FAISS to read the inverted lists stored in RocksDB. Since FAISS only supports numerical vector ids of type `faiss::idx_t`, `IteratorAdapter` uses `KNNIterator` to assign ephemeral (per-query) ids to the inverted list items read during iteration, which are later mapped back to the original primary keys by `KNNIterator`.


Differential Revision: D67684898
ltamasi added a commit to ltamasi/rocksdb that referenced this pull request Jan 2, 2025
Summary:
The patch is the read-side counterpart of facebook#13197 . It adds support for K-nearest-neighbor vector similarity searches to `FaissIVFIndex`. There are two main pieces to this:

1) `KNNIterator` is an `Iterator` implementation that is returned by `FaissIVFIndex` upon a call to `NewIterator`. `KNNIterator` treats its `Seek` target as a vector embedding and passes it to FAISS along with the number of neighbors requested `k` as well as the number of probes to use (i.e. the number of inverted lists to check). Applications can then use `Next` (and `Prev`) to iterate over the the vectors in the result set. `KNNIterator` exposes the primary keys associated with the result vectors (see below how this is done), while `value` and `columns` are empty. The iterator also supports a property `rocksdb.faiss.ivf.index.distance` that can be used to retrieve the distance/similarity metric for the current result vector.
2) `IteratorAdapter` takes a RocksDB secondary index iterator (see facebook#13257) and adapts it to the interface required by FAISS (`faiss::InvertedListsIterator`), enabling FAISS to read the inverted lists stored in RocksDB. Since FAISS only supports numerical vector ids of type `faiss::idx_t`, `IteratorAdapter` uses `KNNIterator` to assign ephemeral (per-query) ids to the inverted list items read during iteration, which are later mapped back to the original primary keys by `KNNIterator`.


Differential Revision: D67684898
ltamasi added a commit to ltamasi/rocksdb that referenced this pull request Jan 2, 2025
Summary:
The patch is the read-side counterpart of facebook#13197 . It adds support for K-nearest-neighbor vector similarity searches to `FaissIVFIndex`. There are two main pieces to this:

1) `KNNIterator` is an `Iterator` implementation that is returned by `FaissIVFIndex` upon a call to `NewIterator`. `KNNIterator` treats its `Seek` target as a vector embedding and passes it to FAISS along with the number of neighbors requested `k` as well as the number of probes to use (i.e. the number of inverted lists to check). Applications can then use `Next` (and `Prev`) to iterate over the the vectors in the result set. `KNNIterator` exposes the primary keys associated with the result vectors (see below how this is done), while `value` and `columns` are empty. The iterator also supports a property `rocksdb.faiss.ivf.index.distance` that can be used to retrieve the distance/similarity metric for the current result vector.
2) `IteratorAdapter` takes a RocksDB secondary index iterator (see facebook#13257) and adapts it to the interface required by FAISS (`faiss::InvertedListsIterator`), enabling FAISS to read the inverted lists stored in RocksDB. Since FAISS only supports numerical vector ids of type `faiss::idx_t`, `IteratorAdapter` uses `KNNIterator` to assign ephemeral (per-query) ids to the inverted list items read during iteration, which are later mapped back to the original primary keys by `KNNIterator`.


Differential Revision: D67684898
ltamasi added a commit to ltamasi/rocksdb that referenced this pull request Jan 4, 2025
Summary:
The patch is the read-side counterpart of facebook#13197 . It adds support for K-nearest-neighbor vector similarity searches to `FaissIVFIndex`. There are two main pieces to this:

1) `KNNIterator` is an `Iterator` implementation that is returned by `FaissIVFIndex` upon a call to `NewIterator`. `KNNIterator` treats its `Seek` target as a vector embedding and passes it to FAISS along with the number of neighbors requested `k` as well as the number of probes to use (i.e. the number of inverted lists to check). Applications can then use `Next` (and `Prev`) to iterate over the the vectors in the result set. `KNNIterator` exposes the primary keys associated with the result vectors (see below how this is done), while `value` and `columns` are empty. The iterator also supports a property `rocksdb.faiss.ivf.index.distance` that can be used to retrieve the distance/similarity metric for the current result vector.
2) `IteratorAdapter` takes a RocksDB secondary index iterator (see facebook#13257) and adapts it to the interface required by FAISS (`faiss::InvertedListsIterator`), enabling FAISS to read the inverted lists stored in RocksDB. Since FAISS only supports numerical vector ids of type `faiss::idx_t`, `IteratorAdapter` uses `KNNIterator` to assign ephemeral (per-query) ids to the inverted list items read during iteration, which are later mapped back to the original primary keys by `KNNIterator`.


Differential Revision: D67684898
facebook-github-bot pushed a commit that referenced this pull request Jan 6, 2025
Summary:
The patch is the read-side counterpart of #13197 . It adds support for K-nearest-neighbor vector similarity searches to `FaissIVFIndex`. There are two main pieces to this:

1) `KNNIterator` is an `Iterator` implementation that is returned by `FaissIVFIndex` upon a call to `NewIterator`. `KNNIterator` treats its `Seek` target as a vector embedding and passes it to FAISS along with the number of neighbors requested `k` as well as the number of probes to use (i.e. the number of inverted lists to check). Applications can then use `Next` (and `Prev`) to iterate over the the vectors in the result set. `KNNIterator` exposes the primary keys associated with the result vectors (see below how this is done), while `value` and `columns` are empty. The iterator also supports a property `rocksdb.faiss.ivf.index.distance` that can be used to retrieve the distance/similarity metric for the current result vector.
2) `IteratorAdapter` takes a RocksDB secondary index iterator (see #13257) and adapts it to the interface required by FAISS (`faiss::InvertedListsIterator`), enabling FAISS to read the inverted lists stored in RocksDB. Since FAISS only supports numerical vector ids of type `faiss::idx_t`, `IteratorAdapter` uses `KNNIterator` to assign ephemeral (per-query) ids to the inverted list items read during iteration, which are later mapped back to the original primary keys by `KNNIterator`.

Pull Request resolved: #13258

Reviewed By: jaykorean

Differential Revision: D67684898

fbshipit-source-id: 5b5c4c438deb86b35d5d45262ce290caee083bca
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants