Support KNN search for FAISS IVF indices #13258

Closed
wants to merge 1 commit into from

Conversation

Contributor

@ltamasi ltamasi commented Dec 28, 2024

Summary:
The patch is the read-side counterpart of #13197. It adds support for K-nearest-neighbor vector similarity searches to FaissIVFIndex. There are two main pieces to this:

  1. KNNIterator is an Iterator implementation that is returned by FaissIVFIndex upon a call to NewIterator. KNNIterator treats its Seek target as a vector embedding and passes it to FAISS along with the number of neighbors requested k as well as the number of probes to use (i.e. the number of inverted lists to check). Applications can then use Next (and Prev) to iterate over the vectors in the result set. KNNIterator exposes the primary keys associated with the result vectors (see below for how this is done), while value and columns are empty. The iterator also supports a property rocksdb.faiss.ivf.index.distance that can be used to retrieve the distance/similarity metric for the current result vector.
  2. IteratorAdapter takes a RocksDB secondary index iterator (see Add a new interface method SecondaryIndex::NewIterator to enable querying the index #13257) and adapts it to the interface required by FAISS (faiss::InvertedListsIterator), enabling FAISS to read the inverted lists stored in RocksDB. Since FAISS only supports numerical vector ids of type faiss::idx_t, IteratorAdapter uses KNNIterator to assign ephemeral (per-query) ids to the inverted list items read during iteration, which are later mapped back to the original primary keys by KNNIterator.
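
To make the ephemeral id scheme in item 2 concrete, here is a minimal standalone sketch of the idea: numeric ids are handed out as inverted list items are read, and each search result is later mapped back to its primary key. The class name `EphemeralIdMap`, its methods, and the local `idx_t` alias are hypothetical and are not taken from the patch; the actual bookkeeping in KNNIterator may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for faiss::idx_t (a 64-bit signed integer in FAISS).
using idx_t = std::int64_t;

// Hands out per-query numeric ids for primary keys and maps them back.
// Illustrative only; not the PR's actual code.
class EphemeralIdMap {
 public:
  // Called as each inverted list item is read: remember the primary key
  // and return the numeric id that the search library will see.
  idx_t Add(std::string primary_key) {
    keys_.push_back(std::move(primary_key));
    return static_cast<idx_t>(keys_.size() - 1);
  }

  // Called for each search result: recover the original primary key.
  const std::string& Lookup(idx_t id) const {
    assert(id >= 0 && static_cast<std::size_t>(id) < keys_.size());
    return keys_[static_cast<std::size_t>(id)];
  }

  // Reset between queries, since the ids are only meaningful for one search.
  void Clear() { keys_.clear(); }

  std::size_t size() const { return keys_.size(); }

 private:
  std::vector<std::string> keys_;
};
```

Using the insertion position as the id keeps the reverse lookup a plain vector index, which fits ids that only need to live for a single query.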

Differential Revision: D67684898

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D67684898

@ltamasi ltamasi marked this pull request as draft December 28, 2024 01:18
ltamasi added a commit to ltamasi/rocksdb that referenced this pull request Dec 28, 2024
Summary: Pull Request resolved: facebook#13258

Differential Revision: D67684898
@ltamasi ltamasi force-pushed the export-D67684898 branch 2 times, most recently from bd905aa to 1c9a496 Compare December 30, 2024 19:24
@ltamasi ltamasi marked this pull request as ready for review January 2, 2025 22:50
@ltamasi ltamasi requested a review from jaykorean January 2, 2025 23:01
// `similarity_search_probes` below.
//
// Default: none
std::optional<size_t> similarity_search_neighbors;
Contributor

I'm wondering whether the similarity search options should be in their own struct.

SecondaryIndex::NewIterator() could take both ReadOptions and SecondaryIndexOptions (or a better name), and these two similarity search options could be part of the new options struct.

I'm just a little more cautious because we are adding more options to ReadOptions that will be used for a very specific use case.

Contributor Author

Yeah, I actually thought about that too - let me take a stab at it. btw, it would probably be sufficient for NewIterator to take just the secondary index read options (the "regular" read options would pertain to the underlying iterator).

Contributor Author

Thanks! Updated
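
For readers following this thread, a dedicated options struct along the lines discussed might look roughly like the sketch below. The struct name `SecondaryIndexReadOptions` and the field `similarity_search_neighbors` appear elsewhere in this PR. Placing `similarity_search_probes` in the same struct is inferred from the discussion. The helper `HasValidSimilaritySearchOptions` and its rule that both values must be set and nonzero are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Sketch of a dedicated options struct for secondary index reads.
struct SecondaryIndexReadOptions {
  // Number of neighbors (k) to return. Default: none
  std::optional<std::size_t> similarity_search_neighbors;
  // Number of inverted lists to probe. Default: none
  std::optional<std::size_t> similarity_search_probes;
};

// Hypothetical helper (not from the PR): treats a search as well-specified
// only if both values are set and nonzero.
inline bool HasValidSimilaritySearchOptions(
    const SecondaryIndexReadOptions& opts) {
  return opts.similarity_search_neighbors.has_value() &&
         *opts.similarity_search_neighbors > 0 &&
         opts.similarity_search_probes.has_value() &&
         *opts.similarity_search_probes > 0;
}
```

Keeping these fields out of ReadOptions means only callers of the secondary index iterator need to see them, which is the concern raised above.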

++pos_;
}

void Prev() override {
Contributor

faiss::InvertedListsIterator doesn't seem to have a prev() API. Just curious where we would use this.

Contributor Author

This is KNNIterator, which is an implementation of rocksdb::Iterator, not faiss::InvertedListsIterator. Prev can be used to move back to a previous vector in the result set (see the unit test).

Contributor

Ah okay. I thought the users would only access the InvertedList objects via FAISS (e.g. InvertedListScanner) and the underlying implementation uses this KNNIterator.

Reading through the unit test again helped. Thanks!
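
The distinction drawn in this thread is that the user-facing iterator walks an already computed result set in either direction. Below is a minimal standalone sketch of that shape. `Neighbor` and `ResultCursor` are hypothetical names; the sketch only mirrors the Valid/Next/Prev pattern of rocksdb::Iterator and is not the PR's KNNIterator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One search hit: the primary key plus its distance from the query vector.
struct Neighbor {
  std::string primary_key;
  float distance;
};

// Minimal bidirectional cursor over a materialized result set.
class ResultCursor {
 public:
  explicit ResultCursor(std::vector<Neighbor> results)
      : results_(std::move(results)), pos_(0) {}

  bool Valid() const { return pos_ < results_.size(); }

  void Next() {
    assert(Valid());
    ++pos_;
  }

  void Prev() {
    assert(Valid());
    if (pos_ == 0) {
      // Stepping back from the first hit invalidates the cursor.
      pos_ = results_.size();
    } else {
      --pos_;
    }
  }

  const std::string& key() const {
    assert(Valid());
    return results_[pos_].primary_key;
  }

  float distance() const {
    assert(Valid());
    return results_[pos_].distance;
  }

 private:
  std::vector<Neighbor> results_;
  std::size_t pos_;
};
```

Because the hits are materialized before iteration starts, moving backwards is just an index decrement and needs nothing from the underlying inverted list iterator.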


const Status status = secondary_index_it_->status();
if (!status.ok()) {
throw std::runtime_error(status.ToString());
Contributor

I wonder why we throw for errors here (and below). Wouldn't it be sufficient if is_available() returns false after Update() fails?

Contributor Author

@ltamasi ltamasi Jan 3, 2025

We have to differentiate between "out of items" and "some kind of error happened", similarly to RocksDB iterators. We wouldn't want to silently ignore I/O errors for example and produce an incorrect result. This is similar to how we expect RocksDB users to check an iterator's status() after Valid() returns false; it is possible that the iterator got invalidated because of an error (and not because we have exhausted all the elements).

Contributor Author

For a bit more context, IteratorAdapter's code executes in the context of FAISS, and FAISS uses exceptions to communicate errors. However, we eventually do catch all exceptions coming from FAISS and turn them into error statuses (regardless of whether the exception originated from FAISS itself or from our class).
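
The pattern described here (throw inside code that runs under FAISS, then catch at the boundary and convert to a status) can be sketched in isolation as follows. `SimpleStatus`, `ReadNextItem`, and `SearchWithStatus` are hypothetical stand-ins for illustration; they are not RocksDB or FAISS APIs, and the real code converts to rocksdb::Status.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical, simplified stand-in for rocksdb::Status.
struct SimpleStatus {
  bool ok;
  std::string message;

  static SimpleStatus OK() { return SimpleStatus{true, ""}; }
  static SimpleStatus Error(const std::string& msg) {
    return SimpleStatus{false, msg};
  }
};

// Stand-in for adapter code running inside the search library. It has no
// status channel back to the caller, so it reports failures by throwing.
inline int ReadNextItem(bool simulate_io_error) {
  if (simulate_io_error) {
    throw std::runtime_error("IO error: simulated read failure");
  }
  return 42;
}

// Boundary function: catches exceptions thrown below it and converts them
// into a status, so callers can tell "no more items" from "read failed".
inline SimpleStatus SearchWithStatus(bool simulate_io_error, int* out) {
  try {
    *out = ReadNextItem(simulate_io_error);
    return SimpleStatus::OK();
  } catch (const std::exception& e) {
    return SimpleStatus::Error(e.what());
  }
}
```

With this split, an I/O failure surfaces to the caller as a non-OK status rather than as a silently truncated result set.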

Contributor

@jaykorean jaykorean left a comment

LGTM. Thank you!

std::unique_ptr<Iterator> underlying_it(db->NewIterator(ReadOptions(), cfh2));

SecondaryIndexReadOptions read_options;
read_options.similarity_search_neighbors = 8;
Contributor

nit: can be a follow up. One more case for a different number of neighbors would be nice

return Slice();
}

Status GetProperty(std::string prop_name, std::string* prop) override {
Contributor

Not strictly related to this diff, but curious.

I am not sure why prop_name was passed as a string copy instead of a const & when this was added 9 years ago.

If the user checks the distance for every single embedding, this can be an unnecessary overhead, right? I'm wondering if we want to introduce a better way of keeping iterator properties in the next major release.

Contributor Author

Yeah, this goes way back, so I don't have any context about why the interface is this way but it is something we can fix in a major release (even though the cost of the copy is probably negligible).
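
As a small illustration of the copy being discussed, the sketch below contrasts a by-value string parameter with a const reference. Both function names are hypothetical and exist only to make the difference observable; neither is part of RocksDB.

```cpp
#include <cassert>
#include <string>

// By value, like the GetProperty signature quoted above: passing an lvalue
// constructs a separate std::string object for the parameter.
inline bool SameObjectByValue(std::string prop_name,
                              const std::string* caller_arg) {
  return &prop_name == caller_arg;
}

// By const reference: the parameter aliases the caller's string, so no
// copy is made.
inline bool SameObjectByConstRef(const std::string& prop_name,
                                 const std::string* caller_arg) {
  return &prop_name == caller_arg;
}
```

Whether that extra copy matters in practice depends on how often the property is queried. As noted above, it is probably negligible.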

@facebook-github-bot
Contributor

This pull request has been merged in c2de783.
