From b8c026b0bb41e53bc46916dbc8fd68151919cb97 Mon Sep 17 00:00:00 2001
From: Micka
Date: Fri, 1 Dec 2023 05:46:11 +0100
Subject: [PATCH] Add filtering to vector search tutorial (#1996)

Authors:
  - Micka (https://github.com/lowener)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: https://github.com/rapidsai/raft/pull/1996
---
 docs/source/vector_search_tutorial.md | 35 ++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/docs/source/vector_search_tutorial.md b/docs/source/vector_search_tutorial.md
index 8ff25143e3..40e73d0032 100644
--- a/docs/source/vector_search_tutorial.md
+++ b/docs/source/vector_search_tutorial.md
@@ -174,7 +174,7 @@ raft::neighbors::cagra::search(
     res, search_params, index, search, indices.view(), distances.view());
 ```
 
-## Step 7: Evaluate neighborhood quality
+## Step 5: Evaluate neighborhood quality
 
 In step 3 we built a flat index and queried for exact neighbors while in step 4 we build an ANN index and queried for approximate neighbors. How do you quickly figure out the quality of our approximate neighbors and whether it's in an acceptable range based on your needs? Just compute the `neighborhood_recall` which gives a single value in the range [0, 1]. Closer the value to 1, higher the quality of the approximation.
 
@@ -341,3 +341,36 @@ The below example specifies the total number of bytes that RAFT can use for temp
 std::shared_ptr managed_resource;
 raft::device_resource res(managed_resource, std::make_optional(3 * 1024^3));
 ```
+
+### Filtering
+
+As of RAFT 23.10, support for pre-filtering of neighbors has been added to ANN indexes. This search feature enables several use cases, such as filtering vectors based on their attributes (hybrid search), excluding vectors that have already been added to the index, or controlling access in searches for security purposes.
+Filtering is available through the `search_with_filtering()` function of the ANN index. It works by applying a predicate function on the GPU, which usually has the signature `(uint32_t query_ix, uint32_t sample_ix) -> bool` and returns `true` when the sample may appear in the results of that query.
+
+One of the most commonly used mechanisms for filtering is the bitset: a data structure that tests whether a value is present in a set with a fast lookup. It is implemented as a bit array in which every element is a `0` or a `1` (`false` or `true`, respectively, in boolean logic). RAFT provides a `raft::core::bitset` class to create and manipulate bitsets on the GPU, and a `raft::core::bitset_view` class to pass bitsets to filtering functions.
+
+The following example demonstrates how to use the filtering API:
+
+```c++
+#include <raft/neighbors/cagra.cuh>
+#include <raft/core/bitset.cuh>
+
+using namespace raft::neighbors;
+// use default index parameters
+cagra::index_params index_params;
+// create and fill the index from a [N, D] dataset
+auto index = cagra::build(res, index_params, dataset);
+// use default search parameters
+cagra::search_params search_params;
+
+// create a bitset over all dataset rows in which the removed indices are unset (filtered out)
+auto removed_indices = raft::make_device_vector<uint32_t, uint32_t>(res, n_removed_indices);
+raft::core::bitset<uint32_t, uint32_t> removed_indices_bitset(
+  res, removed_indices.view(), dataset.extent(0));
+
+// search K nearest neighbours according to a bitset filter
+auto neighbors = raft::make_device_matrix<uint32_t, int64_t>(res, n_queries, k);
+auto distances = raft::make_device_matrix<float, int64_t>(res, n_queries, k);
+cagra::search_with_filtering(res, search_params, index, queries, neighbors.view(), distances.view(),
+  filtering::bitset_filter(removed_indices_bitset.view()));
+```
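The bitset mechanism described in the patch can be illustrated without a GPU. The C++ sketch below is a host-only stand-in, not RAFT's implementation: `host_bitset`, `bitset_filter_sketch`, and `filtered_nearest` are hypothetical names introduced for illustration. The bits are packed into 32-bit words, and, by assumption, a set bit means the row is kept while a cleared bit filters it out, matching the removed-indices example in which the removed rows are the ones excluded. The predicate has the `(query_ix, sample_ix) -> bool` shape from the text and is evaluated before any distance is computed, which is what makes it a pre-filter.

```c++
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical host-side stand-in for a GPU bitset (not RAFT's class):
// one bit per dataset row, packed into 32-bit words.
// Assumption: a set bit (1) keeps the row, a cleared bit (0) filters it out.
class host_bitset {
 public:
  // All bits start set; every in-range index listed in `removed` is cleared.
  host_bitset(std::uint32_t n_rows, const std::vector<std::uint32_t>& removed)
    : n_rows_(n_rows), words_((n_rows + 31) / 32, ~std::uint32_t{0})
  {
    for (std::uint32_t ix : removed) {
      if (ix < n_rows_) { words_[ix / 32] &= ~(std::uint32_t{1} << (ix % 32)); }
    }
  }

  // O(1) lookup: select the word holding bit `ix`, then shift and mask it.
  bool test(std::uint32_t ix) const
  {
    assert(ix < n_rows_);
    return ((words_[ix / 32] >> (ix % 32)) & std::uint32_t{1}) != 0;
  }

 private:
  std::uint32_t n_rows_;
  std::vector<std::uint32_t> words_;
};

// Hypothetical predicate with the (query_ix, sample_ix) -> bool shape.
// It ignores query_ix, so the same rows are excluded for every query.
struct bitset_filter_sketch {
  const host_bitset& bits;
  bool operator()(std::uint32_t /*query_ix*/, std::uint32_t sample_ix) const
  {
    return bits.test(sample_ix);
  }
};

// Exact 1-nearest-neighbour search over 1-D points. The predicate is checked
// before the distance, so a filtered row can never be returned.
// Returns the best row index, or the maximum uint32_t if every row is filtered out.
template <typename Filter>
std::uint32_t filtered_nearest(const std::vector<float>& dataset, float query,
                               std::uint32_t query_ix, Filter filter)
{
  std::uint32_t best_ix = std::numeric_limits<std::uint32_t>::max();
  float best_dist = std::numeric_limits<float>::max();
  for (std::uint32_t i = 0; i < dataset.size(); ++i) {
    if (!filter(query_ix, i)) { continue; }  // pre-filter: skip excluded rows
    const float d = (dataset[i] - query) * (dataset[i] - query);
    if (d < best_dist) {
      best_dist = d;
      best_ix = i;
    }
  }
  return best_ix;
}
```

For example, with `host_bitset bits(8, {2, 5})` and the dataset `{0, 1, 2, 3, 4, 5, 6, 7}`, a query at `2.1` returns row 3 instead of the unfiltered nearest neighbour, row 2, because row 2 is rejected by the predicate before its distance is ever computed.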