Introduce sample filtering to IVFPQ index search (#1513)
A prototype that introduces per-sample filtering for IVFPQ search. Please feel free to use it as a foundation for a future change, if appropriate: the code is functional, but it is not especially clean.

The diff introduces a template parameter called `SampleFilterT`. An instance is expected
* to be `SampleFilterT()` constructible (this was mostly needed to define a default behavior in the form of `SampleFilterT sample_filter=SampleFilterT()`, see below)
* to provide an `inline __device__ bool operator(...)` that returns `true` if a given sample is valid for being used against a given query in IVFPQ search

The default filter (which I set as the default in certain facilities in the form of `typename SampleFilterT = NoneSampleFilter` in order not to modify way too many files) allows all samples to be used:

```
struct NoneSampleFilter {
  inline __device__ __host__ bool operator()(
    // query index
    const uint32_t query_ix,
    // the current inverted list index
    const uint32_t cluster_ix,
    // the index of the current sample inside the current inverted list
    const uint32_t sample_ix) const
  {
    return true;
  }
};
```

Here `__host__` is needed for CPU-based testing only.

Also, I've provided an implementation of `BitMaskSampleFilter` that filters samples based on a bit mask, as an example. The implementation was tested in a semi-production environment.

All the filter-related code was added to `cpp/include/raft/neighbors/detail/sample_filter.cuh`.

The default `ivf_pq_search()` method remains unchanged, but one more `ivf_pq_search_with_filtering()` method with an additional template argument `SampleFilterT` and one more input parameter was introduced.
```
template <typename T, typename IdxT, typename SampleFilterT>
void search_with_filtering(raft::device_resources const& handle,
                           const raft::neighbors::ivf_pq::search_params& params,
                           const index<IdxT>& idx,
                           const T* queries,
                           uint32_t n_queries,
                           uint32_t k,
                           IdxT* neighbors,
                           float* distances,
                           rmm::mr::device_memory_resource* mr = nullptr,
                           SampleFilterT sample_filter = SampleFilterT());
```

All the current instantiations use `NoneSampleFilter` only.

I pass the filter as `SampleFilterT sample_filter` rather than `const SampleFilterT sample_filter` in the function calls so that debugging facilities can be added to a filter, in the hope that the compiler is smart enough to infer the de facto constness if needed.

The filter does not take a computed distance score into account by design, so the current implementation cannot apply a distance threshold. This can be easily changed, if appropriate.

It is still an open question for me whether this filtering needs to be injected right inside the search kernel instead of being done as post-processing; please let me know if you have any thoughts on the topic.

I'm happy to address the comments.

Thanks.

Authors:
- Alexander Guzhva (https://github.com/alexanderguzhva)
- Artem M. Chirkin (https://github.com/achirkin)
- Corey J. Nolet (https://github.com/cjnolet)

Approvers:
- Artem M. Chirkin (https://github.com/achirkin)
- Corey J. Nolet (https://github.com/cjnolet)

URL: #1513
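
To make the filter contract described in this message concrete, here is a minimal sketch of a user-defined filter with the same `operator()(query_ix, cluster_ix, sample_ix)` shape as `NoneSampleFilter`. It is not the `BitMaskSampleFilter` shipped in the PR. The names `ExampleBitMaskFilter` and `count_valid_samples`, and the mask layout (one bit array per inverted list, shared by all queries), are assumptions made for illustration. The macro fallback only exists so the sketch also builds with a plain host compiler.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: allow building without nvcc by making the CUDA qualifiers no-ops.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Default filter as shown in the commit message: every sample is accepted.
struct NoneSampleFilter {
  inline __device__ __host__ bool operator()(const uint32_t query_ix,
                                             const uint32_t cluster_ix,
                                             const uint32_t sample_ix) const
  {
    return true;
  }
};

// Hypothetical bit-mask filter (illustration only, not the PR's implementation).
// cluster_masks[c] points to a packed bit array for inverted list c;
// bit i is set when sample i of that list may be returned.
struct ExampleBitMaskFilter {
  const uint32_t* const* cluster_masks = nullptr;

  inline __device__ __host__ bool operator()(const uint32_t query_ix,
                                             const uint32_t cluster_ix,
                                             const uint32_t sample_ix) const
  {
    (void)query_ix;  // this sketch uses the same mask for every query
    // A default-constructed filter has no masks and accepts everything.
    if (cluster_masks == nullptr) { return true; }
    const uint32_t word = cluster_masks[cluster_ix][sample_ix / 32];
    return ((word >> (sample_ix % 32)) & 1u) != 0;
  }
};

// Host-side stand-in for the place where a search kernel would consult the
// filter: count the samples of one inverted list that pass it.
template <typename SampleFilterT = NoneSampleFilter>
uint32_t count_valid_samples(uint32_t query_ix,
                             uint32_t cluster_ix,
                             uint32_t cluster_size,
                             SampleFilterT sample_filter = SampleFilterT())
{
  uint32_t n_valid = 0;
  for (uint32_t sample_ix = 0; sample_ix < cluster_size; ++sample_ix) {
    if (sample_filter(query_ix, cluster_ix, sample_ix)) { ++n_valid; }
  }
  return n_valid;
}
```

On the GPU the mask pointers would have to refer to device memory; the sketch leaves allocation and transfer out. Because the filter is passed by value and is default-constructible, omitting the last argument falls back to `NoneSampleFilter`, which mirrors the `SampleFilterT sample_filter = SampleFilterT()` default in the signature above.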