Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Support non-fused k-selection in IVF-flat #1555

Closed
lowener opened this issue May 30, 2023 · 2 comments
Closed

[FEA] Support non-fused k-selection in IVF-flat #1555

lowener opened this issue May 30, 2023 · 2 comments
Assignees
Labels
FAISS improvement Improvement / enhancement to an existing function Milvus non-breaking Non-breaking change REDIS Vector Search

Comments

@lowener
Copy link
Contributor

lowener commented May 30, 2023

static constexpr int kMaxCapacity = 256;

// NB: this is the limitation of the warpsort structures that use a huge number of
// registers (used in the main kernel here).
RAFT_EXPECTS(capacity == Capacity,
"Capacity must be power-of-two not bigger than the maximum allowed size "
"matrix::detail::select::warpsort::kMaxCapacity (%d).",
matrix::detail::select::warpsort::kMaxCapacity);

Providing a fallback when k>256 would be useful.

@cjnolet
Copy link
Member

cjnolet commented Jun 8, 2023

Linking this to #711, converting this to a feature request, and tagging @achirkin to provide details Im likely to miss- the problem here is that ivf-pq supports both fused and non-fused k-selection (and so is able to support very large settings of k) while ivf-flat on supports the fused k-selection.

The need here is to implement non-fused k-selection in ivf-flat so that we can support larger values of k.

@cjnolet cjnolet changed the title [BUG] IVF-Flat TopK operation is limited to k=256 [FEA] Support non-fused k-selection in IVF-flat Jun 8, 2023
@cjnolet cjnolet added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change FAISS labels Jun 8, 2023
@achirkin
Copy link
Contributor

As @cjnolet aptly specified in the new title, the issue is that IVF-flat implementation relies on select::warpsort to fuse the k-selection with computing the distance. The warpsort primitives store the top-k values per-warp in registers for efficiency, and thus k cannot grow unbound. The only workaround I see is to implement a non-fused version of IVF-flat, that would use the host-side matrix::select_k (which falls back to the radix-based implementation for large k). This would be the same approach as the currently implemented IVF-PQ.

Following the IVF-PQ example, I think the scope of the work can (mostly) be limited to a single file neighbors/detail/ivf_flat_interleaved_scan-inl.cuh:

  1. interleaved_scan_kernel: add a new codepath re-using template parameter Capacity == 0 we may need to factor-out and reuse ivf-pq's dummy_block_sort_t for this.
  2. NB: In the non-fused mode, each processed (query, probe) produces the output of different size (number of rows in a cluster). We'd need to tackle this either by padding data or calculating the sizes. In ivf-pq, we estimate max_samples - maximum possible cumulative size of clusters probed for one query - to allocate a big enough intermediate buffer for the found distances. The results are packed densely for probes within one query, but padded with dummy values between queries.
  3. For (2) we'd need something similar to calc_chunk_indices
  4. In the non-fused version of the kernel we don't store the indices to save the memory bandwidth. Thus the indices need to be reconstructed in an extra post-processing step (see ivf-pq's postprocess_neighbors)
  5. As an extra to the previous step, we can improve compile times by fixing the IdxT = uint32_t of the interleaved_scan_kernel (Make all codepaths output the sample indices and convert them to the db indices in the post-processing step).
  6. Adjust the select_interleaved_scan_kernel to include the Capacity == 0 case (a minimal change).
  7. Since we already have some batching inside the launch_kernel, we could allocate the extra intermediate buffer storing the distances (max_samples * kMaxGridY) outside that loop and reuse it. Then the matrix::select_k would go at the end of each iteration.

@cjnolet cjnolet moved this from Todo to In Progress in VS/ML/DM Primitives Release Board Jun 22, 2023
rapids-bot bot pushed a commit that referenced this issue Mar 4, 2024
Add support for topk > 256 for ivf_flat (Issue [#1555](#1555))

The PR adds a non-fused version of topk that is utilized if k > 256.

FYI, @tfeher

Authors:
  - Malte Förster (https://github.com/mfoerste4)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #2169
@github-project-automation github-project-automation bot moved this from In Progress to Done in VS/ML/DM Primitives Release Board Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FAISS improvement Improvement / enhancement to an existing function Milvus non-breaking Non-breaking change REDIS Vector Search
Projects
Development

No branches or pull requests

4 participants