Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reapply: Support for fp16 in CAGRA and IVF-PQ #2172

Merged
merged 3 commits into from
Feb 13, 2024

Conversation

achirkin
Copy link
Contributor

  1. Add fp16 (CUDA half) support to CAGRA and its dependencies (Support for fp16 in CAGRA and IVF-PQ #2085).
  2. Fix the shared memory size error in the ivf-flat that got exposed by new tests in Support for fp16 in CAGRA and IVF-PQ #2085.

Regarding the point (2):
Warp-sort top-k queue uses shared memory; the module provides the required shmem size calculation function decoupled from the queue object itself. As a result, it's easy to plug-in wrong types and get the calculation incorrectly.
IVF-Flat scan kernel always kept the distances in the queue as floats, but we calculated the shmem size as if it used AccT (IVF-Flat's internal accumulation type). Hence, with adding the tests with fp16 inputs (and AccT), the allocated shmem became too small, which resulted in memory access violation errors.

achirkin and others added 2 commits February 11, 2024 07:46
Copy link
Member

@benfred benfred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

I didn't review the whole change - but did a diff between this branch and the original PR https://github.com/rapidsai/raft/pull/2085/files and found the only difference was the fix

diff --git a/cpp/include/raft/neighbors/detail/ivf_flat_interleaved_scan-inl.cuh b/cpp/include/raft/neighbors/detail/ivf_flat_interleaved_scan-inl.cuh
index 51cd2876d..1cf042c6c 100644
--- a/cpp/include/raft/neighbors/detail/ivf_flat_interleaved_scan-inl.cuh
+++ b/cpp/include/raft/neighbors/detail/ivf_flat_interleaved_scan-inl.cuh
@@ -844,7 +844,7 @@ void launch_kernel(Lambda lambda,
   int smem_size              = query_smem_elems * sizeof(T);
   constexpr int kSubwarpSize = std::min<int>(Capacity, WarpSize);
   auto block_merge_mem =
-    raft::matrix::detail::select::warpsort::calc_smem_size_for_block_wide<AccT, IdxT>(
+    raft::matrix::detail::select::warpsort::calc_smem_size_for_block_wide<float, IdxT>(

       kThreadsPerBlock / kSubwarpSize, k);
   smem_size += std::max<int>(smem_size, block_merge_mem);

which should resolve the unittest failures that we were seeing before.

Thanks for fixing this @achirkin !

@github-actions github-actions bot added the CMake label Feb 12, 2024
@benfred
Copy link
Member

benfred commented Feb 13, 2024

/merge

@rapids-bot rapids-bot bot merged commit 65ae560 into rapidsai:branch-24.04 Feb 13, 2024
61 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Development

Successfully merging this pull request may close these issues.

2 participants