Reapply: Support for fp16 in CAGRA and IVF-PQ #2172

achirkin · 2024-02-11T07:00:47Z

1. Add fp16 (CUDA half) support to CAGRA and its dependencies (Support for fp16 in CAGRA and IVF-PQ #2085).
2. Fix the shared memory size error in IVF-Flat that was exposed by the new tests in Support for fp16 in CAGRA and IVF-PQ #2085.

Regarding point (2):
The warp-sort top-k queue uses shared memory; the module provides a function to calculate the required shmem size, and that function is decoupled from the queue object itself. As a result, it's easy to plug in the wrong types and get an incorrect size.
The IVF-Flat scan kernel has always kept the distances in the queue as floats, but we calculated the shmem size as if the queue used AccT (IVF-Flat's internal accumulation type). Hence, once tests with fp16 inputs (and a correspondingly narrower AccT) were added, the allocated shmem became too small, which resulted in memory access violation errors.
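
To make the arithmetic concrete, here is a minimal host-only sketch of how sizing the buffer with a 2-byte value type falls short when the queue actually stores 4-byte floats. The size formula, `queue_smem_bytes`, and `half_stub` are all hypothetical stand-ins chosen for illustration; this is not the formula used by RAFT's `calc_smem_size_for_block_wide`, and the warp count and `k` are arbitrary.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for CUDA `half` (assumption): only its 2-byte size matters in this sketch.
struct half_stub {
  std::uint16_t bits;
};

// Hypothetical, simplified size model: `num_warps` queues, each holding
// `k` (value, index) pairs. NOT the actual RAFT formula.
template <typename ValT, typename IdxT>
constexpr std::size_t queue_smem_bytes(int num_warps, int k)
{
  return static_cast<std::size_t>(num_warps) * static_cast<std::size_t>(k) *
         (sizeof(ValT) + sizeof(IdxT));
}

using IdxT = std::int64_t;

// What the queue needs: it stores distances as float.
constexpr std::size_t needed_bytes = queue_smem_bytes<float, IdxT>(4, 64);

// What gets allocated if the size is computed with a 2-byte accumulation type.
constexpr std::size_t allocated_bytes = queue_smem_bytes<half_stub, IdxT>(4, 64);

// The allocation is smaller than what the queue will write to,
// so the kernel would access memory past the end of the buffer.
static_assert(allocated_bytes < needed_bytes, "buffer sized with the wrong value type");

// Number of bytes the allocation falls short by, under this model.
constexpr std::size_t shortfall_bytes = needed_bytes - allocated_bytes;
```

Under this toy model the gap grows linearly with both `k` and the number of warps, which may explain why such a bug can go unnoticed until a test exercises a narrower type.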

Add fp16 (CUDA half) support to CAGRA and its dependencies.

Authors:
- Artem M. Chirkin (https://github.com/achirkin)

Approvers:
- Tamas Bela Feher (https://github.com/tfeher)
- tsuki (https://github.com/enp1s0)

URL: rapidsai#2085

benfred

lgtm!

I didn't review the whole change, but I diffed this branch against the original PR https://github.com/rapidsai/raft/pull/2085/files and found that the only difference was the fix:

diff --git a/cpp/include/raft/neighbors/detail/ivf_flat_interleaved_scan-inl.cuh b/cpp/include/raft/neighbors/detail/ivf_flat_interleaved_scan-inl.cuh
index 51cd2876d..1cf042c6c 100644
--- a/cpp/include/raft/neighbors/detail/ivf_flat_interleaved_scan-inl.cuh
+++ b/cpp/include/raft/neighbors/detail/ivf_flat_interleaved_scan-inl.cuh
@@ -844,7 +844,7 @@ void launch_kernel(Lambda lambda,
   int smem_size              = query_smem_elems * sizeof(T);
   constexpr int kSubwarpSize = std::min<int>(Capacity, WarpSize);
   auto block_merge_mem =
-    raft::matrix::detail::select::warpsort::calc_smem_size_for_block_wide<AccT, IdxT>(
+    raft::matrix::detail::select::warpsort::calc_smem_size_for_block_wide<float, IdxT>(
      kThreadsPerBlock / kSubwarpSize, k);
   smem_size += std::max<int>(smem_size, block_merge_mem);

which should resolve the unit test failures that we were seeing before.

Thanks for fixing this @achirkin!
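
As a side note, one possible way to guard against this class of mismatch is to derive the size from the queue type itself, so the value type is named in only one place. The sketch below is a host-only illustration under stated assumptions: `toy_queue`, `smem_bytes_for`, and the size formula are hypothetical and do not correspond to the RAFT warp-sort classes or to `calc_smem_size_for_block_wide`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical queue type that exposes the types it stores.
// This is an illustration, not the RAFT warp-sort queue.
template <typename ValT, typename IdxT>
struct toy_queue {
  using value_type = ValT;
  using index_type = IdxT;
};

// Hypothetical size helper keyed on the queue type rather than on
// separately supplied value/index types (simplified formula, an assumption).
template <typename QueueT>
constexpr std::size_t smem_bytes_for(int num_warps, int k)
{
  return static_cast<std::size_t>(num_warps) * static_cast<std::size_t>(k) *
         (sizeof(typename QueueT::value_type) + sizeof(typename QueueT::index_type));
}

// The kernel's queue type is declared once; both the kernel body and the
// launch code would refer to this alias, so the two cannot disagree.
using scan_queue = toy_queue<float, std::int64_t>;

static_assert(smem_bytes_for<scan_queue>(4, 64) == 3072,
              "size follows the queue's own value type");
```

The trade-off is that the launch code then has to see the queue type, which somewhat reduces the decoupling that the standalone size function provides.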

benfred · 2024-02-13T01:15:22Z

/merge

achirkin and others added 2 commits February 11, 2024 07:46

Fix the shmem size in the ivf-flat scan kernel

fe02040

achirkin requested review from a team as code owners February 11, 2024 07:00

github-actions bot added cpp CMake labels Feb 11, 2024

achirkin added 3 - Ready for Review, feature request (New feature or request), non-breaking (Non-breaking change), cpp and removed cpp, CMake labels Feb 11, 2024

cjnolet assigned achirkin Feb 11, 2024

benfred approved these changes Feb 12, 2024

Merge branch 'branch-24.04' into fea-fp16-again

cd8339d

github-actions bot added the CMake label Feb 12, 2024

rapids-bot bot merged commit 65ae560 into rapidsai:branch-24.04 Feb 13, 2024
61 checks passed
