Support for fp16 in CAGRA and IVF-PQ #2085
…n the raft_objs component
NB: this does not add the fp16 capabilities to the |
Thanks @achirkin for this PR!
This PR contains a large amount of boilerplate code. Fortunately, the nontrivial changes are relatively small and are confined to:
- mdspan_numpy_serializer.hpp
- device_loads_stores.cuh
- test/neighbors/ann_cagra.cuh
The PR looks good to me!
@@ -148,6 +149,26 @@ DI void sts(int32_t* addr, const int32_t (&x)[4])
               : "l"(s4), "r"(x[0]), "r"(x[1]), "r"(x[2]), "r"(x[3]));
}

DI void sts(half* addr, const half& x)
Tagging @mdoijade to have a look at the changes in this file, since the load and store ops here are mostly used by IVF-Flat and contractions.cuh.
The additions to device_loads_stores.cuh look good to me. I agree it is good to have sts overloads that match the existing lds ones for larger fp16 vector sizes.
Co-authored-by: tsuki <[email protected]>
/merge
This reverts commit 72f48ae.
RAFT C++ tests were not running for a portion of the 24.02 development cycle, until the merger of rapidsai/rapids-cmake#533. This PR fixes some failing tests and reverts PRs that caused test failures that were silent until now, specifically #2097 and #2085. These features will be revisited in a subsequent release.

Authors:
- Malte Förster (https://github.com/mfoerste4)
- Corey J. Nolet (https://github.com/cjnolet)

Approvers:
- Ben Frederickson (https://github.com/benfred)
- Bradley Dice (https://github.com/bdice)
Add fp16 (CUDA half) support to CAGRA and its dependencies.

Authors:
- Artem M. Chirkin (https://github.com/achirkin)

Approvers:
- Tamas Bela Feher (https://github.com/tfeher)
- tsuki (https://github.com/enp1s0)

URL: rapidsai#2085
1. Add fp16 (CUDA half) support to CAGRA and its dependencies (#2085).
2. Fix the shared memory size error in IVF-Flat that was exposed by the new tests in #2085.

Regarding point (2): the warp-sort top-k queue uses shared memory, and the module provides the function that calculates the required shmem size separately from the queue object itself. As a result, it is easy to plug in the wrong types and get an incorrect size. The IVF-Flat scan kernel always kept the distances in the queue as floats, but we calculated the shmem size as if the queue used `AccT` (IVF-Flat's internal accumulation type). Hence, once tests with fp16 inputs (and fp16 `AccT`) were added, the allocated shmem became too small, which resulted in memory access violation errors.

Authors:
- Artem M. Chirkin (https://github.com/achirkin)

Approvers:
- Ben Frederickson (https://github.com/benfred)

URL: #2172
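
The shared memory size mismatch described in point (2) can be illustrated with a small host-only C++ sketch. None of the names below come from RAFT: `half_t`, `queue_smem_bytes`, `float_queue`, and `queue_smem_bytes_for` are hypothetical stand-ins, and the size formula (one value plus one index per queue slot) is an assumption made for illustration only.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical 2-byte stand-in for CUDA `half`, so the sketch compiles on the host.
struct half_t {
  std::uint16_t bits;
};
static_assert(sizeof(half_t) == 2, "half_t is expected to occupy two bytes");

// Hypothetical size helper that is decoupled from the queue type:
// the caller has to pass the value and index types by hand.
// Assumed layout: each slot stores one value and one index.
template <typename ValueT, typename IndexT>
constexpr std::size_t queue_smem_bytes(std::size_t num_queues, std::size_t capacity)
{
  return num_queues * capacity * (sizeof(ValueT) + sizeof(IndexT));
}

// Hypothetical queue that always stores distances as float,
// regardless of the accumulation type used by the caller.
template <typename IndexT>
struct float_queue {
  using value_type = float;
  using index_type = IndexT;
};

// One possible safeguard: derive the types from the queue itself,
// so the size calculation cannot drift from the actual storage.
template <typename QueueT>
constexpr std::size_t queue_smem_bytes_for(std::size_t num_queues, std::size_t capacity)
{
  return queue_smem_bytes<typename QueueT::value_type, typename QueueT::index_type>(num_queues,
                                                                                    capacity);
}

using AccT = half_t;         // accumulation type for fp16 inputs (assumed for this sketch)
using IdxT = std::uint32_t;  // index type (assumed for this sketch)

// Size computed from AccT: too small when the queue really stores floats.
constexpr std::size_t smem_from_acc_type = queue_smem_bytes<AccT, IdxT>(4, 64);

// Size computed from the queue's own types.
constexpr std::size_t smem_from_queue = queue_smem_bytes_for<float_queue<IdxT>>(4, 64);

static_assert(smem_from_acc_type < smem_from_queue,
              "sizing by a 2-byte AccT under-allocates a queue that stores 4-byte floats");
```

With a 4-byte `AccT` the two calculations agree, which is consistent with the mismatch going unnoticed until fp16 tests were added.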