Integrate `accumulate_into_selected` from ANN utils into `linalg::reduce_rows_by_keys` (#909)

`accumulate_into_selected` achieves much better performance than the previous implementation of `reduce_rows_by_keys` for large `nkeys` (`sum_rows_by_key_large_nkeys_kernel_rowmajor`). According to the benchmark that I added for this primitive, the speedup is 240x for sizes relevant to IVF-Flat (and ~10x for smaller `nkeys`, e.g. 64).

This is mostly because the legacy implementation, probably in an attempt to reduce atomic conflicts, assigned a key and a tile of the matrix to each block, and each block only reduced the rows corresponding to its assigned key. With a very large number of keys, e.g. 1k, this results in blocks iterating over a large number of rows (possibly tens of thousands) while only reading and accumulating 1 in 1k rows.

This PR:
- Replaces `sum_rows_by_key_large_nkeys_rowmajor` with `accumulate_into_selected` (I didn't find any cases in which the old kernel performed better).
- Removes `accumulate_into_selected` from `ann_utils.cuh`.
- Fixes support for custom iterators in `reduce_rows_by_keys`.
- Uses the raft prims in `calc_centers_and_sizes`.

Perf notes:
- The original kmeans gets a 15-20% speedup for large numbers of clusters.
- The performance of `ivf_flat::build` stays the same as before.
- There are a bunch of extra steps since I separated the cluster size count from the reduction by key, but their cost is negligible in comparison.

Question: the change breaks support for host-side-only arrays in `calc_centers_and_sizes`. Is that actually a possible use case? Should I add a branch that skips the raft prims when all arrays are host-side?

cc @achirkin @tfeher @cjnolet

Authors:
- Louis Sugy (https://github.com/Nyrio)

Approvers:
- Tamas Bela Feher (https://github.com/tfeher)
- Corey J. Nolet (https://github.com/cjnolet)

URL: #909
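
To illustrate the access-pattern difference described in the message, here is a minimal host-side C++ sketch of the two strategies. It is only a CPU analogy under stated assumptions, not the RAFT kernels: the function names `sum_rows_scan_per_key` and `sum_rows_scatter` are hypothetical, the matrix is assumed to be row-major, and keys are assumed to lie in `[0, n_keys)`. Each function returns how many rows it visited, so the wasted work of the key-driven scan is visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference code for illustration only (not RAFT's API).
// `data` is a row-major (n_rows x n_cols) matrix and keys[i] is the key of
// row i, assumed to be in [0, n_keys). `sums` receives an (n_keys x n_cols)
// row-major matrix of per-key row sums.

// Key-driven scan, loosely mirroring the legacy scheme: every key walks over
// all rows and accumulates only those that match. Visits n_keys * n_rows rows.
std::size_t sum_rows_scan_per_key(const std::vector<float>& data,
                                  const std::vector<std::uint32_t>& keys,
                                  std::size_t n_cols,
                                  std::size_t n_keys,
                                  std::vector<float>& sums)
{
  const std::size_t n_rows = keys.size();
  sums.assign(n_keys * n_cols, 0.0f);
  std::size_t rows_visited = 0;
  for (std::size_t k = 0; k < n_keys; ++k) {
    for (std::size_t i = 0; i < n_rows; ++i) {
      ++rows_visited;
      if (keys[i] != k) continue;  // most iterations end here when n_keys is large
      for (std::size_t j = 0; j < n_cols; ++j) {
        sums[k * n_cols + j] += data[i * n_cols + j];
      }
    }
  }
  return rows_visited;
}

// Row-driven scatter, loosely mirroring the accumulate-into-selected idea:
// every row is read once and added to the output row chosen by its key.
// Visits n_rows rows. A GPU version would need atomic adds here, since
// several threads may target the same output row.
std::size_t sum_rows_scatter(const std::vector<float>& data,
                             const std::vector<std::uint32_t>& keys,
                             std::size_t n_cols,
                             std::size_t n_keys,
                             std::vector<float>& sums)
{
  const std::size_t n_rows = keys.size();
  sums.assign(n_keys * n_cols, 0.0f);
  std::size_t rows_visited = 0;
  for (std::size_t i = 0; i < n_rows; ++i) {
    ++rows_visited;
    const std::size_t k = keys[i];
    assert(k < n_keys);  // assumption: keys are valid indices
    for (std::size_t j = 0; j < n_cols; ++j) {
      sums[k * n_cols + j] += data[i * n_cols + j];
    }
  }
  return rows_visited;
}
```

Both functions produce the same sums. With 1k keys, the first one visits about 1000 times more rows than the second, which matches the "1 in 1k rows" observation above. The actual GPU speedup depends on other factors (atomic contention, memory coalescing) that this sketch does not model.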