[BUG][Performance] matrix::detail::select::radix: inefficient "one-block" kernel launch #1823

achirkin · 2023-09-13T14:25:52Z

matrix::detail::select::radix contains two almost identical implementations of the radix MSD select algorithm: radix_kernel and radix_topk_one_block_kernel. The main difference is that the one-block kernel is tailored for somewhat larger batch sizes: it runs only one block per row and thus does not require any inter-block communication. There is, however, a two-fold problem with the one-block kernel:

1. It uses the same function, calc_chunk_size(), to select the CUDA grid size as the normal kernel. As a result, very small grid sizes are sometimes selected, and the algorithm runs at very low occupancy in an inefficient loop over the input batch.
2. It allocates temporary buffers of size input_row_length * gridDim * 2, which can become extremely large and inefficient if we fix (1). In contrast, the normal kernel has an optimization that uses limited-size temporary buffers.
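The arithmetic behind these two points can be sketched on the host side. This is only an illustration under stated assumptions: the function names below are hypothetical and are not part of RAFT, the `divisor` cap is an assumed stand-in for a limited-size buffer scheme, and the element size is left as a parameter because the issue does not say what the buffers hold.

```cpp
#include <cstddef>

// Hypothetical helper (not a RAFT function): temporary storage in bytes when
// every block keeps two full-row buffers, i.e. the
// input_row_length * gridDim * 2 scheme described in point 2.
std::size_t full_row_buffer_bytes(std::size_t row_len,
                                  std::size_t grid_dim,
                                  std::size_t elem_size)
{
  return row_len * grid_dim * 2 * elem_size;
}

// Hypothetical helper: the same layout, but each buffer is capped at
// ceil(row_len / divisor) elements. The divisor is an assumption made for
// illustration; it is not taken from the RAFT sources.
std::size_t capped_buffer_bytes(std::size_t row_len,
                                std::size_t grid_dim,
                                std::size_t elem_size,
                                std::size_t divisor)
{
  std::size_t buf_len = (row_len + divisor - 1) / divisor;
  return buf_len * grid_dim * 2 * elem_size;
}

// Hypothetical helper: number of sequential passes over the batch when each
// block processes one row at a time and only grid_dim blocks are launched
// (point 1: a small grid means a long loop).
std::size_t batch_iterations(std::size_t batch_size, std::size_t grid_dim)
{
  return (batch_size + grid_dim - 1) / grid_dim;
}
```

For example, with 4-byte elements, a row length of 2^20, and a grid of 1024 blocks, `full_row_buffer_bytes` gives 8 GiB, while `capped_buffer_bytes` with an assumed divisor of 8 gives 1 GiB. Likewise, a batch of 10000 rows takes 1250 passes on a grid of 8 blocks but only 10 passes on a grid of 1024 blocks, which is why enlarging the grid without also limiting the buffers trades one problem for the other.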

By default, this problem is masked by the matrix::select_k heuristic. It just selects a faster legacy implementation borrowed from FAISS for the problematic input sizes.


- fix matrix::detail::select::radix::calc_chunk_size() for one-block kernel
- use `calc_buf_len()` rather than `len` as the buffer size of one-block kernel
- reduce register footprint of one-block kernel by reducing the number of buffer pointers
- reduce the buffer size by 1/8 for all radix select functions

Resolve #1823

Authors:
- Yong Wang (https://github.com/yong-wang)
- Corey J. Nolet (https://github.com/cjnolet)
- Ben Frederickson (https://github.com/benfred)
- Artem M. Chirkin (https://github.com/achirkin)

Approvers:
- Tamas Bela Feher (https://github.com/tfeher)
- Artem M. Chirkin (https://github.com/achirkin)
- Ben Frederickson (https://github.com/benfred)

URL: #1878

achirkin added the bug Something isn't working label Sep 13, 2023

achirkin changed the title ~~[Performance] matrix::detail::select::radix: inefficient "one-block" kernel launch~~ [BUG][Performance] matrix::detail::select::radix: inefficient "one-block" kernel launch Sep 13, 2023

yong-wang mentioned this issue Oct 9, 2023

Fix and improve one-block radix select #1878

Merged

rapids-bot bot closed this as completed in #1878 Nov 9, 2023
