Add ability to specify select_k algorithm #1430

Closed
wants to merge 12 commits

Conversation

@benfred (Member) commented on Apr 18, 2023

This adds the ability to specify which select_k algorithm is used in `matrix::select_k`, with the goal of being able to run experiments to automatically pick the best algorithm.

We currently have at least three different implementations of a top-k algorithm inside RAFT: radix select, warp sort, and the block select from FAISS (there is also another version in CAGRA that isn't included here). Each of these algorithms is fastest on certain inputs and slower on others.

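For reference, here is a minimal usage sketch of the Python binding. The `SelectMethod` names and the `algo=` keyword are taken from the timings later in this thread and are specific to this branch (and this assumes `algo` stays optional); the exact spelling may still change.

```python
import cupy as cp
import pylibraft.matrix

# Random scores: 1,000 rows with 100,000 candidates each.
x = cp.random.randn(1000, 100_000).astype("float32")

# Default behaviour: let select_k choose an implementation internally.
pylibraft.matrix.select_k(x, k=10)

# Force a specific algorithm, e.g. to benchmark it in isolation.
pylibraft.matrix.select_k(x, k=10, algo=pylibraft.matrix.SelectMethod.RADIX)
```
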
@benfred added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels on Apr 18, 2023
@benfred requested review from a team as code owners on Apr 18, 2023
@benfred (Member, Author) commented on Apr 18, 2023

Some timings from the different methods (I will run more exhaustive experiments soon):

Single row

FAISS block select underperforms here, since it uses a single threadblock for each row, leading to poor GPU utilization:

In [5]: x = cp.random.randn(1, 5_000_000).astype("float32")

In [6]: %timeit pylibraft.matrix.select_k(x, k=10, algo=pylibraft.matrix.SelectMethod.WARPSORT)
156 µs ± 371 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [7]: %timeit pylibraft.matrix.select_k(x, k=10, algo=pylibraft.matrix.SelectMethod.RADIX)
695 µs ± 717 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [8]: %timeit pylibraft.matrix.select_k(x, k=10, algo=pylibraft.matrix.SelectMethod.BLOCK)
14 ms ± 10.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

1000 rows

In [10]: x = cp.random.randn(1000, 500_000).astype("float32")

In [11]: %timeit pylibraft.matrix.select_k(x, k=10, algo=pylibraft.matrix.SelectMethod.WARPSORT)
3.04 ms ± 4.72 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [12]: %timeit pylibraft.matrix.select_k(x, k=10, algo=pylibraft.matrix.SelectMethod.RADIX)
28.5 ms ± 7.89 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [13]: %timeit pylibraft.matrix.select_k(x, k=10, algo=pylibraft.matrix.SelectMethod.BLOCK)
3.76 ms ± 5.55 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Larger k value

Warpsort doesn't support k=1024; FAISS block select outperforms radix in this case:

In [18]: %timeit pylibraft.matrix.select_k(x, k=1024, algo=pylibraft.matrix.SelectMethod.RADIX)
37.8 ms ± 143 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [19]: %timeit pylibraft.matrix.select_k(x, k=1024, algo=pylibraft.matrix.SelectMethod.BLOCK)
15.9 ms ± 19.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

@benfred (Member, Author) commented on Apr 19, 2023

I've started pushing some code for running experiments with this branch to https://gitlab-master.nvidia.com/benf/raft_select_k_experiments

For different input shapes and k values, the fastest algorithm can be quite different. As an example, looking at 2**20 columns with both 256 rows and 16 rows, the best method changes with different values of k:

[Benchmark plots: cols=1048576, rows=256 and cols=1048576, rows=16]

Next steps will be to integrate CAGRA into this, and then to automate finding a heuristic that picks the best algorithm. I think both of these should be done in separate PRs, though.
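
Purely as a sketch of what such a heuristic could look like (the `SelectMethod` values match the timings above, but the thresholds are placeholders rather than anything learned from the experiments):

```python
from pylibraft.matrix import SelectMethod


def pick_select_k_algo(n_rows: int, n_cols: int, k: int) -> SelectMethod:
    """Illustrative dispatch rule only; the 256 / 16 cut-offs are placeholders,
    not the real warpsort capacity limit or a tuned row threshold."""
    # n_cols would factor into a real rule; it's unused in this toy version.
    if k <= 256:
        # Warpsort was the fastest of the three in every k=10 timing above,
        # for both the single-row and the 1000-row inputs.
        return SelectMethod.WARPSORT
    # Warpsort doesn't support very large k (k=1024 above), so fall back to
    # block select, unless there are so few rows that one threadblock per row
    # would leave the GPU under-utilized.
    return SelectMethod.BLOCK if n_rows >= 16 else SelectMethod.RADIX
```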

benfred added 6 commits April 20, 2023 14:48
Revert the heuristic updates - I was originally intending for this to be in a follow-up PR, but thought I could sneak it in here.

(It seems to be causing an issue with the ivf_pq unit test, and needs a bit more work before being ready.)
@achirkin (Contributor) commented:

In addition to the comments from @tfeher in #1455, I'd like to point out that I'm not sure we should expose the ability to specify the select_k algorithm.

  1. We have the infrastructure in place to develop a better heuristic and choose the best algorithm internally.
  2. We have several internal implementations and may add more. Currently, we consider them an implementation detail.
  3. Moreover, we previously had the ability to select an implementation, and we deprecated that interface when the new one was introduced.

To sum up, I'd suggest we use our C++ benchmarks to learn the heuristics and then plug the heuristics directly into include/raft/matrix/detail/select_k.cuh.
@cjnolet, your thoughts on this?

@benfred (Member, Author) commented on Apr 26, 2023

> To sum up, I'd suggest we use our C++ benchmarks to learn the heuristics and then plug the heuristics directly into include/raft/matrix/detail/select_k.cuh. @cjnolet, your thoughts on this?

I think that sounds like a good plan; going to close this PR.

@benfred closed this on Apr 26, 2023
@cjnolet (Member) commented on Apr 26, 2023

> @cjnolet, your thoughts on this?

I like the idea of learning the heuristics, and I like the idea of using our C++ benchmarks to run the grid search, but I think it's extremely useful to consider pulling the results of the grid search into Python so we can actually build a model (like the DT/RF model Ben has been playing with).
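
As a sketch of that workflow with scikit-learn (the CSV file name and its n_rows / n_cols / k / best_algo columns are hypothetical stand-ins for however the grid-search results end up being exported):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# One row per benchmarked (n_rows, n_cols, k) configuration, labelled with
# whichever select_k implementation was fastest for it.
results = pd.read_csv("select_k_grid_search.csv")

X = results[["n_rows", "n_cols", "k"]]
y = results["best_algo"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Keeping the tree shallow means the learned rule stays small enough to
# transcribe by hand into the C++ heuristic.
model = DecisionTreeClassifier(max_depth=4, random_state=0)
model.fit(X_train, y_train)

print("held-out accuracy:", model.score(X_test, y_test))
print(export_text(model, feature_names=list(X.columns)))
```

The printed tree could then be translated into plain if/else branches inside include/raft/matrix/detail/select_k.cuh, keeping the heuristic internal as suggested above.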

I've already expressed my concern with using an enum to specify the algo, mostly because it doesn't gel with the way we have done this for other APIs in RAFT, which would just expose new functions for each variant. Just to be clear: I still think there's benefit to exposing the different k-select variants individually, but at this point that could be a conversation for a different day (unless, of course, anyone else sees a dire need for this in the near term).
