Add ability to specify select_k algorithm #1430

Closed
wants to merge 12 commits

Conversation

@benfred (Member) commented on Apr 18, 2023

This adds the ability to specify which select_k algorithm is used in `matrix::select_k`, with the goal of being able to run experiments to automatically pick the best algorithm.

We currently have at least three different implementations of a top-k algorithm inside RAFT: radix select, warp sort, and the block select from FAISS (there is also another version in CAGRA that isn't included here). Each of these algorithms is fastest on certain inputs and slower on others.

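For reference, here is a minimal usage sketch of the Python binding. The `SelectMethod` names and the `algo=` keyword are taken from the timings later in this thread and are specific to this branch (and this assumes `algo` stays optional); the exact spelling may still change.

```python
import cupy as cp
import pylibraft.matrix

# Random scores: 1,000 rows with 100,000 candidates each.
x = cp.random.randn(1000, 100_000).astype("float32")

# Default behaviour: let select_k choose an implementation internally.
pylibraft.matrix.select_k(x, k=10)

# Force a specific algorithm, e.g. to benchmark it in isolation.
pylibraft.matrix.select_k(x, k=10, algo=pylibraft.matrix.SelectMethod.RADIX)
```
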
@benfred added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels on Apr 18, 2023
@benfred requested review from a team as code owners on Apr 18, 2023
@benfred (Member, Author) commented on Apr 18, 2023

Some timings from the different methods (I will run more exhaustive experiments soon):

Single row

FAISS block select underperforms here, since it uses a single threadblock for each row, leading to poor GPU utilization:

In [5]: x = cp.random.randn(1, 5_000_000).astype("float32")

In [6]: %timeit pylibraft.matrix.select_k(x, k=10, algo=pylibraft.matrix.SelectMethod.WARPSORT)
156 µs ± 371 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [7]: %timeit pylibraft.matrix.select_k(x, k=10, algo=pylibraft.matrix.SelectMethod.RADIX)
695 µs ± 717 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [8]: %timeit pylibraft.matrix.select_k(x, k=10, algo=pylibraft.matrix.SelectMethod.BLOCK)
14 ms ± 10.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

1000 rows

In [10]: x = cp.random.randn(1000, 500_000).astype("float32")

In [11]: %timeit pylibraft.matrix.select_k(x, k=10, algo=pylibraft.matrix.SelectMethod.WARPSORT)
3.04 ms ± 4.72 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [12]: %timeit pylibraft.matrix.select_k(x, k=10, algo=pylibraft.matrix.SelectMethod.RADIX)
28.5 ms ± 7.89 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [13]: %timeit pylibraft.matrix.select_k(x, k=10, algo=pylibraft.matrix.SelectMethod.BLOCK)
3.76 ms ± 5.55 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Larger k value

Warpsort doesn't support k=1024; FAISS block select outperforms radix in this case:

In [18]: %timeit pylibraft.matrix.select_k(x, k=1024, algo=pylibraft.matrix.SelectMethod.RADIX)
37.8 ms ± 143 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [19]: %timeit pylibraft.matrix.select_k(x, k=1024, algo=pylibraft.matrix.SelectMethod.BLOCK)
15.9 ms ± 19.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

@benfred (Member, Author) commented on Apr 19, 2023

I've started pushing some code for running experiments with this branch to https://gitlab-master.nvidia.com/benf/raft_select_k_experiments

For different input shapes and k values, the fastest algorithm can be quite different. As an example, looking at 2**20 columns with both 256 rows and 16 rows, the best method changes with different values of k:

[Benchmark plots: cols=1048576, rows=256 and cols=1048576, rows=16]

Next steps will be to integrate CAGRA into this, and then to automate finding a heuristic that picks the best algorithm. I think both of these should be done in separate PRs, though.
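
Purely as a sketch of what such a heuristic could look like (the `SelectMethod` values match the timings above, but the thresholds are placeholders rather than anything learned from the experiments):

```python
from pylibraft.matrix import SelectMethod


def pick_select_k_algo(n_rows: int, n_cols: int, k: int) -> SelectMethod:
    """Illustrative dispatch rule only; the 256 / 16 cut-offs are placeholders,
    not the real warpsort capacity limit or a tuned row threshold."""
    # n_cols would factor into a real rule; it's unused in this toy version.
    if k <= 256:
        # Warpsort was the fastest of the three in every k=10 timing above,
        # for both the single-row and the 1000-row inputs.
        return SelectMethod.WARPSORT
    # Warpsort doesn't support very large k (k=1024 above), so fall back to
    # block select, unless there are so few rows that one threadblock per row
    # would leave the GPU under-utilized.
    return SelectMethod.BLOCK if n_rows >= 16 else SelectMethod.RADIX
```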

benfred added 6 commits April 20, 2023 14:48
Revert the heuristic updates - I was originally intending for this to be in a follow-up PR, but thought I could sneak it in here.

(It seems to be causing an issue with the ivf_pq unit test, and needs a bit more work before being ready.)
@achirkin (Contributor) commented:

In addition to the comments from @tfeher in #1455, I'd like to point out that I'm not sure we should expose the ability to specify the select_k algorithm.

  1. We have the infrastructure in place to develop a better heuristic and choose the best algorithm internally.
  2. We have several internal implementations and may add more. Currently, we consider them an implementation detail.
  3. Moreover, we previously had the ability to select an implementation, and we deprecated that interface when the new one was introduced.

To sum up, I'd suggest we use our C++ benchmarks to learn the heuristics and then plug the heuristics directly into include/raft/matrix/detail/select_k.cuh.
@cjnolet, your thoughts on this?

@benfred (Member, Author) commented on Apr 26, 2023

> To sum up, I'd suggest we use our C++ benchmarks to learn the heuristics and then plug the heuristics directly into include/raft/matrix/detail/select_k.cuh. @cjnolet, your thoughts on this?

I think that sounds like a good plan; going to close this PR.

@benfred closed this on Apr 26, 2023
@cjnolet (Member) commented on Apr 26, 2023

> @cjnolet, your thoughts on this?

I like the idea of learning the heuristics, and I like the idea of using our C++ benchmarks to run the grid search, but I think it's extremely useful to consider pulling the results of the grid search into Python so we can actually build a model (like the DT/RF model Ben has been playing with).
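
As a sketch of that workflow with scikit-learn (the CSV file name and its n_rows / n_cols / k / best_algo columns are hypothetical stand-ins for however the grid-search results end up being exported):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# One row per benchmarked (n_rows, n_cols, k) configuration, labelled with
# whichever select_k implementation was fastest for it.
results = pd.read_csv("select_k_grid_search.csv")

X = results[["n_rows", "n_cols", "k"]]
y = results["best_algo"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Keeping the tree shallow means the learned rule stays small enough to
# transcribe by hand into the C++ heuristic.
model = DecisionTreeClassifier(max_depth=4, random_state=0)
model.fit(X_train, y_train)

print("held-out accuracy:", model.score(X_test, y_test))
print(export_text(model, feature_names=list(X.columns)))
```

The printed tree could then be translated into plain if/else branches inside include/raft/matrix/detail/select_k.cuh, keeping the heuristic internal as suggested above.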

I've already expressed my concern with using an enum to specify the algo, mostly because it doesn't gel with the way we have done this for other APIs in RAFT, which would just expose new functions for each variant. Just to be clear: I still think there's benefit to exposing the different k-select variants individually, but at this point that could be a conversation for a different day (unless, of course, anyone else sees a dire need for this in the near term).
