Add AIR-Top-k reference #2031

tfeher · 2023-11-30T10:29:42Z

Add reference to AIR top-k paper.

tfeher · 2023-11-30T10:31:44Z

@cjnolet and @benfred Currently radix top-k is only available in the detail namespace. Shall we add an option to the public API to use it directly? If yes, shall this be an "algo" option for select_k, or a separate function?

benfred

I think this is a great start - but I think we should add a bit more detail here.

We have a list of papers in the references section of the README : https://github.com/rapidsai/raft#references - I think we should add a link to this paper there too.

I also think we should add this reference to the raft::matrix::select_k docstring , since thats the user facing documentation, and I'm not sure that too many people will notice this in the detail namespace. When adding there, it would also be nice to make clear that 'Air Top-K' is the detail::select::radix::select_k code, and 'GridSelect' from the paper is the detail::select::warpsort code.

benfred · 2023-11-30T18:36:14Z

@cjnolet and @benfred Currently radix top-k is only available in the detail namespace. Shall we add an option to the public API to use it directly? If yes, shall this be an "algo" option for select_k, or a separate function?

Whats the customer use case for calling radix select directly? If there is a need for this, I think we can expose in the public api - but I think most people should be be using the existing public api that switches between the different selection algorithms based off the input sizes.

If we do decide to expose this publicly - I think we should add a 'algo' enum with a default 'auto' option (like if the 'auto' algo option is set, we apply the existing heuristic function to pick the algo for you).

cjnolet · 2023-11-30T18:42:30Z

I agree with @benfred on all points, except we shouldn't be pointing users to code in the detail namespace. Nothing in detail is part of the public APIs, which means anyting in that namespace can change without warning (even be moved or removed).

I do like the idea of providing a way to access these APIs, though. The interested user could dive deeper if they would like, but the reference and documentation would still remain at the public API level. Between the options of enum and separate public API functions, I think I prefer the separate functions for consistency, mostly because we've had similar discussions in other APIs and decided on the latter in the past (see lstsq_svd_jacobi vs lstsq_svd_qr in raft::linalg for an example. I should make clear, though, that I'm not completely against the enum.

We had briefly discussed this a few weeks ago, but left with the idea that if we expose options to choose individual k-selection routines, whether through individual functions or an enum, we should probably add documentation that explains why one each algorithm might be chosen over the others.

cjnolet · 2023-11-30T18:57:44Z

Whats the customer use case for calling radix select directly?

One thing i'd like to point out is that the lack of separation in these APIs has caused a lot of code, like testing code, for example, to invoke detail APIs directly, and it's generally preferred that we test code paths end-to-end through public APIs wherever possible. Another example is specific algorithms in the raft::neighbors namespace invoking routines in the detail namespace from raft::matrix. This leads to call patterns that are very hard to maintain across namespace boundaries. We should be able to change code in the detail namespace and only need to update the APIs within the immediate namespace. Things become messy and not maintainable when we have to update the entire codebase (including all of our tests) because an API changed in the detail namespace- that defeats the purpose of hiding the implementation details to avoid API breakage.

For this reason, I'd argue that public APIs are necessary both for end-users of RAFT as well as for RAFT itself as a user across namespace boundaries. This points towards exposing the different k-selection routines. Actually, @benfred I'm starting to like the idea of doing an enum for this more and more.

benfred

lgtm! Thanks @tfeher @cjnolet !

cjnolet · 2024-01-16T19:10:13Z

/merge

Add AIR-Top-k reference

b1d628b

tfeher added documentation Improvements or additions to documentation non-breaking Non-breaking change labels Nov 30, 2023

tfeher requested a review from a team as a code owner November 30, 2023 10:29

github-actions bot added the cpp label Nov 30, 2023

tfeher added the improvement Improvement / enhancement to an existing function label Nov 30, 2023

benfred reviewed Nov 30, 2023

View reviewed changes

cjnolet assigned tfeher Dec 4, 2023

cjnolet added 4 commits January 9, 2024 18:23

Merge branch 'branch-24.02' into air_topk_ref

daf7d67

Merge branch 'branch-24.02' into air_topk_ref

459c487

Updates

639f906

Adding nn-descent references

e977206

benfred approved these changes Jan 12, 2024

View reviewed changes

Merge branch 'branch-24.02' into air_topk_ref

9a23f18

rapids-bot bot merged commit 1d9adab into rapidsai:branch-24.02 Jan 16, 2024
61 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add AIR-Top-k reference #2031

Add AIR-Top-k reference #2031

tfeher commented Nov 30, 2023

tfeher commented Nov 30, 2023

benfred left a comment

benfred commented Nov 30, 2023

cjnolet commented Nov 30, 2023 •

edited

Loading

cjnolet commented Nov 30, 2023 •

edited

Loading

benfred left a comment

cjnolet commented Jan 16, 2024

Add AIR-Top-k reference #2031

Add AIR-Top-k reference #2031

Conversation

tfeher commented Nov 30, 2023

tfeher commented Nov 30, 2023

benfred left a comment

Choose a reason for hiding this comment

benfred commented Nov 30, 2023

cjnolet commented Nov 30, 2023 • edited Loading

cjnolet commented Nov 30, 2023 • edited Loading

benfred left a comment

Choose a reason for hiding this comment

cjnolet commented Jan 16, 2024

cjnolet commented Nov 30, 2023 •

edited

Loading

cjnolet commented Nov 30, 2023 •

edited

Loading