Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AIR-Top-k reference #2031

Merged
merged 6 commits into from
Jan 16, 2024
Merged

Conversation

tfeher
Copy link
Contributor

@tfeher tfeher commented Nov 30, 2023

Add reference to AIR top-k paper.

@tfeher tfeher added documentation Improvements or additions to documentation non-breaking Non-breaking change labels Nov 30, 2023
@tfeher tfeher requested a review from a team as a code owner November 30, 2023 10:29
@github-actions github-actions bot added the cpp label Nov 30, 2023
@tfeher
Copy link
Contributor Author

tfeher commented Nov 30, 2023

@cjnolet and @benfred Currently radix top-k is only available in the detail namespace. Shall we add an option to the public API to use it directly? If yes, shall this be an "algo" option for select_k, or a separate function?

@tfeher tfeher added the improvement Improvement / enhancement to an existing function label Nov 30, 2023
Copy link
Member

@benfred benfred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a great start - but I think we should add a bit more detail here.

We have a list of papers in the references section of the README : https://github.com/rapidsai/raft#references - I think we should add a link to this paper there too.

I also think we should add this reference to the raft::matrix::select_k docstring , since thats the user facing documentation, and I'm not sure that too many people will notice this in the detail namespace. When adding there, it would also be nice to make clear that 'Air Top-K' is the detail::select::radix::select_k code, and 'GridSelect' from the paper is the detail::select::warpsort code.

@benfred
Copy link
Member

benfred commented Nov 30, 2023

@cjnolet and @benfred Currently radix top-k is only available in the detail namespace. Shall we add an option to the public API to use it directly? If yes, shall this be an "algo" option for select_k, or a separate function?

Whats the customer use case for calling radix select directly? If there is a need for this, I think we can expose in the public api - but I think most people should be be using the existing public api that switches between the different selection algorithms based off the input sizes.

If we do decide to expose this publicly - I think we should add a 'algo' enum with a default 'auto' option (like if the 'auto' algo option is set, we apply the existing heuristic function to pick the algo for you).

@cjnolet
Copy link
Member

cjnolet commented Nov 30, 2023

I agree with @benfred on all points, except we shouldn't be pointing users to code in the detail namespace. Nothing in detail is part of the public APIs, which means anyting in that namespace can change without warning (even be moved or removed).

I do like the idea of providing a way to access these APIs, though. The interested user could dive deeper if they would like, but the reference and documentation would still remain at the public API level. Between the options of enum and separate public API functions, I think I prefer the separate functions for consistency, mostly because we've had similar discussions in other APIs and decided on the latter in the past (see lstsq_svd_jacobi vs lstsq_svd_qr in raft::linalg for an example. I should make clear, though, that I'm not completely against the enum.

We had briefly discussed this a few weeks ago, but left with the idea that if we expose options to choose individual k-selection routines, whether through individual functions or an enum, we should probably add documentation that explains why one each algorithm might be chosen over the others.

@cjnolet
Copy link
Member

cjnolet commented Nov 30, 2023

Whats the customer use case for calling radix select directly?

One thing i'd like to point out is that the lack of separation in these APIs has caused a lot of code, like testing code, for example, to invoke detail APIs directly, and it's generally preferred that we test code paths end-to-end through public APIs wherever possible. Another example is specific algorithms in the raft::neighbors namespace invoking routines in the detail namespace from raft::matrix. This leads to call patterns that are very hard to maintain across namespace boundaries. We should be able to change code in the detail namespace and only need to update the APIs within the immediate namespace. Things become messy and not maintainable when we have to update the entire codebase (including all of our tests) because an API changed in the detail namespace- that defeats the purpose of hiding the implementation details to avoid API breakage.

For this reason, I'd argue that public APIs are necessary both for end-users of RAFT as well as for RAFT itself as a user across namespace boundaries. This points towards exposing the different k-selection routines. Actually, @benfred I'm starting to like the idea of doing an enum for this more and more.

Copy link
Member

@benfred benfred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! Thanks @tfeher @cjnolet !

@cjnolet
Copy link
Member

cjnolet commented Jan 16, 2024

/merge

@rapids-bot rapids-bot bot merged commit 1d9adab into rapidsai:branch-24.02 Jan 16, 2024
61 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cpp documentation Improvements or additions to documentation improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
Development

Successfully merging this pull request may close these issues.

3 participants