-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add AIR-Top-k reference #2031
Add AIR-Top-k reference #2031
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a great start - but I think we should add a bit more detail here.
We have a list of papers in the references section of the README : https://github.com/rapidsai/raft#references - I think we should add a link to this paper there too.
I also think we should add this reference to the raft::matrix::select_k
docstring , since thats the user facing documentation, and I'm not sure that too many people will notice this in the detail namespace. When adding there, it would also be nice to make clear that 'Air Top-K' is the detail::select::radix::select_k
code, and 'GridSelect' from the paper is the detail::select::warpsort
code.
Whats the customer use case for calling radix select directly? If there is a need for this, I think we can expose in the public api - but I think most people should be be using the existing public api that switches between the different selection algorithms based off the input sizes. If we do decide to expose this publicly - I think we should add a 'algo' enum with a default 'auto' option (like if the 'auto' algo option is set, we apply the existing heuristic function to pick the algo for you). |
I agree with @benfred on all points, except we shouldn't be pointing users to code in the I do like the idea of providing a way to access these APIs, though. The interested user could dive deeper if they would like, but the reference and documentation would still remain at the public API level. Between the options of enum and separate public API functions, I think I prefer the separate functions for consistency, mostly because we've had similar discussions in other APIs and decided on the latter in the past (see We had briefly discussed this a few weeks ago, but left with the idea that if we expose options to choose individual k-selection routines, whether through individual functions or an enum, we should probably add documentation that explains why one each algorithm might be chosen over the others. |
One thing i'd like to point out is that the lack of separation in these APIs has caused a lot of code, like testing code, for example, to invoke detail APIs directly, and it's generally preferred that we test code paths end-to-end through public APIs wherever possible. Another example is specific algorithms in the For this reason, I'd argue that public APIs are necessary both for end-users of RAFT as well as for RAFT itself as a user across namespace boundaries. This points towards exposing the different k-selection routines. Actually, @benfred I'm starting to like the idea of doing an enum for this more and more. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/merge |
Add reference to AIR top-k paper.