Generic linalg::map #1329
Anecdotal evidence shows improved performance. I've tested it …
I've marked the PR as 'breaking'. However, the only breaking change is the public function …
I'm really happy to see more consolidation here! This is great!
I'm really happy to have this change. Mostly I just have some minor comments on the docs, but I'd also like to get your ideas (and maybe gather some ideas from others) about the ordering of arguments in the public API.
```cpp
// Excerpt from the diff under review (the remaining arguments are not shown here)
linalg::detail::map<false>(
  stream,
  query_kths.data(),
  n_queries,
```
I wonder if this is an opportunity to expose our own `fill_n` function. Or would that just be too redundant with map?
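One way to see why a separate `fill_n` could be redundant with map: in a generic map that accepts any number of inputs, filling is simply the zero-input case whose functor returns a constant. The sketch below is a host-side analogue only; `host_map` and `host_fill_n` are hypothetical names, not raft APIs (raft's `linalg::map` runs on the GPU).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side analogue of a generic element-wise map (not a raft API):
// out[i] = f(ins[i]...) for every i, for any number of inputs, including zero.
template <typename T, typename Func, typename... InTs>
void host_map(std::vector<T>& out, Func f, const std::vector<InTs>&... ins)
{
  // Every input must have the same length as the output (trivially true with no inputs).
  assert(((ins.size() == out.size()) && ...));
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i] = f(ins[i]...);
  }
}

// Hypothetical fill_n expressed through the map: zero inputs and a functor that
// takes no arguments and returns a constant.
template <typename T>
void host_fill_n(std::vector<T>& out, const T& value)
{
  host_map(out, [&value]() { return value; });
}
```

With this shape, a dedicated fill function is a one-line wrapper, so the question is mostly about discoverability of the name rather than extra implementation work.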
Technically, we have it in `matrix/init.cuh`. However, here I use the detail version because the input comes in an `rmm::uvector`. I cannot use the mdspan API here, because I resize it conditionally on runtime arguments ("enable" the buffer only if it's used). Perhaps this would be a good place to use `std::optional<mdarray>`, but it's hard to emplace the optional when we mostly use helpers to create mdarrays.
To your last point: would `std::make_optional` not work here? The `mdarray` is also movable, as far as I know.
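A minimal sketch of the `std::make_optional` suggestion, assuming the mdarray factory helpers return their result by value: `decltype` and `std::make_optional` deduce the optional's type from the helper, so the conditionally enabled buffer is built without spelling out template parameters. `make_buffer` and `maybe_make_buffer` are hypothetical names, and `std::vector<float>` stands in for a device mdarray.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for an mdarray factory helper: returns an owning,
// movable buffer (std::vector here instead of a device mdarray).
std::vector<float> make_buffer(std::size_t n) { return std::vector<float>(n, 0.0f); }

// The optional's element type is deduced from the helper's return type, so no
// template parameters are written out; the helper's result is moved into the optional.
auto maybe_make_buffer(bool enabled, std::size_t n)
{
  using buffer_t = decltype(make_buffer(n));  // unevaluated: no allocation here
  return enabled ? std::make_optional(make_buffer(n)) : std::optional<buffer_t>{};
}
```

The buffer is only allocated when `enabled` is true, which matches the "enable the buffer only if it's used" pattern described in the thread.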
I guess it could work (I haven't tried yet), but it may be clumsy to explicitly specify all the template parameters.
Update: it turned out not so bad, although it didn't reduce the number of lines.
What do you think about backporting `std::optional::transform` from C++23 into raft? I miss this function so much for one-line constructs like this:

```cpp
// Take x by reference so the pointer refers to the stored mdarray, not a temporary copy
query_kths.transform([](auto& x) { return x.data_handle(); }).value_or(nullptr);
```
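A possible shape for such a backport, assuming a C++17 codebase: a free function (hypothetically named `optional_transform`, not an existing raft or standard API) that applies a callable to the contained value and otherwise returns an empty optional. Unlike C++23's member `transform`, this sketch only accepts lvalue optionals and stores the result by value. In the test, `std::vector<float>` stands in for an mdarray and `data()` for `data_handle()`.

```cpp
#include <functional>
#include <optional>
#include <type_traits>
#include <utility>

// Result type of calling F on a T&, with references and cv-qualifiers removed,
// because std::optional cannot hold a reference.
template <typename F, typename T>
using transform_result_t =
  std::remove_cv_t<std::remove_reference_t<std::invoke_result_t<F, T&>>>;

// Hypothetical C++17 backport of C++23 std::optional::transform for lvalue optionals:
// returns f(*opt) wrapped in an optional, or an empty optional if opt is empty.
template <typename T, typename F>
auto optional_transform(std::optional<T>& opt, F&& f)
  -> std::optional<transform_result_t<F, T>>
{
  if (!opt.has_value()) { return std::nullopt; }
  return std::invoke(std::forward<F>(f), *opt);
}
```

With such a helper, the one-liner would read `optional_transform(query_kths, [](auto& x) { return x.data_handle(); }).value_or(nullptr);`.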
Thanks for adding the overloads. The changes LGTM!
/merge
This reverts commit cb78f5f.
The change to the public API was significant enough here that it's going to require updates on the cuml side to invoke `map_k` with very different arguments (e.g. mdspan, different order). For now, it's best we revert this commit to unblock cuml; then we can proceed by keeping the old APIs and deprecating them until we change cuml.

Authors:
- Corey J. Nolet (https://github.com/cjnolet)

Approvers:
- Ben Frederickson (https://github.com/benfred)

URL: #1336
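The keep-and-deprecate plan in the revert message can be expressed with the standard `[[deprecated]]` attribute. A minimal sketch with hypothetical `old_map`/`new_map` functions (not raft's actual `map_k` or `map` signatures): the old entry point keeps its argument order and forwards to the new implementation, so existing callers still compile but get a warning.

```cpp
#include <cstddef>

// Hypothetical new-style API (not raft's signature): output, length, functor, then input.
template <typename T, typename Func>
void new_map(T* out, std::size_t len, Func f, const T* in)
{
  for (std::size_t i = 0; i < len; ++i) { out[i] = f(in[i]); }
}

// Hypothetical old-style API kept for downstream callers: it forwards to the new
// implementation, and [[deprecated]] makes each remaining call site emit a
// compile-time warning until the callers are migrated.
template <typename T, typename Func>
[[deprecated("use new_map instead")]] void old_map(T* out, const T* in, std::size_t len, Func f)
{
  new_map(out, len, f, in);
}
```

Because the wrapper only reorders arguments, there is a single implementation to maintain while downstream projects migrate.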
Update the implementation behind `raft::linalg::map` and `raft::linalg::map_offset` to allow multiple inputs and an optional index. Originally, this was part of the effort to reduce the latency of ivf-pq search. The new implementation replaces several helpers that used thrust; at the moment, raft uses a thrust policy that occasionally inserts extra `cudaStreamSynchronize` calls, which hurts latency on small inputs. The new implementation is generic enough to replace many of raft's utility functions. It uses vectorized loads/stores when possible, which improves performance. This second take on PR #1329 keeps the deprecated `map_k` function, which is used in cuml.

Authors:
- Artem M. Chirkin (https://github.com/achirkin)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)

URL: #1337
Update the implementation behind `raft::linalg::map` and `raft::linalg::map_offset` to allow multiple inputs and an optional index. Originally, this was part of the effort to reduce the latency of ivf-pq search. The new implementation replaces several helpers that used thrust; at the moment, raft uses a thrust policy that occasionally inserts extra `cudaStreamSynchronize` calls, which hurts latency on small inputs. The new implementation is generic enough to replace many of raft's utility functions. It uses vectorized loads/stores when possible, which improves performance.

Authors:
- Artem M. Chirkin (https://github.com/achirkin)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#1329
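A host-side sketch of what "multiple inputs and an optional index" means for the element-wise functor. This is only an analogue: `host_map_offset` is a hypothetical name, it loops on the CPU over `std::vector`s, and passing the index as the functor's first argument is an assumption about the calling convention rather than raft's documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host analogue of an index-aware map (not a raft API): the functor
// receives the element index first, then one element from each input:
// out[i] = f(i, ins[i]...).
template <typename OutT, typename Func, typename... InTs>
void host_map_offset(std::vector<OutT>& out, Func f, const std::vector<InTs>&... ins)
{
  // All inputs must match the output length (trivially true with no inputs).
  assert(((ins.size() == out.size()) && ...));
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i] = f(i, ins[i]...);
  }
}
```

With no inputs this degenerates to an iota-like generator; with two inputs it combines them element-wise while still seeing the position of each element.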
Update the implementation behind `raft::linalg::map` and `raft::linalg::map_offset` to allow multiple inputs and an optional index.

Originally, this was part of the effort to reduce the latency of ivf-pq search. The new implementation replaces several helpers that used thrust; at the moment, raft uses a thrust policy that occasionally inserts extra `cudaStreamSynchronize` calls, which hurts latency on small inputs.

The new implementation is generic enough to replace many of raft's utility functions. It uses vectorized loads/stores when possible, which improves performance.
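To illustrate "vectorized loads/stores when possible", here is a host-side sketch under stated assumptions: `unary_map_vectorized` and `VecLen` are hypothetical, and a `std::memcpy` of a whole chunk stands in for a single wide load or store on the GPU (for example, a 16-byte `float4` access). The wide path is taken only when both pointers are aligned to the chunk width; leftover elements and misaligned inputs fall back to a scalar loop.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical illustration (not raft code): out[i] = f(in[i]), processed in
// VecLen-element chunks when both pointers are aligned to the chunk width.
// Returns the number of chunks handled on the "vectorized" path.
template <std::size_t VecLen, typename T, typename Func>
std::size_t unary_map_vectorized(T* out, const T* in, std::size_t n, Func f)
{
  static_assert(std::is_trivially_copyable_v<T>, "chunks are copied bytewise");
  constexpr std::size_t kBytes = VecLen * sizeof(T);
  const bool aligned = reinterpret_cast<std::uintptr_t>(out) % kBytes == 0 &&
                       reinterpret_cast<std::uintptr_t>(in) % kBytes == 0;
  std::size_t i = 0;
  std::size_t chunks = 0;
  if (aligned) {
    for (; i + VecLen <= n; i += VecLen, ++chunks) {
      std::array<T, VecLen> v;
      std::memcpy(v.data(), in + i, kBytes);   // stands in for one wide load
      for (auto& x : v) { x = f(x); }
      std::memcpy(out + i, v.data(), kBytes);  // stands in for one wide store
    }
  }
  for (; i < n; ++i) { out[i] = f(in[i]); }    // scalar tail, or full fallback
  return chunks;
}
```

On a GPU the payoff comes from issuing fewer, wider memory transactions; the alignment check is what decides whether that path is legal for a given pair of pointers.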