Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generic linalg::map #1329

Merged
merged 9 commits into from
Mar 13, 2023
Merged

Conversation

achirkin
Copy link
Contributor

@achirkin achirkin commented Mar 10, 2023

Update the implementation behind raft::linalg::map and raft::linalg::map_offset to allow multiple inputs and optional index.

Originally, this is a part of the effort to reduce the latency of ivf-pq search. The new implementation replaces several helpers, which have been using thrust; at the moment, raft uses a thrust policy that occasionally inserts extra cudaStreamSynchronize, and this negatively affects the latency on small inputs.

The new implementation is generic enough to replace many raft's utility functions. It uses vectorized load/stores if possible, which improves performance.

@achirkin achirkin added feature request New feature or request breaking Breaking change 2 - In Progress Currenty a work in progress labels Mar 10, 2023
@achirkin achirkin requested a review from a team as a code owner March 10, 2023 16:24
@github-actions github-actions bot added the cpp label Mar 10, 2023
@achirkin
Copy link
Contributor Author

Anecdotal evidence shows improved performance. I've tested it ivf_pq::search with small batch sizes, where the removal of unnecessary stream synchronization reduces the latency by up to 5-10% in some cases. However, the new primitive is going to be used everywhere across the codebase, so more benchmarking wouldn't hurt.

@achirkin
Copy link
Contributor Author

I've marked the PR as 'breaking'. However, the only breaking change is the public function raft::linalg::map; it was added rather recently and I haven't seen any uses of it yet.

@cjnolet
Copy link
Member

cjnolet commented Mar 10, 2023

I'm really happy to see more consolidation here! This is great!

@achirkin achirkin added 3 - Ready for Review and removed 2 - In Progress Currenty a work in progress labels Mar 10, 2023
Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really happy to have this change. Mostly I just have some minor comments on the docs but

I'd also like to get your ideas (and maybe gather some ideas from others) about the ordering of arguments in the public API.

cpp/include/raft/linalg/map.cuh Show resolved Hide resolved
cpp/include/raft/linalg/map.cuh Show resolved Hide resolved
cpp/include/raft/linalg/map.cuh Outdated Show resolved Hide resolved
cpp/include/raft/linalg/map.cuh Show resolved Hide resolved
cpp/include/raft/linalg/map.cuh Show resolved Hide resolved
linalg::detail::map<false>(
stream,
query_kths.data(),
n_queries,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is an opportunity to expose our own "fill_n" function. Or would that just be too redundant to map?

Copy link
Contributor Author

@achirkin achirkin Mar 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, we have it in matrix/init.cuh. However, here I use the detail version because the input comes in rmm::uvector. I cannot use the mdspan api here, because I resize it conditional on runtime arguments ("enable" the buffer only if it's used).

Perhaps, this would be a good place to use std::optional<mdarray>, but it's hard to emplace the optional when we mostly use helpers to create mdarrays.

Copy link
Member

@cjnolet cjnolet Mar 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To your last point- would std::make_optional not work here? The mdarray is also moveable as far as I know.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it could work (didn't try yet), but may be clumsy to force specify all template parameters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: turned out not so bad, although it didn't reduce the number of lines.

What do you think about backporting std::optional::transform from C++23 into raft? I miss this function so much for one-line constructs like this:

query_kths.transform([](auto x){return x.data_handle();}).value_or(std::nullptr);

@achirkin achirkin requested a review from cjnolet March 13, 2023 12:41
Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the overloads. The changes LGTM!

@cjnolet cjnolet mentioned this pull request Mar 13, 2023
@cjnolet
Copy link
Member

cjnolet commented Mar 13, 2023

/merge

@rapids-bot rapids-bot bot merged commit cb78f5f into rapidsai:branch-23.04 Mar 13, 2023
cjnolet added a commit to cjnolet/raft that referenced this pull request Mar 13, 2023
rapids-bot bot pushed a commit that referenced this pull request Mar 14, 2023
The change to the public API was siginficant enough here that it's going to require updates on the cuml side which invoke `map_k` with very different arguments (e.g. mdspan, different order). For now, it's best we revert this commmit to unblock cuml and then we can proceed by keeping the old APIs and deprecating them until we change cuml.

Authors:
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Ben Frederickson (https://github.com/benfred)

URL: #1336
@achirkin achirkin mentioned this pull request Mar 14, 2023
rapids-bot bot pushed a commit that referenced this pull request Mar 14, 2023
Update the implementation behind `raft::linalg::map` and `raft::linalg::map_offset` to allow multiple inputs and optional index.

Originally, this is a part of the effort to reduce the latency of ivf-pq search. The new implementation replaces several helpers, which have been using thrust; at the moment, raft uses a thrust policy that occasionally inserts extra `cudaStreamSynchronize`, and this negatively affects the latency on small inputs.

The new implementation is generic enough to replace many raft's utility functions. It uses vectorized load/stores if possible, which improves performance.

This is the second take on the PR #1329 that keeps the deprecated `map_k` function, which is used in cuml.

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1337
lowener pushed a commit to lowener/raft that referenced this pull request Mar 15, 2023
Update the implementation behind `raft::linalg::map` and `raft::linalg::map_offset` to allow multiple inputs and optional index.

Originally, this is a part of the effort to reduce the latency of ivf-pq search. The new implementation replaces several helpers, which have been using thrust; at the moment, raft uses a thrust policy that occasionally inserts extra `cudaStreamSynchronize`, and this negatively affects the latency on small inputs.

The new implementation is generic enough to replace many raft's utility functions. It uses vectorized load/stores if possible, which improves performance.

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#1329
lowener pushed a commit to lowener/raft that referenced this pull request Mar 15, 2023
The change to the public API was siginficant enough here that it's going to require updates on the cuml side which invoke `map_k` with very different arguments (e.g. mdspan, different order). For now, it's best we revert this commmit to unblock cuml and then we can proceed by keeping the old APIs and deprecating them until we change cuml.

Authors:
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Ben Frederickson (https://github.com/benfred)

URL: rapidsai#1336
lowener pushed a commit to lowener/raft that referenced this pull request Mar 15, 2023
Update the implementation behind `raft::linalg::map` and `raft::linalg::map_offset` to allow multiple inputs and optional index.

Originally, this is a part of the effort to reduce the latency of ivf-pq search. The new implementation replaces several helpers, which have been using thrust; at the moment, raft uses a thrust policy that occasionally inserts extra `cudaStreamSynchronize`, and this negatively affects the latency on small inputs.

The new implementation is generic enough to replace many raft's utility functions. It uses vectorized load/stores if possible, which improves performance.

This is the second take on the PR rapidsai#1329 that keeps the deprecated `map_k` function, which is used in cuml.

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#1337
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review breaking Breaking change cpp feature request New feature or request
Projects
Development

Successfully merging this pull request may close these issues.

2 participants