Add support for nosync thrust exec policy #2293

abc99lr · 2024-05-06T21:43:52Z

Currently, all the thrust calls used in RAFT are sync calls. There is another RMM execution policy that supports async (or nosync) thrust calls. Supporting async calls is important. For example, we need the kmeans predict to be async in order to achieve kernel/copy overlapping in IVF-Flat index build (#2106).

This PR:

- Add support for nosync thrust policy in raft::resource
- ~~Change the thrust calls in kmeans predict to nosync versions. This would enable us to achieve memcpy and kernel overlapping in IVF-Flat index building.~~ Based on discussions, we should use raft::linalg::map instead. Will open another PR for this change.
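
To illustrate the motivation in the description, here is a minimal sketch of what a nosync policy allows at the Thrust level. It is not code from this PR: `negate_and_copy` is a hypothetical helper, and it uses `thrust::cuda::par_nosync` directly instead of the RMM or RAFT wrappers. With the nosync policy, `thrust::transform` may return before its kernel has finished, so the host can enqueue a copy on a second stream while the kernel is still running.

```cuda
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/functional.h>
#include <thrust/reduce.h>
#include <thrust/system/cuda/execution_policy.h>
#include <thrust/transform.h>

// Hypothetical helper (not part of RAFT or this PR).
// Negates n ones on one stream while copying n twos from the host on another,
// then returns the sum of both device buffers.
float negate_and_copy(int n)
{
  cudaStream_t compute, copy;
  cudaStreamCreate(&compute);
  cudaStreamCreate(&copy);

  thrust::device_vector<float> in(n, 1.0f), out(n), staged(n);

  // Pinned host memory, so the host-to-device copy can be truly asynchronous.
  float* host = nullptr;
  cudaMallocHost(&host, n * sizeof(float));
  for (int i = 0; i < n; ++i) { host[i] = 2.0f; }

  // nosync policy: the kernel is enqueued on `compute` and the call may
  // return to the host before the kernel completes.
  thrust::transform(thrust::cuda::par_nosync.on(compute),
                    in.begin(), in.end(), out.begin(), thrust::negate<float>());

  // The host is free to enqueue the copy right away on another stream.
  cudaMemcpyAsync(thrust::raw_pointer_cast(staged.data()), host,
                  n * sizeof(float), cudaMemcpyHostToDevice, copy);

  // With nosync, synchronization is the caller's responsibility.
  cudaStreamSynchronize(compute);
  cudaStreamSynchronize(copy);

  float total = thrust::reduce(out.begin(), out.end(), 0.0f) +
                thrust::reduce(staged.begin(), staged.end(), 0.0f);

  cudaFreeHost(host);
  cudaStreamDestroy(compute);
  cudaStreamDestroy(copy);
  return total;
}
```

Each element contributes -1 from the negated buffer and 2 from the copied buffer, so `negate_and_copy(1024)` should return `1024.0f`. Whether the copy and the kernel actually overlap depends on the device and the problem size; the sketch only shows that the host is no longer blocked between the two calls.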

copy-pr-bot · 2024-05-06T21:43:56Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

tfeher

Thanks Rui for fixing this, it is indeed important to switch to async policy to improve vector search build time. Overall it looks good!

@achirkin could you also have a look at the changes in resource handling?

achirkin

Thanks for the PR! I welcome the introduction of the thrust_nosync_policy, it could be useful to us in theory. However, both places you changed in kmeans_balanced.cuh could better be done with raft's own linalg::map.

cpp/include/raft/cluster/detail/kmeans_balanced.cuh

cjnolet · 2024-05-08T11:37:53Z

> However, both places you changed in kmeans_balanced.cuh could better be done with raft's own linalg::map

@abc99lr, I just want to give an additional +1 to @achirkin's comment here. In general, we prefer to reuse raft functions where at all possible across the codebase, even when the raft function itself might be a trivial wrapper around thrust, cub, or one of the math libs. The reason for this is that it centralizes these calls so that we can properly instrument, improve, or even fix bugs as they arise in a single place, rather than having to scrape through the codebase and fix them in many places.

The other reason for this is API consistency: over time, raft has improved to become quite a pleasant API experience by being able to compose larger algorithms out of a series of raft functions, which improves code readability.

abc99lr · 2024-05-08T15:59:15Z

Thanks for the comments. I'll change the thrust calls in kmeans_balanced::predict to raft::linalg::map in a separate PR. For this one, I'll only add the nosync policy and remove the changes in kmeans_balanced::predict. Sound good?

cjnolet · 2024-05-08T23:23:22Z

> For this one, I'll only add the nosync policy and remove the changes in kmeans_balanced::predict. Sound good?

If we aren't going to be using this nosync policy, I'd like to avoid merging changes just for the sake of merging changes.

abc99lr · 2024-05-08T23:25:39Z

Closing.

cjnolet · 2024-05-08T23:26:50Z

Most functions in RAFT are assumed to be async, so I suspect we could probably scrape through all of the places we use the thrust_policy and replace them with nosync_policy. @abc99lr I'm not against merging this just for that reason alone, but would you mind creating an issue for the above so that we don't lose sight of it? Just want to avoid this getting stale and remaining unused for the foreseeable future.

abc99lr · 2024-05-08T23:27:47Z

> If we aren't going to be using this nosync policy, I'd like to avoid merging changes just for the sake of merging changes.

Understood. But I do think RAFT should provide this functionality to developers and tell people to use the nosync policy whenever they can. Otherwise, many unnecessary syncs would be introduced.

abc99lr · 2024-05-08T23:39:58Z

Talked with @cjnolet, and we think it's worth trying to use nosync_policy by default. Going to create another PR for that.

Testing. Do not merge.

Based on the discussions from #2293, it's a good idea to test if we could use nosync thrust calls by default. This PR changes the current `rmm::exec_policy` to its async version `rmm::exec_policy_nosync`.

Authors:
- Rui Lan (https://github.com/abc99lr)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)

URL: #2302
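
As a rough illustration of the swap described above, the sketch below passes `rmm::exec_policy_nosync` to one Thrust call and `rmm::exec_policy` to another. It is not taken from #2302: `fill_nosync` is a hypothetical helper, and how RAFT wires the policy into its resources is not shown here.

```cuda
#include <cstddef>

#include <rmm/cuda_stream.hpp>
#include <rmm/device_uvector.hpp>
#include <rmm/exec_policy.hpp>
#include <thrust/fill.h>
#include <thrust/reduce.h>

// Hypothetical helper (not part of RAFT or #2302).
// Fills n ints with 7 using the nosync policy, then sums them.
int fill_nosync(std::size_t n)
{
  rmm::cuda_stream stream;
  rmm::device_uvector<int> values(n, stream.view());

  // Nosync policy: the fill is enqueued on the stream and the call may
  // return before the kernel finishes.
  thrust::fill(rmm::exec_policy_nosync(stream.view()),
               values.begin(), values.end(), 7);

  // The reduction runs on the same stream, so it is ordered after the fill.
  // It returns a value to the host, which requires waiting for the result.
  return thrust::reduce(rmm::exec_policy(stream.view()),
                        values.begin(), values.end(), 0);
}
```

Because both calls use the same stream, no explicit synchronization is needed between them; stream ordering keeps the fill ahead of the reduction. `fill_nosync(100)` should return `700`.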

abc99lr requested a review from a team as a code owner May 6, 2024 21:43

github-actions bot added the cpp label May 6, 2024

tfeher approved these changes May 7, 2024

tfeher added the improvement, non-breaking, and Vector Search labels May 7, 2024

tfeher requested a review from achirkin May 7, 2024 21:04

achirkin requested changes May 8, 2024

abc99lr added 2 commits May 8, 2024 23:03

Add support for nosync thrust exec policy.

e26ca70

Revert back changes in kmeans balanced.

bfd0677

abc99lr force-pushed the support-nosync-thrust branch from c6520fa to bfd0677 Compare May 8, 2024 23:13

abc99lr changed the title ~~[REVIEW] Add support for nosync thrust exec policy. Use nosync thrust calls for kmeans_balanced predict~~ Add support for nosync thrust exec policy May 8, 2024

abc99lr requested a review from achirkin May 8, 2024 23:19

cjnolet assigned abc99lr May 8, 2024

abc99lr closed this May 8, 2024

abc99lr mentioned this pull request May 9, 2024

Make thrust nosync execution policy the default thrust policy #2302

Merged
