Improve parallelism of refine host #2059

anaruse · 2023-12-13T07:28:02Z

This PR addresses #2058 by changing the thread parallelism method.

In the first half of the refine process, distances are computed for all candidate vectors, i.e., (number of queries) * (original top-k) vectors. Since the distance calculation for each vector is independent, this part is thread-parallelized with a maximum parallelism of (number of queries) * (original top-k). This means that even if the number of queries is 1, this part can still run on multiple threads.
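A minimal sketch of this flattened distance phase, assuming OpenMP, row-major arrays, and squared L2 distance. The names `l2_sq` and `candidate_distances` are hypothetical and are not taken from `refine_host-inl.hpp`; the actual implementation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not RAFT's API): squared L2 distance between two vectors of length dim.
inline float l2_sq(const float* a, const float* b, std::size_t dim)
{
  float acc = 0.0f;
  for (std::size_t d = 0; d < dim; ++d) {
    const float diff = a[d] - b[d];
    acc += diff * diff;
  }
  return acc;
}

// Distance phase: one independent task per (query, candidate) pair, so the
// loop exposes n_queries * k0 units of work even when n_queries == 1.
// candidates holds k0 dataset row indices per query, row-major.
std::vector<float> candidate_distances(const std::vector<float>& dataset,
                                       const std::vector<float>& queries,
                                       const std::vector<std::int64_t>& candidates,
                                       std::size_t n_queries,
                                       std::size_t k0,
                                       std::size_t dim)
{
  assert(candidates.size() == n_queries * k0);
  std::vector<float> out(n_queries * k0);
  const std::int64_t n_tasks = static_cast<std::int64_t>(n_queries * k0);
  // Without OpenMP enabled the pragma is ignored and the loop runs serially.
#pragma omp parallel for
  for (std::int64_t t = 0; t < n_tasks; ++t) {
    const std::size_t idx = static_cast<std::size_t>(t);
    const std::size_t q   = idx / k0;  // query that owns this candidate
    const std::size_t row = static_cast<std::size_t>(candidates[idx]);
    out[idx] = l2_sq(&queries[q * dim], &dataset[row * dim], dim);
  }
  return out;
}
```

With a single query and `k0` candidates, the loop above still has `k0` iterations to distribute across threads, which is the case a per-query loop cannot parallelize.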

On the other hand, the second half of the refine process, the so-called top-k calculation, can be performed independently for each query, but it is difficult to thread-parallelize the calculation for a single query. Therefore, this part is parallelized with a maximum parallelism equal to the number of queries, as in the current implementation.
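The per-query selection phase can be sketched as follows, again assuming OpenMP and row-major arrays. `select_top_k` is a hypothetical name, and `std::partial_sort` stands in for whatever selection routine the real code uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Selection phase: each query picks its k nearest candidates out of k0.
// Queries are independent, so the parallel loop runs over queries only;
// the selection inside one query stays sequential.
std::vector<std::int64_t> select_top_k(const std::vector<float>& distances,
                                       const std::vector<std::int64_t>& candidates,
                                       std::size_t n_queries,
                                       std::size_t k0,
                                       std::size_t k)
{
  assert(k <= k0);
  assert(distances.size() == n_queries * k0);
  assert(candidates.size() == n_queries * k0);
  std::vector<std::int64_t> out(n_queries * k);
  const std::int64_t nq = static_cast<std::int64_t>(n_queries);
#pragma omp parallel for
  for (std::int64_t qi = 0; qi < nq; ++qi) {
    const std::size_t q = static_cast<std::size_t>(qi);
    // Pair each distance with its candidate index so sorting keeps them together.
    std::vector<std::pair<float, std::int64_t>> row(k0);
    for (std::size_t j = 0; j < k0; ++j) {
      row[j] = {distances[q * k0 + j], candidates[q * k0 + j]};
    }
    // Order only the k smallest distances; the rest stay unsorted.
    std::partial_sort(row.begin(), row.begin() + static_cast<std::ptrdiff_t>(k), row.end());
    for (std::size_t j = 0; j < k; ++j) {
      out[q * k + j] = row[j].second;
    }
  }
  return out;
}
```

Here the available parallelism is bounded by `n_queries`, which matches the description above: with one query, this phase runs on one thread.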

copy-pr-bot · 2023-12-13T07:28:06Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

achirkin · 2023-12-13T08:10:49Z

/ok to test

tfeher · 2023-12-13T15:24:26Z

/ok to test

achirkin · 2023-12-14T08:50:28Z

/ok to test

achirkin

Thanks, apart from a small nitpick, LGTM.

achirkin · 2023-12-14T08:55:01Z

cpp/include/raft/neighbors/detail/refine_host-inl.hpp

  if (size_t(suggested_n_threads) > n_queries) { suggested_n_threads = n_queries; }



Nitpick: the check is now redundant thanks to the new code

Suggested change

if (size_t(suggested_n_threads) > n_queries) { suggested_n_threads = n_queries; }

cjnolet · 2023-12-15T04:19:48Z

/ok to test

cjnolet · 2023-12-15T04:25:13Z

/merge

cjnolet · 2023-12-19T00:36:14Z

/ok to test

cjnolet · 2024-01-03T23:30:27Z

/ok to test

wphicks · 2024-01-04T18:13:37Z

/ok to test

cjnolet · 2024-01-06T19:29:14Z

@anaruse it looks like there's a minor compile error in this PR, but otherwise the changes are ready to be merged.

cjnolet · 2024-01-07T12:48:05Z

/ok to test

cjnolet · 2024-01-09T03:18:52Z

/ok to test

cjnolet · 2024-01-09T03:40:59Z

/ok to test

This PR addresses rapidsai#2058 by changing the thread parallelism method.

In the first half of the `refine` process, distances are computed for all candidate vectors, i.e., (number of queries) * (original top-k) vectors. Since the distance calculation for each vector is independent, this part is thread-parallelized with a maximum parallelism of (number of queries) * (original top-k). This means that even if the number of queries is 1, this part can still run on multiple threads.

On the other hand, the second half of the `refine` process, the so-called top-k calculation, can be performed independently for each query, but it is difficult to thread-parallelize the calculation for a single query. Therefore, this part is parallelized with a maximum parallelism equal to the number of queries, as in the current implementation.

Authors:
- Akira Naruse (https://github.com/anaruse)
- Corey J. Nolet (https://github.com/cjnolet)
- William Hicks (https://github.com/wphicks)

Approvers:
- Artem M. Chirkin (https://github.com/achirkin)
- Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#2059

anaruse added 3 commits December 13, 2023 15:47

improve parallelism of refine_host

3707e38

Merge branch 'branch-24.02' into improve_parallelism_of_refine_host

42f6617

Remove debug code

514beb8

anaruse requested a review from a team as a code owner December 13, 2023 07:28

github-actions bot added the cpp label Dec 13, 2023

achirkin reviewed Dec 13, 2023

cpp/include/raft/neighbors/detail/refine_host-inl.hpp (outdated)

Update cpp/include/raft/neighbors/detail/refine_host-inl.hpp

5421bef

Co-authored-by: Artem M. Chirkin <[email protected]>

achirkin reviewed Dec 13, 2023

cpp/include/raft/neighbors/detail/refine_host-inl.hpp (outdated)

achirkin added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels Dec 13, 2023

anaruse added 3 commits December 13, 2023 18:21

Use fine-grained thread parallelism in refine_host only when the number of queries is small

4982c98

Added consideration for large dimensionality of dataset

cfba110

Updated distance calculation part to 24.02 based implementation

080b221

cjnolet assigned anaruse Dec 13, 2023

anaruse and others added 2 commits December 14, 2023 17:20

Satisfy pre-commit

c9bfc9a

Merge branch 'branch-24.02' into improve_parallelism_of_refine_host

a439b44

achirkin approved these changes Dec 14, 2023

cjnolet approved these changes Dec 15, 2023

Satisfy spell checker

0e1b984

Merge branch 'branch-24.02' into improve_parallelism_of_refine_host

cab8c8b

anaruse and others added 2 commits December 20, 2023 18:13

Merge branch 'branch-24.02' into improve_parallelism_of_refine_host

b0a4a55

Merge branch 'branch-24.02' into improve_parallelism_of_refine_host

fda1a7e

Merge branch 'branch-24.02' into improve_parallelism_of_refine_host

a6752dd

Merge branch 'branch-24.02' into improve_parallelism_of_refine_host

9d9bf0d

Satisfy style-checker

0482306

Added a necessary header file to div_rounding_up_safe()

04b97d3

rapids-bot bot merged commit 3b88d17 into rapidsai:branch-24.02 Jan 9, 2024
61 checks passed
