Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Improve parallelism of refine host (rapidsai#2059)
This PR addresses rapidsai#2058 by changing the thread parallelism method. In the first half of the `refine` process, the distance calculation is performed on all candidate vectors, i.e., the number of queries * the original top-k vectors. Since the distance calculations for each vector can be performed independently, this part is thread-parallelized assuming that maximum parallelism is the number of queries * original top-k. This means that even if the number of queries is 1, this part can be executed in thread parallel. On the other hand, the second half of the `refine` process, the so-called top-k calculation, can be performed independently for each query, but it is difficult to thread parallelize the calculation for a given query, Therefore, this part is parallelized assuming the maximum parallelism is the number of queries, as in the current implementation. Authors: - Akira Naruse (https://github.com/anaruse) - Corey J. Nolet (https://github.com/cjnolet) - William Hicks (https://github.com/wphicks) Approvers: - Artem M. Chirkin (https://github.com/achirkin) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#2059
- Loading branch information