Forward-merge branch-24.08 into branch-24.10 #267

Merged
merged 1 commit into from
Jul 31, 2024

@rapids-bot rapids-bot bot commented Jul 31, 2024

Forward-merge triggered by push to branch-24.08 that creates a PR to keep branch-24.10 up-to-date. If this PR is unable to be immediately merged due to conflicts, it will remain open for the team to manually merge. See forward-merger docs for more info.

This PR allows us to guarantee the connectivity of the CAGRA search graph using an approximate MST.

Empirically, the graph indexes that CAGRA builds for search deliver accuracy comparable to other libraries, but they do not guarantee that every node is reachable from every other node. In fact, for some 100M-scale datasets, the graph index created by CAGRA has been confirmed to contain more than one strongly connected component (SCC).
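
To make the reachability check concrete, here is a minimal CPU sketch of counting the strongly connected components of a directed neighbor graph. It is not code from this PR: the function name `count_scc` and the adjacency-list layout are assumptions made for illustration, and the algorithm is a plain iterative Kosaraju pass rather than anything GPU-specific.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of cuVS): counts strongly connected
// components of a directed graph, where adj[u] lists the out-neighbors of u.
// Uses iterative Kosaraju: DFS on the graph for finish order, then DFS on
// the reversed graph in decreasing finish order.
int count_scc(const std::vector<std::vector<int>>& adj)
{
  const int n = static_cast<int>(adj.size());

  // Build the reversed graph.
  std::vector<std::vector<int>> radj(n);
  for (int u = 0; u < n; ++u)
    for (int v : adj[u]) radj[v].push_back(u);

  // Pass 1: record nodes in order of DFS completion.
  std::vector<int> order;
  order.reserve(n);
  std::vector<char> visited(n, 0);
  std::vector<std::pair<int, std::size_t>> stack;  // (node, next edge index)
  for (int s = 0; s < n; ++s) {
    if (visited[s]) continue;
    visited[s] = 1;
    stack.push_back({s, 0});
    while (!stack.empty()) {
      const int u         = stack.back().first;
      const std::size_t i = stack.back().second;
      if (i < adj[u].size()) {
        stack.back().second = i + 1;
        const int v         = adj[u][i];
        if (!visited[v]) {
          visited[v] = 1;
          stack.push_back({v, 0});
        }
      } else {
        order.push_back(u);
        stack.pop_back();
      }
    }
  }

  // Pass 2: each new root found on the reversed graph is one SCC.
  int components = 0;
  visited.assign(n, 0);
  std::vector<int> work;
  for (auto it = order.rbegin(); it != order.rend(); ++it) {
    if (visited[*it]) continue;
    ++components;
    visited[*it] = 1;
    work.push_back(*it);
    while (!work.empty()) {
      const int u = work.back();
      work.pop_back();
      for (int v : radj[u]) {
        if (!visited[v]) {
          visited[v] = 1;
          work.push_back(v);
        }
      }
    }
  }
  return components;
}
```

A result of 1 means every node can reach every other node; any larger value indicates the situation described above.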

This problem can be alleviated by increasing the degree of the search graph, but doing so increases the size of the graph index. It is desirable to address the problem without increasing the degree of the search graph.

Prior work has shown that this can be solved with a Minimum Spanning Tree (MST)-like approach, but computing an MST generally takes a long time. What is needed here, however, is not an exact MST: an approximate MST, whose total edge weight is not necessarily minimal, is sufficient. Such an approximate MST can be computed quickly on GPUs.

This PR contains an implementation that builds an approximate MST on the GPU at high speed, following the approach above, and uses it to guarantee the connectivity of the search graph.
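
For intuition only, the following CPU sketch shows one way an approximate spanning tree can be taken from k-nearest-neighbor lists without a global sort of edges. It is not the GPU algorithm in this PR. `DisjointSet`, `approx_spanning_edges`, and the `knn` layout (neighbors ordered nearest first) are all hypothetical names and assumptions introduced here.

```cpp
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Minimal union-find with path halving (illustrative helper).
struct DisjointSet {
  std::vector<int> parent;
  explicit DisjointSet(int n) : parent(n) { std::iota(parent.begin(), parent.end(), 0); }
  int find(int x)
  {
    while (parent[x] != x) {
      parent[x] = parent[parent[x]];
      x         = parent[x];
    }
    return x;
  }
  // Returns true if x and y were in different sets and have been merged.
  bool unite(int x, int y)
  {
    x = find(x);
    y = find(y);
    if (x == y) return false;
    parent[y] = x;
    return true;
  }
};

// Assumption: knn[u] lists candidate neighbors of u, nearest first.
// Scan rank 0 of every node, then rank 1, and so on, keeping an edge
// whenever it joins two components. Edges are never sorted globally by
// distance, so the result is a spanning forest that only approximates an MST.
std::vector<std::pair<int, int>> approx_spanning_edges(
  const std::vector<std::vector<int>>& knn)
{
  const int n = static_cast<int>(knn.size());
  DisjointSet ds(n);
  std::vector<std::pair<int, int>> edges;

  std::size_t max_rank = 0;
  for (const auto& row : knn)
    if (row.size() > max_rank) max_rank = row.size();

  // Stop early once n - 1 edges have been kept (the forest is a tree).
  for (std::size_t r = 0; r < max_rank && static_cast<int>(edges.size()) + 1 < n; ++r) {
    for (int u = 0; u < n; ++u) {
      if (r < knn[u].size() && ds.unite(u, knn[u][r])) edges.push_back({u, knn[u][r]});
    }
  }
  return edges;
}
```

If the function returns `n - 1` edges, the candidate lists connect all nodes, and inserting each kept edge in both directions into a search graph makes every node reachable from every other. How the PR fits such edges into a fixed-degree graph is not shown here.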

This functionality is not always required, so it is an opt-in feature. A member variable named `guarantee_connectivity` is added to `index_params`; set it to `true` to enable this feature.

> cuvs::neighbors::cagra::index_params index_params;
> index_params.guarantee_connectivity = true;
> auto index = cuvs::neighbors::cagra::build(res, index_params, dataset_view);

Authors:
  - Akira Naruse (https://github.com/anaruse)
  - Tamas Bela Feher (https://github.com/tfeher)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #237
@rapids-bot rapids-bot bot requested a review from a team as a code owner July 31, 2024 15:10
@GPUtester GPUtester merged commit 047b262 into branch-24.10 Jul 31, 2024
1 check passed
@github-actions github-actions bot added the cpp label Jul 31, 2024

rapids-bot bot commented Jul 31, 2024

SUCCESS - forward-merge complete.
