Add MST optimization to guarantee the connectivity of CAGRA graphs (#237)

This PR allows us to guarantee the connectivity of the CAGRA search graph using an approximate MST.

It has been empirically shown that the graph indexes generated by CAGRA provide search accuracy comparable to other libraries, but reachability from any node to every other node is not guaranteed. In fact, it has been confirmed that the number of strongly connected components (SCC) of graph indexes created by CAGRA is not 1 for some 100M-scale datasets. This problem can be alleviated by increasing the degree of the search graph, but doing so increases the size of the graph index. It is desirable to address the problem without increasing the degree of the search graph.

A prior study has shown that this can be solved with a Minimum Spanning Tree (MST)-like approach, but in general, MST calculation takes a long time. However, what is needed here is not an exact MST; an approximate MST, in which the total edge weight is not necessarily minimal, is sufficient. Such an approximate MST can be computed quickly on GPUs.

This PR contains an implementation that builds an approximate MST on the GPU at high speed, based on the above policy, and uses it to guarantee the connectivity of the search graph.

This functionality is not always required, so it is an opt-in feature. A member variable named `guarantee_connectivity` is added to `index_params`; set it to `true` if you wish to use this feature.

> cuvs::neighbors::cagra::index_params index_params;
> index_params.guarantee_connectivity = true;
> auto index = cuvs::neighbors::cagra::build(res, index_params, dataset_view);

Authors:
- Akira Naruse (https://github.com/anaruse)
- Tamas Bela Feher (https://github.com/tfeher)
- Corey J. Nolet (https://github.com/cjnolet)

Approvers:
- Tamas Bela Feher (https://github.com/tfeher)
- Corey J. Nolet (https://github.com/cjnolet)

URL: #237
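
For readers who want to check the connectivity property described above on a small graph, here is a minimal CPU-side sketch that counts strongly connected components. It is not the PR's GPU implementation and not a cuVS API: `count_scc` is a hypothetical helper, the algorithm is Kosaraju's two-pass DFS, and the graph is assumed to be a fixed out-degree adjacency array stored row-major, with the neighbors of node `i` at `graph[i * degree + k]`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (illustration only, not part of cuVS).
// Counts strongly connected components of a directed graph with fixed
// out-degree, stored row-major: the k-th neighbor of node u is
// graph[u * degree + k]. Uses Kosaraju's algorithm with explicit stacks.
std::size_t count_scc(const std::vector<std::uint32_t>& graph,
                      std::size_t n,
                      std::size_t degree)
{
  // Reversed graph (in-neighbors), needed for the second pass.
  std::vector<std::vector<std::uint32_t>> rev(n);
  for (std::size_t u = 0; u < n; ++u) {
    for (std::size_t k = 0; k < degree; ++k) {
      rev[graph[u * degree + k]].push_back(static_cast<std::uint32_t>(u));
    }
  }

  // Pass 1: DFS on the forward graph, recording nodes in finish order.
  std::vector<char> seen(n, 0);
  std::vector<std::uint32_t> order;
  order.reserve(n);
  std::vector<std::pair<std::uint32_t, std::size_t>> stack;  // (node, next edge)
  for (std::size_t s = 0; s < n; ++s) {
    if (seen[s]) continue;
    seen[s] = 1;
    stack.emplace_back(static_cast<std::uint32_t>(s), std::size_t{0});
    while (!stack.empty()) {
      const std::uint32_t u = stack.back().first;
      const std::size_t k   = stack.back().second;
      if (k < degree) {
        stack.back().second = k + 1;  // advance before the stack may grow
        const std::uint32_t v = graph[u * degree + k];
        if (!seen[v]) {
          seen[v] = 1;
          stack.emplace_back(v, std::size_t{0});
        }
      } else {
        order.push_back(u);  // all out-edges explored: u is finished
        stack.pop_back();
      }
    }
  }

  // Pass 2: traverse the reversed graph in decreasing finish order.
  // Each traversal started from an unvisited node covers exactly one SCC.
  std::fill(seen.begin(), seen.end(), 0);
  std::size_t components = 0;
  std::vector<std::uint32_t> work;
  for (std::size_t i = n; i-- > 0;) {
    const std::uint32_t s = order[i];
    if (seen[s]) continue;
    ++components;
    seen[s] = 1;
    work.push_back(s);
    while (!work.empty()) {
      const std::uint32_t u = work.back();
      work.pop_back();
      for (std::uint32_t v : rev[u]) {
        if (!seen[v]) {
          seen[v] = 1;
          work.push_back(v);
        }
      }
    }
  }
  return components;
}
```

A return value of 1 means every node can reach every other node. Any larger value indicates the kind of disconnected graph that `guarantee_connectivity` is intended to prevent.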