Add MST optimization to guarantee the connectivity of CAGRA graphs (#237)

This PR makes it possible to guarantee the connectivity of the CAGRA search graph by using an approximate MST.

Empirically, the search graphs that CAGRA builds give search accuracy comparable to that of other libraries, but they do not guarantee that every node is reachable from every other node. In fact, CAGRA graph indexes built on some 100M-scale datasets have been confirmed to contain more than one strongly connected component (SCC).
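To make "more than one SCC" concrete, the sketch below counts the strongly connected components of a fixed-degree graph stored row-major as `n × degree` neighbor IDs, the same layout as the `cagra_graph` host matrix allocated in `cagra_build.cuh`. It is an illustrative CPU check using Kosaraju's algorithm with explicit stacks, not part of this PR; the function name `count_scc` is hypothetical, and all neighbor IDs are assumed to be in `[0, n)`.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not part of cuVS): counts strongly connected components
// of a directed graph where node i has exactly `degree` out-neighbors stored at
// graph[i * degree + j] (row-major). Assumes every entry is a valid node ID < n.
// Kosaraju's algorithm with explicit stacks, so large graphs do not overflow
// the call stack.
std::size_t count_scc(const std::vector<uint32_t>& graph, std::size_t n, std::size_t degree)
{
  // Build the reverse graph in CSR form.
  std::vector<std::size_t> rev_offset(n + 1, 0);
  for (std::size_t e = 0; e < n * degree; e++) { rev_offset[graph[e] + 1]++; }
  for (std::size_t i = 0; i < n; i++) { rev_offset[i + 1] += rev_offset[i]; }
  std::vector<uint32_t> rev_adj(n * degree);
  std::vector<std::size_t> fill(rev_offset.begin(), rev_offset.end() - 1);
  for (std::size_t u = 0; u < n; u++) {
    for (std::size_t j = 0; j < degree; j++) {
      rev_adj[fill[graph[u * degree + j]]++] = static_cast<uint32_t>(u);
    }
  }

  // Pass 1: iterative DFS on the forward graph, recording nodes in finish order.
  std::vector<bool> visited(n, false);
  std::vector<uint32_t> order;
  order.reserve(n);
  std::vector<std::pair<uint32_t, std::size_t>> stack;  // (node, next edge slot)
  for (std::size_t s = 0; s < n; s++) {
    if (visited[s]) continue;
    visited[s] = true;
    stack.push_back({static_cast<uint32_t>(s), 0});
    while (!stack.empty()) {
      auto& [u, next] = stack.back();
      if (next < degree) {
        uint32_t v = graph[u * degree + next++];
        if (!visited[v]) {
          visited[v] = true;
          stack.push_back({v, 0});  // u/next are not used after this push
        }
      } else {
        order.push_back(u);
        stack.pop_back();
      }
    }
  }

  // Pass 2: search the reverse graph in decreasing finish time;
  // each new search tree is exactly one SCC.
  std::vector<bool> assigned(n, false);
  std::vector<uint32_t> work;
  std::size_t num_scc = 0;
  for (auto it = order.rbegin(); it != order.rend(); ++it) {
    if (assigned[*it]) continue;
    num_scc++;
    assigned[*it] = true;
    work.push_back(*it);
    while (!work.empty()) {
      uint32_t u = work.back();
      work.pop_back();
      for (std::size_t e = rev_offset[u]; e < rev_offset[u + 1]; e++) {
        uint32_t v = rev_adj[e];
        if (!assigned[v]) {
          assigned[v] = true;
          work.push_back(v);
        }
      }
    }
  }
  return num_scc;
}
```

A result of 1 means every node can reach every other node, which is the property `guarantee_connectivity` is meant to ensure.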

This problem can be mitigated by increasing the degree of the search graph, but that also increases the size of the graph index. It is preferable to solve the problem without increasing the graph degree.
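As a rough illustration of the size cost: the optimized graph is allocated as an `n × graph_degree` matrix of `IdxT` (the `cagra_graph` host matrix in `cagra_build.cuh`), so its size grows linearly with the degree. The numbers below assume, purely for illustration, a 100M-node dataset, a 4-byte `IdxT`, and degrees of 64 and 96.

```math
\text{graph size} = n \times \text{graph\_degree} \times \mathrm{sizeof}(\texttt{IdxT})
```

```math
10^{8} \times 64 \times 4\,\text{B} = 25.6\,\text{GB}, \qquad 10^{8} \times 96 \times 4\,\text{B} = 38.4\,\text{GB}
```

Under these assumptions, raising the degree by half to improve reachability adds about 12.8 GB to the index.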

Prior work has shown that this problem can be solved with a minimum spanning tree (MST)-like approach, but computing an exact MST generally takes a long time. What is needed here, however, is not an exact MST but an approximate one, for example a spanning tree whose total edge weight is not necessarily minimal. Such an approximate MST can be computed quickly on GPUs.
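One common way to get an approximate MST cheaply, shown here only as a CPU sketch and not as the GPU algorithm in this PR, is to run Kruskal's algorithm over the candidate edges of the kNN graph instead of over all pairs. The result is a minimum spanning forest of the kNN graph, which approximates the MST of the full distance graph and is a single spanning tree only if the kNN graph is connected. The names `approx_mst_from_knn` and `disjoint_set` and the `knn_ids`/`knn_dists` layout (`n × k`, row-major) are assumptions for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical union-find with path halving, used to detect cycles.
struct disjoint_set {
  std::vector<uint32_t> parent;
  explicit disjoint_set(std::size_t n) : parent(n) { std::iota(parent.begin(), parent.end(), 0u); }
  uint32_t find(uint32_t x)
  {
    while (parent[x] != x) {
      parent[x] = parent[parent[x]];  // path halving
      x = parent[x];
    }
    return x;
  }
  bool unite(uint32_t a, uint32_t b)
  {
    a = find(a);
    b = find(b);
    if (a == b) return false;  // already connected: edge would form a cycle
    parent[b] = a;
    return true;
  }
};

// Hypothetical illustration: Kruskal's algorithm restricted to kNN edges.
// Edge (i, knn_ids[i*k+j]) has weight knn_dists[i*k+j]. Returns the tree
// (or forest) edges as (u, v) pairs.
std::vector<std::pair<uint32_t, uint32_t>> approx_mst_from_knn(const std::vector<uint32_t>& knn_ids,
                                                               const std::vector<float>& knn_dists,
                                                               std::size_t n,
                                                               std::size_t k)
{
  // Sort candidate edge indices by ascending distance.
  std::vector<std::size_t> edges(n * k);
  std::iota(edges.begin(), edges.end(), std::size_t{0});
  std::sort(edges.begin(), edges.end(), [&](std::size_t a, std::size_t b) {
    return knn_dists[a] < knn_dists[b];
  });

  disjoint_set ds(n);
  std::vector<std::pair<uint32_t, uint32_t>> tree;
  for (std::size_t e : edges) {
    uint32_t u = static_cast<uint32_t>(e / k);
    uint32_t v = knn_ids[e];
    // Keep the edge only if it joins two different components.
    if (u != v && ds.unite(u, v)) { tree.push_back({u, v}); }
    if (tree.size() + 1 == n) break;  // spanning tree is complete
  }
  return tree;
}
```

Restricting Kruskal to the `n × k` kNN edges keeps the sort at `O(nk log(nk))` instead of over all `O(n²)` pairs, which is the kind of accuracy-for-speed trade the paragraph above describes.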

Based on this approach, this PR implements fast construction of an approximate MST on the GPU and uses it to guarantee the connectivity of the search graph.
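The step of using a spanning tree to guarantee connectivity can be sketched as follows, under stated assumptions and not as the cuVS implementation: write every tree edge into the fixed-degree graph in both directions, then fill the remaining slots with the original neighbors in rank order. If the tree spans all `n` nodes and (an assumption here) no node has more than `degree` tree neighbors, the resulting graph contains the tree in both directions, so every node can reach every other. The name `merge_tree_edges` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical sketch: force each undirected tree edge into a fixed-degree,
// row-major graph (n x degree) in BOTH directions, keeping as many original
// neighbors (in rank order) as the remaining slots allow.
void merge_tree_edges(std::vector<uint32_t>& graph,
                      std::size_t n,
                      std::size_t degree,
                      const std::vector<std::pair<uint32_t, uint32_t>>& tree)
{
  // Collect tree neighbors per node; each undirected edge is recorded twice.
  std::vector<std::vector<uint32_t>> tree_nbrs(n);
  for (auto [u, v] : tree) {
    tree_nbrs[u].push_back(v);
    tree_nbrs[v].push_back(u);
  }
  for (std::size_t i = 0; i < n; i++) {
    if (tree_nbrs[i].size() > degree) {
      throw std::invalid_argument("tree degree exceeds graph degree");
    }
    // Tree neighbors first, then original neighbors, skipping duplicates.
    std::vector<uint32_t> row = tree_nbrs[i];
    for (std::size_t j = 0; j < degree && row.size() < degree; j++) {
      uint32_t v = graph[i * degree + j];
      if (std::find(row.begin(), row.end(), v) == row.end()) { row.push_back(v); }
    }
    // If deduplication left the row short, pad by repeating the last entry.
    while (row.size() < degree) { row.push_back(row.back()); }
    std::copy(row.begin(), row.end(), graph.begin() + static_cast<std::ptrdiff_t>(i * degree));
  }
}
```

The trade-off in this sketch is that nodes with tree edges lose their lowest-ranked original neighbors; the actual implementation may choose which edges to replace differently.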

This functionality is not always needed, so it is opt-in. A new member variable, `guarantee_connectivity`, is added to `index_params` (default `false`); set it to `true` to enable the feature:

> cuvs::neighbors::cagra::index_params index_params;
> index_params.guarantee_connectivity = true;
> auto index = cuvs::neighbors::cagra::build(res, index_params, dataset_view);

Authors:
  - Akira Naruse (https://github.com/anaruse)
  - Tamas Bela Feher (https://github.com/tfeher)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #237
anaruse authored Jul 31, 2024
1 parent 812fffd commit e67caa5
Showing 4 changed files with 858 additions and 31 deletions.
5 changes: 5 additions & 0 deletions cpp/include/cuvs/neighbors/cagra.hpp
@@ -106,6 +106,11 @@ struct index_params : cuvs::neighbors::index_params {
 graph_build_params::ivf_pq_params,
 graph_build_params::nn_descent_params>
 graph_build_params;
+
+/**
+* Whether to use MST optimization to guarantee graph connectivity.
+*/
+bool guarantee_connectivity = false;
 /**
 * Whether to add the dataset content to the index, i.e.:
 *
5 changes: 3 additions & 2 deletions cpp/src/neighbors/cagra.cuh
@@ -238,9 +238,10 @@ template <
 void optimize(
 raft::resources const& res,
 raft::mdspan<IdxT, raft::matrix_extent<int64_t>, raft::row_major, g_accessor> knn_graph,
-raft::host_matrix_view<IdxT, int64_t, raft::row_major> new_graph)
+raft::host_matrix_view<IdxT, int64_t, raft::row_major> new_graph,
+const bool guarantee_connectivity = false)
 {
-detail::optimize(res, knn_graph, new_graph);
+detail::optimize(res, knn_graph, new_graph, guarantee_connectivity);
 }
 
 template <typename T,
8 changes: 5 additions & 3 deletions cpp/src/neighbors/detail/cagra/cagra_build.cuh
@@ -382,7 +382,8 @@ template <
 void optimize(
 raft::resources const& res,
 raft::mdspan<IdxT, raft::matrix_extent<int64_t>, raft::row_major, g_accessor> knn_graph,
-raft::host_matrix_view<IdxT, int64_t, raft::row_major> new_graph)
+raft::host_matrix_view<IdxT, int64_t, raft::row_major> new_graph,
+const bool guarantee_connectivity = false)
 {
 using internal_IdxT = typename std::make_unsigned<IdxT>::type;
 
@@ -400,7 +401,8 @@ void optimize(
 knn_graph.extent(0),
 knn_graph.extent(1));
 
-cagra::detail::graph::optimize(res, knn_graph_internal, new_graph_internal);
+cagra::detail::graph::optimize(
+res, knn_graph_internal, new_graph_internal, guarantee_connectivity);
 }
 
 template <typename T,
@@ -476,7 +478,7 @@ index<T, IdxT> build(
 auto cagra_graph = raft::make_host_matrix<IdxT, int64_t>(dataset.extent(0), graph_degree);
 
 RAFT_LOG_INFO("optimizing graph");
-optimize<IdxT>(res, knn_graph->view(), cagra_graph.view());
+optimize<IdxT>(res, knn_graph->view(), cagra_graph.view(), params.guarantee_connectivity);
 
 // free intermediate graph before trying to create the index
 knn_graph.reset();