Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Separate index sorting from CAGRA graph pruning #1446

Closed
enp1s0 opened this issue Apr 21, 2023 · 1 comment · Fixed by #1471
Closed

[FEA] Separate index sorting from CAGRA graph pruning #1446

enp1s0 opened this issue Apr 21, 2023 · 1 comment · Fixed by #1471
Assignees
Labels
feature request New feature or request

Comments

@enp1s0
Copy link
Member

enp1s0 commented Apr 21, 2023

Hi, @tfeher

The current implementation of CAGRA index pruning always sorts the input index before performing the pruning.
However, it is unnecessary when CAGRA builds the input index because the CAGRA index-building function outputs a sorted index.
On the other hand, when the index is not built by CAGRA and not sorted, the sorting process is necessary to use CAGRA pruning.
Therefore, it would be useful to make sorting optional, depending on whether the index is built by CAGRA or not.

One of the solutions for this issue is to separate the sorting process from the CAGRA pruning function and create a separate sorting function.
This would involve modifying the current prune function as follows:

template <typename IdxT = uint32_t,
          typename g_accessor =
            host_device_accessor<std::experimental::default_accessor<IdxT>, memory_type::host>>
void prune(raft::device_resources const& res,
           mdspan<IdxT, matrix_extent<IdxT>, row_major, g_accessor> knn_graph, //[in]
           raft::host_matrix_view<IdxT, IdxT, row_major> new_graph); //[out]

template <typename DataT, typename IdxT, typename accessor>
void sort_knn_graph(raft::device_resources const& res,
                     mdspan<const DataT, matrix_extent<IdxT>, row_major, accessor> dataset, // [in]
                     raft::host_matrix_view<IdxT, IdxT, row_major> knn_graph); // [in, out]

An alternative solution is to make the dataset argument std::optional and sort the index only if the argument is provided.

I prefer the approach of separating the sorting function because it can reduce the time required for compiling sorting operations during compilation when the sorting function is not used in the source code.

If there are no issues with this approach, I will modify the implementation accordingly.

@enp1s0 enp1s0 added the feature request New feature or request label Apr 21, 2023
@tfeher
Copy link
Contributor

tfeher commented Apr 21, 2023

One of the solutions for this issue is to separate the sorting process from the CAGRA pruning function and create a separate sorting function

Thanks for creating the issue! This sounds fine to me.

Linking the tracker issue #1392

rapids-bot bot pushed a commit that referenced this issue May 10, 2023
…#1471)

# Changes

This PR separates the graph index sorting functionality from the CAGRA pruning function and creates a new function. (Related issue: #1446)

# Unit test

I have included a new unit test for the sorting function. The test utilizes a separate dataset from the one used in the CAGRA main test to avoid the effect of rounding errors during norm computation between two vectors in the dataset. More details are in the source code.
https://github.com/enp1s0/raft/blob/ea6c449c260895e9125a591a4848eed06f5b72c4/cpp/test/neighbors/ann_cagra.cuh#L93-L96

# Issue
Close #1446

Authors:
  - tsuki (https://github.com/enp1s0)
  - Tamas Bela Feher (https://github.com/tfeher)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)

URL: #1471
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants