Serialize only knn graph with CAGRA bench #1743
@@ -129,15 +135,81 @@ void RaftCagra<T, IdxT>::set_search_param(const AnnSearchParam& param)
template <typename T, typename IdxT>
void RaftCagra<T, IdxT>::save(const std::string& file) const
{
  raft::neighbors::cagra::serialize(handle_, file, *index_);
  // 1 orig serialization: save both dataset and knn graph into the file
@benfred and I were literally just talking about this. We have also agreed in the past that serializing the dataset should be optional, and I believe we also converged on providing an `update_dataset()` method right on the index to alleviate the awkwardness of having to accept a dataset at deserialization time.
Alternatively, we can just construct the index from the two arrays like here
https://github.com/rapidsai/raft/pull/1743/files#diff-5f08a0aeb75c8884f5142d218e97ec859de8cc0cffd98d8c08f0e18a45655da0R211-R212
Saving the knn-graph mdspan is trivial with the current helpers, but we could add the following code block as a helper to make it easy to load an mdspan:
https://github.com/rapidsai/raft/pull/1743/files#diff-5f08a0aeb75c8884f5142d218e97ec859de8cc0cffd98d8c08f0e18a45655da0R175-R181
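To make the graph-only idea concrete, here is a minimal sketch in plain C++ of saving and reloading a row-major neighbor matrix without the dataset. It does not use RAFT or mdspan: `HostGraph`, `save_graph`, and `load_graph` are hypothetical names, and the file layout (two 64-bit extents followed by the raw indices) is an assumption made for illustration, not RAFT's serialization format.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical host-side stand-in for a knn graph: n_rows x degree, row-major.
struct HostGraph {
  std::uint64_t n_rows;
  std::uint64_t degree;
  std::vector<std::uint32_t> neighbors;  // size must equal n_rows * degree
};

// Write only the graph (extents, then raw neighbor indices); no dataset is stored.
void save_graph(const std::string& path, const HostGraph& g)
{
  assert(g.neighbors.size() == g.n_rows * g.degree);
  std::ofstream out(path, std::ios::binary);
  if (!out) { throw std::runtime_error("cannot open for writing: " + path); }
  out.write(reinterpret_cast<const char*>(&g.n_rows), sizeof(g.n_rows));
  out.write(reinterpret_cast<const char*>(&g.degree), sizeof(g.degree));
  out.write(reinterpret_cast<const char*>(g.neighbors.data()),
            static_cast<std::streamsize>(g.neighbors.size() * sizeof(std::uint32_t)));
  if (!out) { throw std::runtime_error("write failed: " + path); }
}

// Read the extents first, then size the buffer and read the indices.
HostGraph load_graph(const std::string& path)
{
  std::ifstream in(path, std::ios::binary);
  if (!in) { throw std::runtime_error("cannot open for reading: " + path); }
  HostGraph g;
  in.read(reinterpret_cast<char*>(&g.n_rows), sizeof(g.n_rows));
  in.read(reinterpret_cast<char*>(&g.degree), sizeof(g.degree));
  if (!in) { throw std::runtime_error("truncated header: " + path); }
  g.neighbors.resize(g.n_rows * g.degree);
  in.read(reinterpret_cast<char*>(g.neighbors.data()),
          static_cast<std::streamsize>(g.neighbors.size() * sizeof(std::uint32_t)));
  if (!in) { throw std::runtime_error("truncated graph data: " + path); }
  return g;
}
```

With a loader of this shape, the index could then be rebuilt from the two arrays (the dataset read from its own file, the graph read from this one), which is the construction the comment above points to.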
As an alternative to #1743, this uses the `include_dataset=False` param in cagra::serialize to avoid writing the dataset to disk with the index. This lets us avoid writing a second copy of the dataset, since it is available in a separate file already.
Authors:
- Ben Frederickson (https://github.com/benfred)
Approvers:
- Corey J. Nolet (https://github.com/cjnolet)
URL: #1781
Alternative solution provided in #1781. Closing this.
This PR changes how the CAGRA ANN benchmark saves the index. We save only the graph, because the dataset is usually available in a separate file.