
CAGRA pad dataset for 128bit vectorized load #1505

Merged: 5 commits merged into rapidsai:branch-23.08 on Jun 9, 2023

Conversation

@tfeher (Contributor) commented May 10, 2023

This PR adds padding to the dataset (if necessary) so that any of its rows can be read with 128-bit vectorized loads. This change also enables handling an arbitrary number of input features (before this PR, each row had to be at least 64-bit aligned, which constrained the acceptable number of input features).

Fixes #1458.

With this change, it is sufficient to keep a single "load type" specialization for the search kernels, which should roughly halve the binary size (#1459).
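For intuition, here is a minimal sketch of the alignment arithmetic involved (padded_ld is a hypothetical helper name, not the PR's actual code): the row width is rounded up so that each row occupies a whole number of 128-bit (16-byte) chunks.

```cpp
#include <cstddef>

// Round the column count up so that each row occupies a whole number of
// 16-byte (128-bit) chunks; padded_ld is a hypothetical helper name.
template <typename T>
constexpr std::size_t padded_ld(std::size_t dim)
{
  constexpr std::size_t align_bytes   = 16;  // 128-bit vector width
  constexpr std::size_t elems_per_vec = align_bytes / sizeof(T);
  return ((dim + elems_per_vec - 1) / elems_per_vec) * elems_per_vec;
}

// Example: float (4 bytes), dim = 50 -> padded_ld<float>(50) == 52, i.e.
// 208 bytes per row, so every row starts on a 128-bit boundary (given an
// aligned base pointer) and can be read with float4/uint4 loads.
```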

@tfeher (Contributor, Author) commented May 10, 2023

Todo:

  • store the padding information (ld and dim parameters) as a strided mdspan (see the sketch after this list)
  • use the padding information in the search kernels
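As a rough illustration of the first item, a strided mdspan can carry both the logical width (dim) and the padded leading dimension (ld). This sketch uses the reference mdspan implementation; RAFT's device matrix views play the same role, but the exact calls here are assumptions, not the PR's code.

```cpp
#include <array>
#include <cstdint>
#include <experimental/mdspan>

namespace stdex = std::experimental;
using ext_t = stdex::dextents<std::int64_t, 2>;

// Build a strided 2-D view over a padded buffer: the extents carry
// (n_rows, dim) while the row stride carries the padded ld >= dim.
stdex::mdspan<const float, ext_t, stdex::layout_stride>
make_padded_view(const float* padded_ptr, std::int64_t n_rows,
                 std::int64_t dim, std::int64_t ld)
{
  stdex::layout_stride::mapping<ext_t> map{
    ext_t{n_rows, dim}, std::array<std::int64_t, 2>{ld, 1}};
  return {padded_ptr, map};
}
// view(i, j) reads padded_ptr[i * ld + j]; a search kernel can recover
// ld via view.stride(0) and the logical width via view.extent(1).
```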

@tfeher tfeher added enhancement New feature or request breaking Breaking change labels May 10, 2023
@tfeher (Contributor, Author) commented May 10, 2023

Marked as a breaking change because this PR:

  • removes the load_bit_length search parameter.
  • changes the index.dataset() layout to layout_stride.

@tfeher tfeher force-pushed the cagra_pad_dataset branch from 7e06a0c to fb17578 on May 15, 2023 07:57
@github-actions github-actions bot added the cpp label May 15, 2023
@tfeher tfeher marked this pull request as ready for review May 15, 2023 08:10
@tfeher tfeher requested a review from a team as a code owner May 15, 2023 08:10
@tfeher tfeher added improvement Improvement / enhancement to an existing function and removed enhancement New feature or request labels May 15, 2023
@tfeher (Contributor, Author) commented May 15, 2023

There is still a bug in the new tests; otherwise it is ready for review.

@tfeher tfeher requested a review from enp1s0 May 15, 2023 22:12
@enp1s0 (Member) commented May 16, 2023

The fix for the compilation error is to add the missing dataset_ld argument here:

https://github.com/tfeher/raft/blob/fb175783a6cb6b9d0e56f0c18bd84c269fce9bb1/cpp/include/raft/neighbors/detail/cagra/search_multi_cta.cuh#L231

231:      dataset_size,
+         dataset_ld,
232:      result_buffer_size,

@enp1s0 (Member) left a comment:

The changes look good to me.

One comment:
It would be better to support a padded dataset as the argument of cagra::sort_knn_graph. What do you think?

@tfeher tfeher force-pushed the cagra_pad_dataset branch from fb17578 to 21e2d08 on June 6, 2023 20:36
@tfeher tfeher requested review from a team as code owners June 6, 2023 20:36
@tfeher tfeher changed the base branch from branch-23.06 to branch-23.08 June 6, 2023 20:36
@tfeher tfeher force-pushed the cagra_pad_dataset branch from da1fb53 to 08f4dcb on June 6, 2023 20:59
@tfeher (Contributor, Author) commented Jun 6, 2023

I have fixed a bug in the serialization routine, and added more tests. @enp1s0 could you have a quick look again at the changes?

> It would be better to support a padded dataset as the argument of cagra::sort_knn_graph. What do you think?

I think it is a good suggestion, and it might also be useful to allow the index constructor to accept a padded dataset (i.e., a strided mdspan). I would prefer to make these changes in a follow-up PR.

@enp1s0 (Member) left a comment:

Thank you, @tfeher, for fixing it and adding tests. The code looks good to me.

@ajschmidt8 ajschmidt8 removed the request for review from a team June 8, 2023 16:29
@cjnolet (Member) left a comment:

LGTM. Just a couple questions and a suggestion for a future improvement overall. Since CAGRA is still experimental, we have some leeway for API changes.

cudaMemcpyDefault,
resource::get_cuda_stream(res)));
resource::sync_stream(res);
serialize_mdspan(res, os, host_dataset.view());

Mostly a side question, but are we still planning to remove the dataset from the serialization in a future change?

resource::get_cuda_stream(res));
} else {
// copy with padding
RAFT_CUDA_TRY(cudaMemsetAsync(

If this is going to be a more common practice, I wonder if we should consider centralizing this somewhere eventually. Probably doesn't need to be done yet, or even in this PR, though.
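For reference, a centralized helper of the kind suggested could look roughly like this; the name copy_with_padding and its signature are hypothetical, not RAFT's actual API:

```cpp
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical centralized "copy with padding" helper: zero-fill the
// padded destination, then copy each dim-wide source row into an
// ld-wide destination slot with a single 2-D copy.
template <typename T>
cudaError_t copy_with_padding(T* dst, const T* src, std::size_t n_rows,
                              std::size_t dim, std::size_t ld,
                              cudaStream_t stream)
{
  // Zeroing first makes the padding bytes at the end of each row defined.
  cudaError_t err = cudaMemsetAsync(dst, 0, n_rows * ld * sizeof(T), stream);
  if (err != cudaSuccess) { return err; }
  // dpitch/spitch are byte pitches; width is the payload bytes per row.
  return cudaMemcpy2DAsync(dst, ld * sizeof(T), src, dim * sizeof(T),
                           dim * sizeof(T), n_rows, cudaMemcpyDefault,
                           stream);
}
```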

@@ -569,7 +546,7 @@ struct search : public search_plan_impl<DATA_T, INDEX_T, DISTANCE_T> {
   ~search() {}

   void operator()(raft::resources const& res,
-                  raft::device_matrix_view<const DATA_T, INDEX_T, row_major> dataset,
+                  raft::device_matrix_view<const DATA_T, INDEX_T, layout_stride> dataset,

Would there be any benefit to using a padded layout here or having an overload for it in the public API just to simplify the conversion?
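For context, a contiguous row-major view is just the special case where the stride equals the width, so such an overload would mostly be a conversion like this sketch (reference mdspan API again; RAFT's view types are assumed to behave analogously):

```cpp
#include <array>
#include <experimental/mdspan>

namespace stdex = std::experimental;

// A row-major (layout_right) matrix is a layout_stride view whose row
// stride equals its width, so a row_major overload can simply forward.
template <typename T, typename IdxT>
auto as_strided(stdex::mdspan<const T, stdex::dextents<IdxT, 2>,
                              stdex::layout_right> in)
{
  using ext_t = stdex::dextents<IdxT, 2>;
  stdex::layout_stride::mapping<ext_t> map{
    in.extents(), std::array<IdxT, 2>{in.extent(1), IdxT{1}}};
  return stdex::mdspan<const T, ext_t, stdex::layout_stride>{
    in.data_handle(), map};
}
// as_strided(view)(i, j) reads the same element as view(i, j); the row
// stride just becomes an explicit runtime quantity.
```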

@cjnolet (Member) commented Jun 9, 2023

/merge

@rapids-bot (bot) merged commit 6ec78e9 into rapidsai:branch-23.08 on Jun 9, 2023
Labels: breaking (Breaking change), cpp, improvement (Improvement / enhancement to an existing function)

Successfully merging this pull request may close these issues:

  • CAGRA support arbitrary dim (number of features) (#1458)

3 participants