Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CAGRA-Q build (compression) #2213

Merged
merged 30 commits into from
Mar 18, 2024

Conversation

achirkin
Copy link
Contributor

@achirkin achirkin commented Mar 5, 2024

Add a cagra::compress function that implements CAGRA-Q (VQ + PQ) compression of a given dataset.
The result, compressed_dataset, is supposed to complement the CAGRA graph during cagra::search in place of a raw dataset.

Current state:

  • The code runs and produces a meaningful output (tested internally by running the original prototype search with the generated compressed dataset); the recall levels are approximately the same as with the prototype implementation.
  • No test coverage yet (need to coordinate with the search PR CAGRA-Q search #2206)
  • Full pq_bits support ([4,5,6,7,8] - same as in IVF-PQ)
  • Any pq_dim values are accepted, but the dataset is not padded and thus dim must be a multiple of pq_dim.
  • The codebook math type is hardcoded to half to match the prototype implementation for now. This could be a runtime (build) parameter as well.
  • All common input data types should work (uint8_t, int8_t, half, and float compile), but I tested only float.

@achirkin achirkin added feature request New feature or request non-breaking Non-breaking change labels Mar 5, 2024
@achirkin achirkin self-assigned this Mar 5, 2024
@github-actions github-actions bot added the cpp label Mar 5, 2024
@achirkin achirkin marked this pull request as ready for review March 6, 2024 07:53
@achirkin achirkin requested a review from a team as a code owner March 6, 2024 07:53
Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @achirkin for this PR! Here is a first batch of my comments.

cpp/include/raft/neighbors/dataset.hpp Outdated Show resolved Hide resolved
cpp/include/raft/neighbors/dataset.hpp Show resolved Hide resolved
cpp/include/raft/neighbors/dataset.hpp Show resolved Hide resolved
cpp/include/raft/neighbors/dataset.hpp Show resolved Hide resolved
cpp/include/raft/neighbors/detail/cagra/cagra_build.cuh Outdated Show resolved Hide resolved
@achirkin achirkin added breaking Breaking change and removed non-breaking Non-breaking change labels Mar 8, 2024
Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks Artem for updating the PR, and also adding serialization methods. Overall it looks good, but I still have an issue with the index types.

cpp/include/raft/neighbors/dataset.hpp Show resolved Hide resolved
cpp/include/raft/neighbors/dataset.hpp Show resolved Hide resolved
@achirkin achirkin requested a review from tfeher March 11, 2024 19:17
achirkin added a commit to enp1s0/raft that referenced this pull request Mar 13, 2024
Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Artem for integrating data compression into RAFT CAGRA! The PR looks good to me, just have two minor questions below.

Ideally we should go ahead and merge this, so that the follow up PR (#2206) is easier to review.

@achirkin achirkin requested a review from tfeher March 14, 2024 09:45
Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Artem, the PR looks good to me!

@achirkin achirkin added non-breaking Non-breaking change and removed breaking Breaking change labels Mar 14, 2024
@achirkin
Copy link
Contributor Author

Latest update:

  1. Renamed dataset_view() back to dataset() thus making the PR non-breaking.
  2. Made a few other smaller renamings as per our discussion
  3. Marked dataset() deprecated and added an EXPERIMENTAL note on the new index parameter.

@achirkin achirkin requested a review from cjnolet March 14, 2024 18:30
cpp/include/raft/neighbors/cagra_types.hpp Show resolved Hide resolved
cpp/include/raft/neighbors/cagra_types.hpp Show resolved Hide resolved
cpp/include/raft/neighbors/dataset.hpp Show resolved Hide resolved
cpp/include/raft/neighbors/dataset.hpp Show resolved Hide resolved
cpp/include/raft/neighbors/dataset.hpp Show resolved Hide resolved
cpp/include/raft/neighbors/dataset.hpp Show resolved Hide resolved
cpp/include/raft/neighbors/dataset.hpp Show resolved Hide resolved
cpp/include/raft/neighbors/dataset.hpp Show resolved Hide resolved
@achirkin
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 32f6f40 into rapidsai:branch-24.04 Mar 18, 2024
71 checks passed
@tfeher tfeher mentioned this pull request Mar 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

4 participants