Add CAGRA-Q build (compression) #2213

achirkin · 2024-03-05T21:47:59Z

Add a cagra::compress function that implements CAGRA-Q compression (vector quantization followed by product quantization, VQ + PQ) of a given dataset.
The result, compressed_dataset, is meant to be used together with the CAGRA graph during cagra::search, in place of the raw dataset.
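To make the VQ + PQ scheme concrete, here is a minimal, self-contained C++ sketch of how a single row could be encoded: find the nearest VQ center, take the residual, split it into pq_dim subspaces of length dim / pq_dim, and replace each subspace with the index of its nearest PQ codebook entry. The names (encode_row, vpq_code, nearest) are hypothetical and are not the RAFT API. For brevity the sketch assumes float math, a single PQ codebook shared by all subspaces, and one unpacked byte per PQ code, whereas the PR uses half for the codebook math and supports pq_bits below 8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Squared L2 distance between two length-n float arrays.
float sq_dist(const float* a, const float* b, std::size_t n) {
  float s = 0.f;
  for (std::size_t i = 0; i < n; ++i) {
    const float d = a[i] - b[i];
    s += d * d;
  }
  return s;
}

// Index of the nearest of `k` centers (row-major, each `n` wide) to `x`; ties pick the first.
std::uint32_t nearest(const float* x, const float* centers, std::size_t k, std::size_t n) {
  std::uint32_t best = 0;
  float best_d = std::numeric_limits<float>::max();
  for (std::size_t c = 0; c < k; ++c) {
    const float d = sq_dist(x, centers + c * n, n);
    if (d < best_d) {
      best_d = d;
      best = static_cast<std::uint32_t>(c);
    }
  }
  return best;
}

// Hypothetical encoded row: a VQ label plus one (unpacked) PQ code per subspace.
struct vpq_code {
  std::uint32_t vq_label;
  std::vector<std::uint8_t> pq;
};

// Hypothetical encoder for one row of length `dim`.
// vq_centers: n_vq x dim; pq_centers: pq_book_size x (dim / pq_dim), assumed shared by all subspaces.
vpq_code encode_row(const float* row, std::size_t dim,
                    const std::vector<float>& vq_centers, std::size_t n_vq,
                    const std::vector<float>& pq_centers, std::size_t pq_book_size,
                    std::size_t pq_dim) {
  assert(dim % pq_dim == 0);      // no padding: dim must be a multiple of pq_dim
  assert(pq_book_size <= 256);    // codes are stored in one byte each here
  const std::size_t pq_len = dim / pq_dim;

  vpq_code out;
  out.vq_label = nearest(row, vq_centers.data(), n_vq, dim);

  // Residual = row - assigned VQ center.
  std::vector<float> residual(dim);
  const float* center = vq_centers.data() + out.vq_label * dim;
  for (std::size_t i = 0; i < dim; ++i) residual[i] = row[i] - center[i];

  // Quantize each subspace of the residual independently.
  out.pq.resize(pq_dim);
  for (std::size_t s = 0; s < pq_dim; ++s) {
    out.pq[s] = static_cast<std::uint8_t>(
        nearest(residual.data() + s * pq_len, pq_centers.data(), pq_book_size, pq_len));
  }
  return out;
}
```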

Current state:

The code runs and produces meaningful output (tested internally by running the original prototype search with the generated compressed dataset); the recall levels are approximately the same as with the prototype implementation.
No test coverage yet (this needs to be coordinated with the search PR, CAGRA-Q search #2206).
Full pq_bits support ([4, 5, 6, 7, 8], the same as in IVF-PQ).
Any pq_dim value is accepted, but the dataset is not padded, so dim must be a multiple of pq_dim.
The codebook math type is hardcoded to half to match the prototype implementation for now. This could also become a runtime (build) parameter.
All common input data types should work (the code compiles for uint8_t, int8_t, half, and float), but I have tested only float.
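The pq_bits and pq_dim constraints listed above can be illustrated with a small packing sketch. In this sketch each subspace has exactly dim / pq_dim components (no padding), and the pq_dim codes of a row, each pq_bits wide, occupy ceil(pq_dim * pq_bits / 8) bytes. The LSB-first bit order and all function names here are assumptions chosen for illustration; they are not necessarily the layout or API used in RAFT.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical: subspace length; the dataset is not padded, so dim must divide evenly.
std::size_t pq_len(std::size_t dim, std::size_t pq_dim) {
  assert(pq_dim > 0 && dim % pq_dim == 0);
  return dim / pq_dim;
}

// Bytes needed to store pq_dim codes of pq_bits bits each, rounded up to whole bytes.
std::size_t packed_code_bytes(std::size_t pq_dim, unsigned pq_bits) {
  return (pq_dim * pq_bits + 7) / 8;
}

// Pack codes back to back, least significant bit first (illustrative layout only).
std::vector<std::uint8_t> pack_codes(const std::vector<std::uint8_t>& codes, unsigned pq_bits) {
  assert(pq_bits >= 4 && pq_bits <= 8);
  std::vector<std::uint8_t> out(packed_code_bytes(codes.size(), pq_bits), 0);
  std::size_t bit = 0;
  for (std::uint8_t c : codes) {
    assert(c < (1u << pq_bits));  // each code must fit into pq_bits bits
    for (unsigned b = 0; b < pq_bits; ++b, ++bit) {
      if ((c >> b) & 1u) out[bit / 8] |= static_cast<std::uint8_t>(1u << (bit % 8));
    }
  }
  return out;
}

// Inverse of pack_codes.
std::vector<std::uint8_t> unpack_codes(const std::vector<std::uint8_t>& packed,
                                       std::size_t pq_dim, unsigned pq_bits) {
  std::vector<std::uint8_t> codes(pq_dim, 0);
  std::size_t bit = 0;
  for (std::size_t i = 0; i < pq_dim; ++i) {
    for (unsigned b = 0; b < pq_bits; ++b, ++bit) {
      if ((packed[bit / 8] >> (bit % 8)) & 1u) codes[i] |= static_cast<std::uint8_t>(1u << b);
    }
  }
  return codes;
}
```

For example, three 5-bit codes need 15 bits and therefore 2 bytes, while with pq_bits = 8 every code takes exactly one byte.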

tfeher

Thanks @achirkin for this PR! Here is a first batch of my comments.

cpp/include/raft/neighbors/dataset.hpp

cpp/include/raft/neighbors/detail/cagra/cagra_build.cuh

tfeher

Many thanks, Artem, for updating the PR and for adding serialization methods. Overall it looks good, but I still have an issue with the index types.

cpp/include/raft/neighbors/dataset.hpp

tfeher

Thanks, Artem, for integrating data compression into RAFT CAGRA! The PR looks good to me; I just have two minor questions below.

Ideally we should go ahead and merge this, so that the follow-up PR (#2206) is easier to review.

cpp/include/raft/neighbors/cagra.cuh

cpp/include/raft/neighbors/detail/cagra/cagra_serialize.cuh

tfeher

Thanks Artem, the PR looks good to me!

cpp/include/raft/neighbors/dataset.hpp

achirkin · 2024-03-14T18:30:31Z

Latest update:

Renamed dataset_view() back to dataset(), thus making the PR non-breaking.
Made a few other, smaller renamings as per our discussion.
Marked dataset() as deprecated and added an EXPERIMENTAL note on the new index parameter.

cpp/include/raft/neighbors/cagra_types.hpp

cpp/include/raft/neighbors/dataset.hpp

achirkin · 2024-03-18T16:14:06Z

/merge

Add CAGRA-Q build (compression)

ac6b088

achirkin added the feature request and non-breaking labels Mar 5, 2024

achirkin self-assigned this Mar 5, 2024

Merge branch 'branch-24.04' into fea-cagra-q-build

fdbae63

github-actions bot added the cpp label Mar 5, 2024

achirkin and others added 2 commits March 6, 2024 07:39

Merge branch 'branch-24.04' into fea-cagra-q-build

aeb0daa

Formatting and style refactoring

8b9bee0

achirkin marked this pull request as ready for review March 6, 2024 07:53

achirkin requested a review from a team as a code owner March 6, 2024 07:53

achirkin added the 3 - Ready for Review label Mar 6, 2024

achirkin and others added 4 commits March 6, 2024 18:02

Merge branch 'branch-24.04' into fea-cagra-q-build

aa70b61

Integrate vpq_dataset into cagra

1a72020

Add dataset compression as an optional step during build

99fa02f

Merge branch 'branch-24.04' into fea-cagra-q-build

e1bd06b

tfeher requested changes Mar 7, 2024

achirkin and others added 4 commits March 8, 2024 06:34

Update cpp/include/raft/neighbors/dataset.hpp

833b50f

Co-authored-by: Tamas Bela Feher <[email protected]>

Add dataset serialization

53a5c14

Add comments regarding the internals of pq_bits/pq_width

02f2193

Fix incorrect stride assumption that prevented construct_strided_dataset from making a view

34a7642

achirkin added the breaking label and removed the non-breaking label Mar 8, 2024

tfeher requested changes Mar 11, 2024

cpp/include/raft/neighbors/dataset.hpp

cpp/include/raft/neighbors/dataset.hpp

achirkin and others added 6 commits March 11, 2024 14:05

Various small changes to the dataset type to improve safety and be more explicit about arguments

3088703

Merge branch 'branch-24.04' into fea-cagra-q-build

999d343

Add a stub for the search function

4498a22

Switch to half as the vpq codebook type

dd1cc99

Simplify unique_ptr arithmetics

292406c

Fix deserialization: set the padding bytes to zero in the strided dataset.

24ebae2
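The deserialization fix above is about zeroing the padding of a strided (row-padded) dataset. The following is a minimal sketch of such a layout, assuming for illustration a 16-byte row alignment; padded_stride, make_strided_copy, and kRowAlignBytes are hypothetical names, not RAFT code. The row stride is rounded up so each row spans a multiple of the alignment, and the padding elements are explicitly zeroed when the rows are copied, so the buffer never exposes uninitialized memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical row alignment, chosen for illustration only.
constexpr std::size_t kRowAlignBytes = 16;

// Elements per padded row, so that each row spans a multiple of `align_bytes` bytes.
// Assumes sizeof(T) divides align_bytes (true for 1-, 2- and 4-byte element types).
template <typename T>
constexpr std::size_t padded_stride(std::size_t dim, std::size_t align_bytes = kRowAlignBytes) {
  const std::size_t row_bytes = dim * sizeof(T);
  const std::size_t padded_bytes = (row_bytes + align_bytes - 1) / align_bytes * align_bytes;
  return padded_bytes / sizeof(T);
}

// Copies a dense n_rows x dim matrix into a buffer with `stride` elements per row.
template <typename T>
std::vector<T> make_strided_copy(const T* src, std::size_t n_rows, std::size_t dim, std::size_t stride) {
  assert(stride >= dim);
  std::vector<T> dst(n_rows * stride, T{});  // T{} zero-initializes everything, including padding
  for (std::size_t r = 0; r < n_rows; ++r) {
    std::memcpy(dst.data() + r * stride, src + r * dim, dim * sizeof(T));
  }
  return dst;
}
```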

achirkin added the Vector Search label Mar 11, 2024

achirkin requested a review from tfeher March 11, 2024 19:17

achirkin and others added 3 commits March 12, 2024 11:33

Further simplify deserialization code

cb11327

Merge branch 'branch-24.04' into fea-cagra-q-build

44aabc4

Remove the dynamic dispatch from public search function for it to be moved into detail namespace in rapidsai#2206

9a55874

achirkin added a commit to enp1s0/raft that referenced this pull request Mar 13, 2024

Use the dataset type from rapidsai#2213 for the runtime dispatch during search

125b1b0

achirkin and others added 2 commits March 13, 2024 11:28

Make the construct_strided_dataset only copy the data when it's not accessible by the current device and document the api

88566d6
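This commit, together with the earlier stride fix, describes when construct_strided_dataset can avoid a copy. The sketch below shows one plausible form of that decision under stated assumptions: if the input is already accessible by the current device and its row stride equals the required one, it is wrapped as a non-owning view; otherwise the rows are copied into an owning, zero-padded buffer. The types strided_input and strided_dataset, the function make_strided, and the device_accessible flag are hypothetical; in real CUDA code the accessibility flag would come from a pointer-attribute query, which is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical description of the caller's input matrix.
struct strided_input {
  const float* data;
  std::size_t n_rows;
  std::size_t dim;
  std::size_t stride;      // elements between consecutive rows, >= dim
  bool device_accessible;  // assumption: determined elsewhere (e.g. a pointer-attribute query)
};

// Hypothetical result: either a non-owning view of the input or an owning padded copy.
struct strided_dataset {
  std::vector<float> owned;         // filled only when owning
  const float* view_ptr = nullptr;  // used only for a view
  bool owning = false;
  std::size_t n_rows = 0, dim = 0, stride = 0;

  const float* data() const { return owning ? owned.data() : view_ptr; }
};

strided_dataset make_strided(const strided_input& in, std::size_t required_stride) {
  assert(in.stride >= in.dim && required_stride >= in.dim);
  strided_dataset out;
  out.n_rows = in.n_rows;
  out.dim = in.dim;
  out.stride = required_stride;
  if (in.device_accessible && in.stride == required_stride) {
    out.view_ptr = in.data;  // layout already matches and memory is usable: no copy
    return out;
  }
  out.owning = true;
  out.owned.assign(in.n_rows * required_stride, 0.0f);  // zero-filled, including padding
  for (std::size_t r = 0; r < in.n_rows; ++r) {
    std::memcpy(out.owned.data() + r * required_stride, in.data + r * in.stride,
                in.dim * sizeof(float));
  }
  return out;
}
```

Keeping the view path avoids duplicating a potentially large dataset in device memory when the caller's layout already fits.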

Merge branch 'branch-24.04' into fea-cagra-q-build

8a3ae0d

tfeher reviewed Mar 13, 2024

cpp/include/raft/neighbors/cagra.cuh

cpp/include/raft/neighbors/detail/cagra/cagra_serialize.cuh

achirkin and others added 2 commits March 14, 2024 10:43

Merge branch 'branch-24.04' into fea-cagra-q-build

d1e9e3d

Bump serialization version

890b29e

achirkin requested a review from tfeher March 14, 2024 09:45

tfeher approved these changes Mar 14, 2024

cjnolet reviewed Mar 14, 2024

cpp/include/raft/neighbors/dataset.hpp (outdated)

Address offline and online review comments

66ae8ae

achirkin added the non-breaking label and removed the breaking label Mar 14, 2024

achirkin requested a review from cjnolet March 14, 2024 18:30

Merge branch 'branch-24.04' into fea-cagra-q-build

82f638d

divyegala reviewed Mar 15, 2024

achirkin added 3 commits March 15, 2024 20:32

Merge branch 'branch-24.04' into fea-cagra-q-build

dc7d761

Merge branch 'branch-24.04' into fea-cagra-q-build

54b99b9

Merge branch 'branch-24.04' into fea-cagra-q-build

0f2f63b

rapids-bot bot merged commit 32f6f40 into rapidsai:branch-24.04 Mar 18, 2024
71 checks passed

tfeher mentioned this pull request Mar 21, 2024

[FEA] CAGRA-Q #1889

Closed
