ivf-pq performance tweaks #926

achirkin · 2022-10-19T16:28:16Z

A few optimizations to the ivfpq_compute_similarity_kernel:

- Overhauled the way shmem/L1 carveout is selected
- Introduced the block size selection logic based on the shmem/L1 split, occupancy, and the estimated cluster probes co-residency
- Ported a new warp-sort module (warp_sort_distributed)
- Transposed pq_centers to make loads coalesced
- Changed layout of pq_dataset to make loads coalesced and vectorized
- Optimized the loops to minimize ALU load
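To illustrate why transposing can make loads coalesced, here is a minimal host-side sketch, not the code from this PR. It assumes a row-major array of shape [n_outer, n_rows, n_cols] and swaps the two innermost dimensions; the function name and dimension names are hypothetical. If neighbouring threads each read a different row at the same column, the original layout puts their addresses n_cols elements apart, while the transposed layout makes them adjacent.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of raft): transpose the two innermost
// dimensions of a row-major [n_outer, n_rows, n_cols] array, producing a
// row-major [n_outer, n_cols, n_rows] array.
std::vector<float> transpose_inner(const std::vector<float>& in,
                                   std::size_t n_outer,
                                   std::size_t n_rows,
                                   std::size_t n_cols)
{
  assert(in.size() == n_outer * n_rows * n_cols);
  std::vector<float> out(in.size());
  for (std::size_t o = 0; o < n_outer; ++o) {
    for (std::size_t r = 0; r < n_rows; ++r) {
      for (std::size_t c = 0; c < n_cols; ++c) {
        // Element (o, r, c) of the input becomes element (o, c, r) of the output.
        out[(o * n_cols + c) * n_rows + r] = in[(o * n_rows + r) * n_cols + c];
      }
    }
  }
  return out;
}
```

After the transpose, element (o, c, r) sits at offset (o * n_cols + c) * n_rows + r, so consecutive values of r map to consecutive addresses.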

tfeher

Hi Artem, here is my second batch of comments, focusing on the search part. Overall it looks good, I really appreciate the developer notes you have added to explain the changes.

cpp/include/raft/spatial/knn/detail/ivf_pq_search.cuh

tfeher

This is my last batch of comments, focusing on the build part. Please improve the description of the code that maps the data into the new pq_dataset layout; otherwise it looks good.

cpp/include/raft/spatial/knn/detail/ivf_pq_build.cuh

tfeher

Thanks Artem for addressing the issues! The PR looks good to me!

achirkin · 2022-11-15T14:40:22Z

Note, I've updated the ivf_pq::build code to allow host-side input and training data. This is needed for training on datasets that do not fit into device memory.

tfeher

Had a look at the latest changes that add support for using dataset on the host. It would have been better to dedicate a separate PR for that, but otherwise it looks good to me.

cpp/include/raft/spatial/knn/detail/ivf_pq_build.cuh

achirkin · 2022-11-17T10:43:09Z

Note, there was an actual bug in ivf_pq_search.cuh even before this PR, which was only exposed now due to the changed pq_data layout, and only in @tfeher's python tests. In the two commits since the last review, I've added extra test cases and checks for this and similar bugs (incorrect indices returned in the search post-processing step).

cjnolet

LGTM

cjnolet · 2022-11-17T16:27:01Z

@gpucibot merge

achirkin added 8 commits October 19, 2022 17:34

Improvements for ivfpq_compute_similarity_kernel

eceba57

1. Change the layout of pq_centers to facilitate coalesced access during search
2. Optimize the arithmetic in ivfpq_compute_score and ivfpq_compute_similarity_kernel

Remove debugging printf

3eb37b9

Fix integer overflow for large n_queries * max_samples

094a8d5
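The overflow named in this commit title is a common pattern: a product of two 32-bit counts is computed in 32 bits and wraps before it is used as a buffer size. The sketch below is an illustration only; the function names and the choice of `uint32_t` inputs are assumptions, not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: widen both operands first, so the product is
// computed in 64 bits and cannot wrap for any pair of 32-bit inputs.
inline std::uint64_t buffer_elems(std::uint32_t n_queries, std::uint32_t max_samples)
{
  return static_cast<std::uint64_t>(n_queries) * static_cast<std::uint64_t>(max_samples);
}

// The problematic pattern: the multiplication happens in 32 bits and wraps
// modulo 2^32; widening the result afterwards does not recover the lost bits.
inline std::uint64_t buffer_elems_narrow(std::uint32_t n_queries, std::uint32_t max_samples)
{
  return n_queries * max_samples;
}

// Narrow back to 32 bits only after checking that the value fits.
inline std::uint32_t checked_narrow(std::uint64_t v)
{
  assert(v <= UINT32_MAX);
  return static_cast<std::uint32_t>(v);
}
```

With 100000 queries and 100000 samples per query the true product is 10^10, which does not fit in 32 bits, so the narrow version returns a much smaller, wrong value.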

Fix bad top-k kernel selection

bb68b03

Limit max batch size to avoid extra large temporary buffers

33bc185

Change the layout of pq_centers for better memory utilization and optimize ALU usage in the main kernel

9135a79

Fix invalid scores being added when number of samples % 32 != 0

9cc1a92

Fix incorrect copying of pq_centers in PER_CLUSTER case

507158e

github-actions bot added the cpp label Oct 19, 2022

achirkin added 2 - In Progress, improvement, non-breaking, cpp and removed cpp labels Oct 19, 2022

achirkin and others added 5 commits October 20, 2022 13:09

Filter topk results to minimize the number of warp-sorts

e75e77a

Implemented warp_sort_distributed

662124a

Add missing 'ballot' wrapper

17a75b2

Load pq_centers coalesced, but not vectorized

a30ad2b

Changed layout of the pq data

910ae33

achirkin added 3 - Ready for Review and removed 2 - In Progress labels Oct 27, 2022

achirkin marked this pull request as ready for review October 27, 2022 06:20

achirkin requested a review from a team as a code owner October 27, 2022 06:20

cjnolet assigned achirkin Nov 1, 2022

Overhauled the launch configuration

57ae845

tfeher added a commit to tfeher/raft that referenced this pull request Nov 7, 2022

Remove preferred_thread_block_size search param, since it will be deprecated in rapidsai#926

cd3ff15

achirkin added 4 commits November 9, 2022 12:20

Changed pq_dataset layout once again and promoted pq_bits to the template param

0e4e631

Manually unroll the compute-score loop with templates (+8%-+16% qps)

c0e1c80

Use OutT instead of float as the compute-score output type to save on extra conversions when possible

5e39679

set default shmem carveout to 1.0 to allow smaller block sizes

7814339

achirkin requested a review from tfeher November 10, 2022 17:08

tfeher requested changes Nov 11, 2022

Merge remote-tracking branch 'rapidsai/branch-22.12' into enh-ivf-pq-perf-tweaks

d8dcc43

tfeher requested changes Nov 11, 2022

Address more review comments

a78b52b

achirkin mentioned this pull request Nov 11, 2022

[FEA] A helper for transposing an mdarray #1010

Open

Address one more set of comments

f8a0568

tfeher added a commit to tfeher/raft that referenced this pull request Nov 11, 2022

Remove preferred_thread_block_size search param, since it will be deprecated in rapidsai#926

52feeac

achirkin added 3 commits November 14, 2022 10:41

Cosmetic improvements

598a84e

Simplify transpose_pq_centers code using mdspans

b690d18

Fix incorrect composing of 'code' from the bits on boundaries between source ints

1cb1aa9
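The boundary case this commit refers to can be shown with a small bit-packing sketch. This is not the PR's code: it assumes codes of 1 to 8 bits packed back to back in little-endian bit order into bytes, whereas the actual kernel may use wider source words, and both function names are hypothetical. The point is that a code starting near the end of one storage unit must take its high bits from the next one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: write the i-th code of width `bits` into a packed stream.
inline void set_code(std::vector<std::uint8_t>& packed, std::size_t i, unsigned bits, std::uint8_t code)
{
  assert(bits >= 1 && bits <= 8);
  const std::size_t bit_pos = i * bits;
  const std::size_t byte    = bit_pos / 8;
  const unsigned shift      = static_cast<unsigned>(bit_pos % 8);
  const unsigned mask       = (1u << bits) - 1u;
  const unsigned v          = code & mask;
  assert(byte < packed.size());
  // Low part goes into the current byte.
  packed[byte] = static_cast<std::uint8_t>((packed[byte] & ~(mask << shift)) | (v << shift));
  if (shift + bits > 8) {
    // The code straddles a byte boundary: the remaining high bits go into the next byte.
    assert(byte + 1 < packed.size());
    const unsigned lo_bits = 8 - shift;
    packed[byte + 1] =
      static_cast<std::uint8_t>((packed[byte + 1] & ~(mask >> lo_bits)) | (v >> lo_bits));
  }
}

// Hypothetical helper: read the i-th code of width `bits` from a packed stream.
inline std::uint8_t get_code(const std::vector<std::uint8_t>& packed, std::size_t i, unsigned bits)
{
  assert(bits >= 1 && bits <= 8);
  const std::size_t bit_pos = i * bits;
  const std::size_t byte    = bit_pos / 8;
  const unsigned shift      = static_cast<unsigned>(bit_pos % 8);
  assert(byte < packed.size());
  unsigned value = static_cast<unsigned>(packed[byte]) >> shift;
  if (shift + bits > 8) {
    // Combine with the low bits of the next byte when the code crosses the boundary.
    assert(byte + 1 < packed.size());
    value |= static_cast<unsigned>(packed[byte + 1]) << (8 - shift);
  }
  return static_cast<std::uint8_t>(value & ((1u << bits) - 1u));
}
```

With 5-bit codes, for example, the second code occupies bits 5 to 9 and so spans the first and second bytes; forgetting the second branch would silently drop its high bits.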

tfeher approved these changes Nov 14, 2022

achirkin added 2 commits November 15, 2022 15:30

Allow ivf_pq::build inputs be on the host (for large datasets)

44ae87a

Update the documentation: ivf_pq::build accepts both host and device pointers

ead6bff

tfeher approved these changes Nov 15, 2022

cjnolet reviewed Nov 15, 2022

cpp/include/raft/spatial/knn/detail/ivf_pq_build.cuh

achirkin added 2 commits November 16, 2022 07:29

Merge remote-tracking branch 'rapidsai/branch-22.12' into enh-ivf-pq-perf-tweaks

8665705

Fix an assert that is no longer valid

d64799f

This PR introduces alignment into the cluster sizes and thus the total index size exceeds `n_rows`.
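A small sketch of why aligned cluster sizes make the index larger than `n_rows`: if every cluster is padded up to a multiple of some alignment, the sum of padded sizes is at least the number of stored rows and usually more. The alignment value of 32 used in the usage note and the helper names are assumptions for illustration; the PR text does not state them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Round n up to the nearest multiple of align.
inline std::uint64_t round_up(std::uint64_t n, std::uint64_t align)
{
  assert(align > 0);
  return (n + align - 1) / align * align;
}

// Hypothetical helper: exclusive prefix sum over the padded cluster sizes.
// offsets[i] is where cluster i starts; offsets.back() is the total number
// of slots in the index, including padding.
inline std::vector<std::uint64_t> aligned_offsets(const std::vector<std::uint64_t>& sizes,
                                                  std::uint64_t align)
{
  std::vector<std::uint64_t> offsets(sizes.size() + 1, 0);
  for (std::size_t i = 0; i < sizes.size(); ++i) {
    offsets[i + 1] = offsets[i] + round_up(sizes[i], align);
  }
  return offsets;
}
```

For cluster sizes 5, 32 and 33 (70 rows in total) and an alignment of 32, the padded sizes are 32, 32 and 64, so the index holds 128 slots. An assert of the form "total size equals n_rows" no longer holds under such a layout.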

achirkin requested a review from a team as a code owner November 16, 2022 06:36

github-actions bot added the python label Nov 16, 2022

achirkin mentioned this pull request Nov 16, 2022

[ENH] IVF-* ANN post-integration TODOs #711

Open

11 tasks

achirkin added 2 commits November 17, 2022 09:59

Fix incorrect lookup of the DB record when the query result index is exactly the size of a cluster

b28364d

Add extra checks for index invariants

56d6d5b

achirkin force-pushed the enh-ivf-pq-perf-tweaks branch from e80a345 to 56d6d5b Compare November 17, 2022 10:37

cjnolet approved these changes Nov 17, 2022

rapids-bot bot merged commit e06b156 into rapidsai:branch-22.12 Nov 17, 2022

achirkin deleted the enh-ivf-pq-perf-tweaks branch November 17, 2022 19:51
