
Subsampling for IVF-PQ codebook generation #2052

Merged (25 commits) on Jan 25, 2024

Conversation

@abc99lr (Contributor) commented Dec 8, 2023

This PR addresses #1901 by subsampling the input dataset for PQ codebook training to reduce the runtime.

Currently, a similar strategy is applied to the per_cluster method, but not to the default per_subset method; this PR closes that gap. As in the subsampling mechanism of the per_cluster method, we pick at minimum 256*max(pq_book_size, pq_dim) input rows for training each codebook.

```cpp
size_t big_enough = 256ul * std::max<size_t>(index.pq_book_size(), index.pq_dim());
```
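As a sketch of this heuristic, here is a minimal Python version (illustrative only; names like `subsample_for_codebook` are mine, not the raft C++ implementation, and raft's actual row selection differs):

```python
import numpy as np

def subsample_for_codebook(dataset, pq_book_size, pq_dim, seed=0):
    """Keep at least 256 * max(pq_book_size, pq_dim) rows for codebook training."""
    big_enough = 256 * max(pq_book_size, pq_dim)
    n_rows = dataset.shape[0]
    if n_rows <= big_enough:
        return dataset  # already small enough; nothing to gain from subsampling
    rng = np.random.default_rng(seed)
    idx = rng.choice(n_rows, size=big_enough, replace=False)
    return dataset[idx]

# With pq_bits=8 the codebook has 2^8 = 256 entries, so for pq_dim=96
# the sample size is 256 * max(256, 96) = 65536 rows.
data = np.random.rand(1_000_000, 96).astype(np.float32)
sample = subsample_for_codebook(data, pq_book_size=256, pq_dim=96)
print(sample.shape)  # (65536, 96)
```

The point of the `256 * max(...)` floor is to keep enough training points per codebook entry for k-means to converge, regardless of how small the codebook is.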

The following performance numbers were generated using the Deep-100M dataset. After subsampling, search time and accuracy are not impacted (within ±5%), except for one case where I saw a 9% performance drop on search (using a 10K batch for search). More extensive benchmarking across datasets seems needed for justification.

| Dataset | n_iter | n_list | pq_bits | pq_dim | ratio | Original time (s) | Subsampling (s) | Speedup [subsampling] |
|---|---|---|---|---|---|---|---|---|
| Deep-100M | 25 | 50000 | 4 | 96 | 10 | 129 | 89.5 | 1.44 |
| Deep-100M | 25 | 50000 | 5 | 96 | 10 | 128 | 89.4 | 1.43 |
| Deep-100M | 25 | 50000 | 6 | 96 | 10 | 131 | 90 | 1.46 |
| Deep-100M | 25 | 50000 | 7 | 96 | 10 | 129 | 91.1 | 1.42 |
| Deep-100M | 25 | 50000 | 8 | 96 | 10 | 149 | 93.4 | 1.60 |
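The speedup column is simply the ratio of the two build times; a quick check of the arithmetic:

```python
# (original_time_s, subsampled_time_s) pairs from the benchmark runs above
pairs = [(129, 89.5), (128, 89.4), (131, 90), (129, 91.1), (149, 93.4)]
speedups = [round(orig / sub, 2) for orig, sub in pairs]
print(speedups)  # [1.44, 1.43, 1.46, 1.42, 1.6]
```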

Note that after subsampling, PQ codebook generation is no longer a bottleneck in IVF-PQ index building, so further optimization there seems unnecessary. Although we could in theory combine the custom kernel approach (#2050) with subsampling, my early tests show the current GEMM approach performs better than the custom kernel once subsampling is applied.

Using multiple streams could improve performance further by overlapping the kernels for different pq_dim values, since the kernels are small after subsampling and may not fully utilize the GPU. However, as mentioned above, since the entire PQ codebook generation is now fast, this optimization may not be worthwhile.

TODO

  • Benchmark the performance/accuracy impacts on multiple datasets

@abc99lr abc99lr requested a review from a team as a code owner December 8, 2023 22:45
copy-pr-bot bot commented Dec 8, 2023

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


@github-actions github-actions bot added the cpp label Dec 8, 2023
@tfeher tfeher added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Dec 13, 2023
@tfeher (Contributor) commented Dec 13, 2023

/ok to test

@tfeher (Contributor) commented Dec 13, 2023

Thanks Rui for the PR! @achirkin could you have a look at the proposed subsampling step?

@github-actions github-actions bot added the python label Jan 6, 2024
@abc99lr (Contributor, Author) commented Jan 9, 2024

Tested the performance of this PR on top of the first-level subsampling PR (#2077) on the Deep-100M dataset with different build parameters. All tests were done on an A100-80GB-PCIe GPU.

pq_codebook_ratio controls the amount of subsampling: the dataset fraction used for codebook training is 1/pq_codebook_ratio, analogous to the existing ratio variable that controls the amount of the first-level subsampling.
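How the two subsampling levels compose can be sketched in a few lines of Python (illustrative only; the names here are mine, and the actual parameter plumbing lives in the raft C++/Cython code):

```python
def effective_train_rows(n_dataset, ratio, pq_codebook_ratio):
    """Rows used for PQ codebook training after both subsampling levels.

    ratio:             first-level subsampling; trainset fraction is 1/ratio.
    pq_codebook_ratio: second-level (this PR); codebook fraction is 1/pq_codebook_ratio.
    """
    trainset = n_dataset // ratio          # rows kept for k-means training
    return trainset // pq_codebook_ratio   # rows kept for codebook training

# Deep-100M with the first-level ratio=10 used in these benchmarks:
print(effective_train_rows(100_000_000, 10, 1))   # 10000000 (no codebook subsampling)
print(effective_train_rows(100_000_000, 10, 5))   # 2000000  (20% of the trainset)
print(effective_train_rows(100_000_000, 10, 10))  # 1000000  (10% of the trainset)
```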

Here is the table of build performance. With codebook subsampling, we see roughly a 30%-50% speedup, depending on how much subsampling the user chooses. Here, the 30%-50% speedup is achieved using 10%-20% of the input (after the initial subsampling) for codebook training.

| iter | nlist | pq_bits | pq_codebook_ratio | pq_dim | ratio | GPU build time (s) |
|---|---|---|---|---|---|---|
| 25 | 50k | 5 | 1 | 96 | 10 | 130.548 |
| 25 | 50k | 5 | 5 | 96 | 10 | 99.0929 |
| 25 | 50k | 5 | 10 | 96 | 10 | 95.2036 |
| 25 | 50k | 8 | 1 | 96 | 10 | 155.4 |
| 25 | 50k | 8 | 5 | 96 | 10 | 107.101 |
| 25 | 50k | 8 | 10 | 96 | 10 | 101.726 |
| 25 | 50k | 5 | 1 | 64 | 10 | 132.206 |
| 25 | 50k | 5 | 5 | 64 | 10 | 99.123 |
| 25 | 50k | 5 | 10 | 64 | 10 | 95.1221 |
| 25 | 50k | 8 | 1 | 64 | 10 | 141.418 |
| 25 | 50k | 8 | 5 | 64 | 10 | 104.241 |
| 25 | 50k | 8 | 10 | 64 | 10 | 100.206 |

The search performance is shown in the tables below. The maximum recall difference compared to no codebook subsampling is about 0.38%, and it corresponds to a slight recall increase with PQ codebook subsampling, which suggests it is likely run-to-run variation. I am going to rerun the tests to eliminate the effect of run-to-run variation (and will update this PR afterwards). All the search results below are without refinement.
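For reference, the recall_diff columns in the tables below are relative differences against the pq_codebook_ratio=1 baseline. A quick illustrative check in Python (not code from the PR), using two entries from the first table:

```python
def recall_diff(recall, baseline):
    """Relative recall difference (in %) vs the no-codebook-subsampling baseline."""
    return 100.0 * (recall - baseline) / baseline

# n_probe=50, pq_dim=96, pq_bits=5: baseline recall 0.86985,
# pq_codebook_ratio=5 gives 0.87108, pq_codebook_ratio=10 gives 0.86936.
print(round(recall_diff(0.87108, 0.86985), 2))  # 0.14
print(round(recall_diff(0.86936, 0.86985), 2))  # -0.06
```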

n_list=50K, pq_dim=96, pq_bits=5, n_iter=25

(cb = pq_codebook_ratio; recall_diff is relative to the cb=1 baseline, which is 0.00% by definition.)

| n_probe | recall (cb=1) | recall (cb=5) | recall (cb=10) | recall_diff (cb=5) | recall_diff (cb=10) | QPS (cb=1) | QPS (cb=5) | QPS (cb=10) |
|---|---|---|---|---|---|---|---|---|
| 20 | 0.80905 | 0.80814 | 0.80889 | -0.11% | -0.02% | 368557 | 367210 | 367712 |
| 30 | 0.84183 | 0.84217 | 0.84103 | 0.04% | -0.10% | 271789 | 269725 | 270331 |
| 40 | 0.85914 | 0.8602 | 0.85879 | 0.12% | -0.04% | 221708 | 220423 | 220543 |
| 50 | 0.86985 | 0.87108 | 0.86936 | 0.14% | -0.06% | 188799 | 188308 | 188017 |
| 100 | 0.89127 | 0.89194 | 0.89099 | 0.08% | -0.03% | 110754 | 110733 | 110534 |
| 200 | 0.90081 | 0.90054 | 0.89973 | -0.03% | -0.12% | 63032.5 | 62878 | 62976.1 |
| 1000 | 0.90511 | 0.90504 | 0.90417 | -0.01% | -0.10% | 16745.9 | 16686.8 | 16644.8 |
| 2000 | 0.90528 | 0.9052 | 0.90434 | -0.01% | -0.10% | 9055.15 | 9013.77 | 9035.94 |
| 5000 | 0.90532 | 0.90525 | 0.90437 | -0.01% | -0.10% | 3967.44 | 3950.08 | 3959.11 |
| 10000 | 0.90532 | 0.90525 | 0.90438 | -0.01% | -0.10% | 2106.13 | 2100.26 | 2106.36 |

n_list=50K, pq_dim=64, pq_bits=5, n_iter=25

(cb = pq_codebook_ratio; recall_diff is relative to the cb=1 baseline, which is 0.00% by definition.)

| n_probe | recall (cb=1) | recall (cb=5) | recall (cb=10) | recall_diff (cb=5) | recall_diff (cb=10) | QPS (cb=1) | QPS (cb=5) | QPS (cb=10) |
|---|---|---|---|---|---|---|---|---|
| 20 | 0.67986 | 0.68094 | 0.67951 | 0.16% | -0.05% | 454869 | 453663 | 453699 |
| 30 | 0.70131 | 0.70281 | 0.70159 | 0.21% | 0.04% | 343085 | 342176 | 341031 |
| 40 | 0.71249 | 0.71401 | 0.71279 | 0.21% | 0.04% | 283827 | 282973 | 283161 |
| 50 | 0.71891 | 0.72164 | 0.71963 | 0.38% | 0.10% | 244791 | 244737 | 243534 |
| 100 | 0.73223 | 0.73455 | 0.73301 | 0.32% | 0.11% | 146721 | 146617 | 146339 |
| 200 | 0.73783 | 0.73965 | 0.73801 | 0.25% | 0.02% | 83077.8 | 83123.6 | 83018 |
| 1000 | 0.74076 | 0.74232 | 0.74088 | 0.21% | 0.02% | 21814.6 | 21781.5 | 21796.6 |
| 2000 | 0.74085 | 0.74246 | 0.74092 | 0.22% | 0.01% | 11644.9 | 11627.7 | 11612.7 |
| 5000 | 0.74087 | 0.74246 | 0.74093 | 0.21% | 0.01% | 5041.84 | 5044.43 | 5023.59 |
| 10000 | 0.74088 | 0.74246 | 0.74093 | 0.21% | 0.01% | 2664.48 | 2666.91 | 2656.13 |

n_list=50K, pq_dim=96, pq_bits=8, n_iter=25

(cb = pq_codebook_ratio; recall_diff is relative to the cb=1 baseline, which is 0.00% by definition.)

| n_probe | recall (cb=1) | recall (cb=5) | recall (cb=10) | recall_diff (cb=5) | recall_diff (cb=10) | QPS (cb=1) | QPS (cb=5) | QPS (cb=10) |
|---|---|---|---|---|---|---|---|---|
| 20 | 0.84522 | 0.84603 | 0.84618 | 0.10% | 0.11% | 149813 | 149708 | 149599 |
| 30 | 0.88647 | 0.8877 | 0.88793 | 0.14% | 0.16% | 105652 | 105561 | 105556 |
| 40 | 0.90938 | 0.9106 | 0.91088 | 0.13% | 0.16% | 82747.5 | 82683 | 82658.3 |
| 50 | 0.92431 | 0.9247 | 0.92551 | 0.04% | 0.13% | 68229.1 | 68197.3 | 68164.9 |
| 100 | 0.95335 | 0.95436 | 0.95347 | 0.11% | 0.01% | 36994.9 | 36963.2 | 36967.3 |
| 200 | 0.96703 | 0.96733 | 0.96637 | 0.03% | -0.07% | 19682.4 | 19673.2 | 19671.5 |
| 1000 | 0.97381 | 0.97389 | 0.97334 | 0.01% | -0.05% | 4520.13 | 4510.28 | 4514.8 |
| 2000 | 0.97397 | 0.97416 | 0.97366 | 0.02% | -0.03% | 2350.32 | 2352.48 | 2347.96 |
| 5000 | 0.97405 | 0.97423 | 0.97373 | 0.02% | -0.03% | 982.055 | 980.334 | 980.967 |
| 10000 | 0.97406 | 0.97423 | 0.97373 | 0.02% | -0.03% | 505.058 | 505.138 | 504.427 |

n_list=50K, pq_dim=64, pq_bits=8, n_iter=25

(cb = pq_codebook_ratio; recall_diff is relative to the cb=1 baseline, which is 0.00% by definition.)

| n_probe | recall (cb=1) | recall (cb=5) | recall (cb=10) | recall_diff (cb=5) | recall_diff (cb=10) | QPS (cb=1) | QPS (cb=5) | QPS (cb=10) |
|---|---|---|---|---|---|---|---|---|
| 20 | 0.79791 | 0.79931 | 0.80015 | 0.18% | 0.28% | 199479 | 199347 | 199316 |
| 30 | 0.83106 | 0.83291 | 0.83217 | 0.22% | 0.13% | 146485 | 146500 | 146392 |
| 40 | 0.84826 | 0.8495 | 0.84924 | 0.15% | 0.12% | 115829 | 115823 | 115772 |
| 50 | 0.85815 | 0.86038 | 0.85999 | 0.26% | 0.21% | 96181.5 | 96139.9 | 96116.3 |
| 100 | 0.87928 | 0.88015 | 0.88063 | 0.10% | 0.15% | 53281.1 | 53268.3 | 53269.6 |
| 200 | 0.88845 | 0.88935 | 0.88959 | 0.10% | 0.13% | 28562.7 | 28556.1 | 28563.8 |
| 1000 | 0.89299 | 0.89405 | 0.89416 | 0.12% | 0.13% | 6651.5 | 6650.96 | 6653.26 |
| 2000 | 0.89313 | 0.89415 | 0.89435 | 0.11% | 0.14% | 3453.13 | 3463.49 | 3455.71 |
| 5000 | 0.89319 | 0.89416 | 0.89438 | 0.11% | 0.13% | 1438.96 | 1436.86 | 1436.74 |
| 10000 | 0.89319 | 0.89416 | 0.89438 | 0.11% | 0.13% | 735.696 | 735.859 | 735.35 |

@tfeher (Contributor) commented Jan 12, 2024

Thanks @abc99lr for the measurements! The additional subsampling for PQ codebooks gives a nice improvement in IVF-PQ build time, and I am excited about this change!

In many cases we see less than 0.05% diff in recall, which looks perfect. But there are also cases with diffs larger than 0.1%; for those we would like to understand whether the difference is due to run-to-run variation. I am running additional tests with PR #2077 and we will compare the diffs against those.

@abc99lr (Contributor, Author) commented Jan 13, 2024

Updates on run-to-run variance: I reran the code (both build and search) three times and found that even without this PR, the run-to-run variance is up to 0.37%. Please see the following tables for the recall difference relative to the first run.

The tests below are with Deep-100M dataset, tested on A100-80GB-PCIe with n_list=50K and n_iter=25.

2nd run vs 1st run:

(Column groups are pq_dim/pq_bits; cb = pq_codebook_ratio.)

| n_probe | 96/5 cb=1 | 96/5 cb=5 | 96/5 cb=10 | 64/5 cb=1 | 64/5 cb=5 | 64/5 cb=10 | 96/8 cb=1 | 96/8 cb=5 | 96/8 cb=10 | 64/8 cb=1 | 64/8 cb=5 | 64/8 cb=10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | 0.11% | 0.13% | 0.05% | -0.05% | -0.31% | -0.24% | 0.17% | 0.11% | -0.04% | 0.02% | -0.06% | -0.02% |
| 30 | 0.02% | -0.03% | 0.22% | -0.16% | -0.30% | -0.39% | 0.10% | 0.06% | -0.01% | 0.05% | -0.17% | 0.05% |
| 40 | 0.02% | 0.03% | 0.15% | -0.20% | -0.37% | -0.41% | 0.11% | 0.10% | -0.04% | 0.01% | -0.05% | 0.07% |
| 50 | 0.02% | 0.10% | 0.18% | -0.23% | -0.47% | -0.47% | 0.03% | 0.05% | -0.05% | 0.00% | -0.17% | 0.09% |
| 100 | -0.12% | 0.16% | 0.11% | -0.29% | -0.45% | -0.49% | 0.08% | -0.03% | -0.01% | -0.08% | -0.05% | 0.03% |
| 200 | -0.13% | 0.23% | 0.14% | -0.30% | -0.33% | -0.42% | -0.02% | 0.03% | 0.07% | -0.06% | -0.09% | -0.06% |
| 1000 | -0.11% | 0.25% | 0.09% | -0.31% | -0.36% | -0.42% | -0.03% | 0.04% | 0.05% | -0.08% | -0.10% | -0.08% |
| 2000 | -0.11% | 0.25% | 0.09% | -0.30% | -0.36% | -0.41% | -0.03% | 0.04% | 0.05% | -0.08% | -0.09% | -0.08% |
| 5000 | -0.11% | 0.25% | 0.10% | -0.31% | -0.36% | -0.41% | -0.03% | 0.04% | 0.05% | -0.08% | -0.09% | -0.08% |
| 10000 | -0.11% | 0.25% | 0.09% | -0.31% | -0.36% | -0.41% | -0.03% | 0.04% | 0.04% | -0.08% | -0.09% | -0.08% |

3rd run vs 1st run:

(Column groups are pq_dim/pq_bits; cb = pq_codebook_ratio.)

| n_probe | 96/5 cb=1 | 96/5 cb=5 | 96/5 cb=10 | 64/5 cb=1 | 64/5 cb=5 | 64/5 cb=10 | 96/8 cb=1 | 96/8 cb=5 | 96/8 cb=10 | 64/8 cb=1 | 64/8 cb=5 | 64/8 cb=10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | -0.31% | -0.08% | -0.09% | -0.10% | -0.27% | 0.03% | 0.14% | 0.15% | 0.16% | 0.22% | -0.01% | -0.18% |
| 30 | -0.15% | -0.16% | 0.00% | -0.14% | -0.32% | -0.06% | 0.22% | 0.00% | 0.02% | 0.14% | -0.17% | -0.08% |
| 40 | -0.12% | -0.13% | 0.04% | -0.14% | -0.35% | -0.11% | 0.23% | -0.05% | -0.04% | 0.31% | 0.01% | 0.00% |
| 50 | -0.09% | -0.07% | 0.10% | -0.03% | -0.41% | -0.14% | 0.15% | 0.02% | -0.11% | 0.37% | -0.02% | -0.08% |
| 100 | -0.08% | -0.05% | 0.02% | -0.14% | -0.51% | -0.26% | 0.17% | -0.07% | 0.02% | 0.25% | 0.11% | -0.12% |
| 200 | -0.16% | 0.00% | 0.06% | -0.17% | -0.44% | -0.20% | 0.10% | -0.04% | 0.03% | 0.14% | 0.12% | -0.20% |
| 1000 | -0.10% | -0.01% | 0.08% | -0.17% | -0.45% | -0.22% | 0.08% | -0.01% | 0.04% | 0.14% | 0.10% | -0.22% |
| 2000 | -0.10% | -0.01% | 0.07% | -0.17% | -0.46% | -0.21% | 0.09% | -0.01% | 0.04% | 0.14% | 0.10% | -0.21% |
| 5000 | -0.10% | -0.01% | 0.07% | -0.17% | -0.45% | -0.21% | 0.09% | -0.01% | 0.04% | 0.14% | 0.10% | -0.20% |
| 10000 | -0.10% | -0.01% | 0.07% | -0.17% | -0.45% | -0.21% | 0.09% | -0.01% | 0.04% | 0.14% | 0.10% | -0.20% |

I think the 0.38% difference we saw with this PR is acceptable, provided we see similar run-to-run variance with #2077.

The results also show that the run-to-run variance is higher for pq_dim=64 with pq_bits=5 than for the other combinations.

@abc99lr abc99lr changed the title [WIP] Subsampling for per_subset method of IVF-PQ codebook generation [REVIEW] Subsampling for per_subset method of IVF-PQ codebook generation Jan 13, 2024
@tfeher (Contributor) left a comment

Thanks Rui for the update. The results look great. I have seen similar recall variations, and I think that looks good as well. Just a few small things.

Review threads (outdated, resolved):

  • cpp/bench/ann/src/raft/raft_ann_bench_param_parser.h
  • cpp/include/raft/neighbors/ivf_pq_types.hpp
  • python/pylibraft/pylibraft/neighbors/ivf_pq/ivf_pq.pyx
@abc99lr abc99lr requested review from achirkin and tfeher January 24, 2024 19:19
@tfeher (Contributor) left a comment

Thanks Rui for the update! The PR looks good to me!

@cjnolet (Member) commented Jan 24, 2024

/ok to test

@cjnolet (Member) commented Jan 25, 2024

/ok to test

@cjnolet (Member) commented Jan 25, 2024

/ok to test

@cjnolet (Member) commented Jan 25, 2024

/merge

@abc99lr (Contributor, Author) commented Jan 25, 2024

Hi @achirkin, I think the change you requested has been made. Could you please approve this PR?

@cjnolet cjnolet dismissed achirkin’s stale review January 25, 2024 04:02

Dismissing to get this in before code freeze. Rui has addressed the request.

@cjnolet (Member) commented Jan 25, 2024

/ok to test

@cjnolet cjnolet removed request for a team and achirkin January 25, 2024 04:03
@cjnolet (Member) commented Jan 25, 2024

/merge

@rapids-bot rapids-bot bot merged commit e272176 into rapidsai:branch-24.02 Jan 25, 2024
61 checks passed
@achirkin (Contributor) commented
Sorry for being late, but yes, LGTM! :)

cjnolet added a commit to cjnolet/raft that referenced this pull request Jan 31, 2024
rapids-bot bot pushed a commit to rapidsai/cuvs that referenced this pull request Aug 1, 2024
Random sampling of training set for IVF methods was reverted in rapidsai/raft#2144 due to the large memory usage of the subsample method.

Since then, PR rapidsai/raft#2155 has implemented a new random sampling method with improved memory utilization.  Using that we can now enable random sampling of IVF methods (rapidsai/raft#2052 and rapidsai/raft#2077).

Random subsampling has measurable overhead for IVF-Flat, therefore it is only enabled for IVF-PQ.

Authors:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #122
divyegala pushed a commit to divyegala/cuvs that referenced this pull request Aug 7, 2024
Labels: cpp, improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change), python

6 participants