
[DOC] Benchmark matrix across RAFT ANN algorithms #1727

Open

cjnolet opened this issue Aug 8, 2023 · 4 comments

@cjnolet
Member

cjnolet commented Aug 8, 2023

We've been asked by several folks for a matrix of benchmarks across different algorithms at different dataset scales. We should go ahead and create one and publish it in RAFT's documentation (preferably using RAFT's bench-ann Python scripts so the results are consistently reproducible).

I would propose the following matrix:

Batch Size: 1, 100, 10k
Scales: 100k, 10M, 100M
K: 1, 10, 1k
Hardware: T4, V100, A100, H100 (potentially L4, A10)
Algorithms: Brute-force (for smaller scale), IVF-PQ, IVF-Flat, CAGRA

Datasets:

  1. For the various data scales, I propose using MS-Turing-1B, which includes a 10M subsampled version in big-ann-benchmarks '23 (https://big-ann-benchmarks.com/neurips23.html); we can use a 100M subset of the full dataset in our bench-ann suite from big-ann-benchmarks '21 (https://big-ann-benchmarks.com/neurips21.html).
  2. We will also want to find a 100M-scale dataset with higher dimensionality (~512 would be best).
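The proposed matrix above can be enumerated programmatically when generating configs for the benchmark runs. A minimal sketch, assuming the values proposed in this issue; the dict keys and algorithm names here are illustrative, not the actual bench-ann config schema:

```python
from itertools import product

# Proposed benchmark matrix from this issue (values taken verbatim).
batch_sizes = [1, 100, 10_000]
scales = ["100k", "10M", "100M"]
ks = [1, 10, 1_000]
algorithms = ["brute_force", "ivf_pq", "ivf_flat", "cagra"]

configs = []
for algo, scale, batch, k in product(algorithms, scales, batch_sizes, ks):
    # Per the proposal, brute-force is only run at the smaller scale.
    if algo == "brute_force" and scale != "100k":
        continue
    configs.append({"algo": algo, "scale": scale, "batch_size": batch, "k": k})

print(len(configs))  # → 90 configurations per hardware platform
```

Multiplying by the proposed hardware list (T4, V100, A100, H100, and potentially L4/A10) gives the full run count, which is useful for estimating total benchmark time.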
@achirkin
Contributor

achirkin commented Aug 9, 2023

I would also propose adding a couple more variables to the mix to make it easier to spot problems like #1726:

  • top-k value k: 1, 10, 20, 50, 100, 200, 500, 1000
  • If we end up using a few different datasets, make sure we have at least one of each data type we support (int8/uint8/float for now)

@achirkin
Contributor

achirkin commented Aug 9, 2023

Also, I think this is a good case to advertise my gbench PR #1661, which should make writing the configs much easier and speed up the benchmarks by orders of magnitude.

@MarkMoTrin

It would be good to add the L40S as well.

@cjnolet
Member Author

cjnolet commented Aug 9, 2023

@achirkin, if we can prioritize getting the gbench changes to produce output like the existing benchmarks do (dumping results to files), then I'm definitely on board w/ getting those merged in. I'd like to avoid breaking the Python scripts in the meantime, now that we're pointing users to them.
