
ANN-benchmarks: switch to use gbench #1661

Merged: 20 commits into rapidsai:branch-23.10 on Aug 30, 2023

Conversation

@achirkin (Contributor) commented Jul 21, 2023

Make the ANN benchmarks use the same Google Benchmark (gbench) infrastructure as the prim benchmarks, while keeping the functional changes minimal.

Overview

  • The command-line API largely stays the same, but is enhanced with gbench-specific parameters, such as regex-based selection of algo configs, control over the minimum run time, and flexible reporting to console/files.
  • There is just one executable, ANN_BENCH; all of the algorithms are loaded as shared libraries. The CPU-only components (ANN_BENCH itself, hnswlib) do not require CUDA at runtime.
  • Some dependencies are linked statically, so it is possible to copy just the executable and the libs and run the benchmark on a Linux machine with very few packages installed.
  • Search benchmarks no longer produce any output files; they use ground-truth files to compute and report the recall in place.
  • Search/build parameters visible in the config files are passed as benchmark counters/labels/context.
  • Extra functionality:
    • --data_prefix to specify a custom path where the data sets are stored
    • --index_prefix to specify a custom path where the indices are stored
    • --override_kv=<key:value1:value2:...:valueN> to override one or more search/build parameters for parameter-sweep benchmarks

Breaking change: the behavior of the ANN benchmark executables changes (the library API is not touched). The executable CLI flags have changed, so the newer, adapted wrapper scripts won't work with the executables from the libraft-ann-bench-23.08 conda package.

A primer

# ANN_BENCH                     benchmark executable
# --data_prefix                 override (prefix) path to local data
# --benchmark_min_warmup_time   spend some minimal time warming up
# --benchmark_min_time          run a minimum of 3 seconds on each case
# --benchmark_out               duplicate the output to this file
# --benchmark_out_format        the file output should be in CSV format
# --benchmark_counters_tabular  make the console output tabular
# --benchmark_filter            use a regex to filter benchmarks
# --search                      'search' mode
# --override_kv (k, n_queries)  parameter-sweep over the top-k value and the search batch size
# --override_kv (smemLutDtype)  override a search parameter
# <config file>                 path to the config file
./cpp/build/ANN_BENCH \
  --data_prefix=/datastore/my/local/data/path \
  --benchmark_min_warmup_time=0.001 \
  --benchmark_min_time=3s \
  --benchmark_out=ivf_pq.csv \
  --benchmark_out_format=csv \
  --benchmark_counters_tabular \
  --benchmark_filter="raft_ivf_pq\..*" \
  --search \
  --override_kv=k:1:10:100:200:500 \
  --override_kv=n_queries:1:10:10000 \
  --override_kv=smemLutDtype:"fp8" \
  cpp/bench/ann/conf/bigann-100M.json

Motivation

Eliminate huge bug-prone configs

The current config fixes the batch size and k to one value per config, so the whole config needs to be copied to try multiple values. With this PR, both of these parameters can be overridden in the search parameters and/or via the command line (ANN_BENCH --override_kv=n_queries:1:100:1000 --override_kv=k:1:10:20:50:100:200:500:1000 would test all combinations in one go). Any of the build/search parameters can be overridden at the same time, as sketched below.
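For illustration, a sketch of such a combined sweep; the "raft_ivf_pq" filter and the nprobe parameter name are only examples and depend on which algorithms and parameters are defined in the config:

./cpp/build/ANN_BENCH --search \
  --benchmark_filter="raft_ivf_pq\..*" \
  --override_kv=n_queries:1:100:1000 \
  --override_kv=k:1:10:20:50:100:200:500:1000 \
  --override_kv=nprobe:10:50:100 \
  cpp/bench/ann/conf/bigann-100M.json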

Run the benchmarks and aggregate the data in a minimal environment

The new executable generates reports with QPS, recall, and other metrics via gbench. Hence there is no need to copy dozens of result files back and forth, and no need to install a Python environment for running or evaluating the benchmarks. A single CSV or JSON file can be produced for all algorithms and run configurations per dataset+hardware pair.
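For example, a minimal sketch of producing one JSON report for every algorithm defined in a config (the file names here are illustrative):

./cpp/build/ANN_BENCH --search \
  --benchmark_out=bigann-100M.search.json \
  --benchmark_out_format=json \
  cpp/bench/ann/conf/bigann-100M.json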

Speed up the benchmarks

The current benchmark framework is extremely slow due to two factors:

  • The dataset and the index need to be loaded for every test case; for large datasets this takes orders of magnitude longer than the search test itself. In my tests, the preparation phase for bigann-1B took ten minutes, while the search could take anywhere between a few seconds and a minute.
  • The benchmark always goes through the whole query dataset. That is, if the query set is 10K and the batch size is 1, the benchmark repeats 10K times (to produce the result file for evaluating the recall).

In the proposed solution, a user can set the desired run time or number of iterations; the data is loaded only once and the index is cached between the search test cases. My subjective, conservative estimate is an overall speedup of more than 100x for running a typical large-scale benchmark.

Better measurement of QPS

By default, the current benchmark reports the average execution time and does not run warm-up iterations. As a result, the first test case on most of our plots is distorted (e.g. the first iteration of the first case takes about a second or two to run, which significantly skews the average of the remaining 999 ~100us iterations). gbench provides the --benchmark_min_warmup_time parameter to skip the first one or few iterations, which solves the problem.

Extra context in the reports

The new benchmark executable uses the gbench context to augment the report with some essential information: the base and query set names, dimensionality and size, the distance metric, some CPU and GPU info, and the CUDA version. All of this is appended directly to the generated CSV/JSON files, which makes bookkeeping much easier.
In addition, a user may pass extra context via the command line with --benchmark_context=<key>=<value>; this could be e.g. the hostname, some environment variables, etc., as in the sketch below.
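For example (the key names here are arbitrary, chosen only for illustration):

./cpp/build/ANN_BENCH --search \
  --benchmark_context=hostname=$(hostname) \
  --benchmark_context=gpu_visible_devices="$CUDA_VISIBLE_DEVICES" \
  cpp/bench/ann/conf/bigann-100M.json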

Easier profiling

Thanks to flexible regex filtering and parameter overriding, it is now possible to specify a subset of cases and the exact number of times they should run. This makes profiling with tools such as nsys and ncu much easier.
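A sketch of narrowing a run down for profiling; it assumes nsys is installed and that the bundled gbench version supports the <N>x (iteration count) syntax for --benchmark_min_time, in addition to the <N>s syntax shown above:

nsys profile -o ivf_pq_search ./cpp/build/ANN_BENCH --search \
  --benchmark_filter="raft_ivf_pq\..*" \
  --benchmark_min_time=1x \
  --override_kv=k:10 \
  --override_kv=n_queries:10000 \
  cpp/bench/ann/conf/bigann-100M.json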

@achirkin added the '3 - Ready for Review', 'improvement' (Improvement / enhancement to an existing function), and 'non-breaking' (Non-breaking change) labels on Jul 21, 2023
@achirkin requested review from a team as code owners on July 21, 2023 12:40
@achirkin requested reviews from cjnolet and tfeher on July 21, 2023 12:41
@achirkin (Contributor, Author) commented Jul 21, 2023

CC @divyegala as I've heard you were planning to do some work on python-side benchmark scripts?

@divyegala (Member) commented:

@achirkin is recall/qps still reported as a CSV file, or just output to the terminal?

@achirkin (Contributor, Author) commented Jul 21, 2023

It all goes via gbench; e.g. I often use the following command to print the results both to the terminal (tabulated) and to a CSV file:

./cpp/build/RAFT_IVF_PQ_ANN_BENCH \
  --benchmark_out=ivf_pq.csv --benchmark_out_format=csv \
  --benchmark_counters_tabular --search --benchmark_filter="raft_ivf_pq\..*" \
  cpp/bench/ann/conf/bigann-100M.json

To calculate recalls in this branch, you need to add the path to the ground truth file in the config, alongside "base_file" and "query_file":

  "dataset": {
    "name": "sift-128-euclidean",
    "base_file": "data/sift-128-euclidean/base.fbin",
    "query_file": "data/sift-128-euclidean/query.fbin",
    "groundtruth_neighbors_file": "data/sift-128-euclidean/groundtruth.neighbors.ibin"
    "distance": "euclidean"
  },

Without this, the benchmark gives a warning that the recall is not available.

@divyegala (Member) commented:

@achirkin I would mark this as a breaking change. It changes default behavior, and changes how the executables work.

  1. Search not producing output files - previously, Search produced output files and we would use scripts/eval.pl to produce a CSV output
  2. The groundtruth file was an input to scripts/eval.pl - now it's part of the configuration file

Could you please also update the markdown file with these new instructions?

@achirkin added the 'breaking' (Breaking change) label and removed the 'non-breaking' (Non-breaking change) label on Jul 22, 2023
@achirkin (Contributor, Author) commented:

Sure, though I thought the breaking/non-breaking referred to the library components that could be used downstream.

@achirkin changed the base branch from branch-23.08 to branch-23.10 on July 31, 2023 07:46
@achirkin force-pushed the enh-google-benchmarks branch from 9ba9854 to 8cfd2ae on August 9, 2023 11:18
@achirkin force-pushed the enh-google-benchmarks branch from 8cfd2ae to bd738ec on August 9, 2023 11:19
@cjnolet (Member) commented Aug 10, 2023

@achirkin the good news is that it looks like the gbench executables allow you to specify the output file and output format as JSON or CSV, so I think you should be able to get the Python scripts loading those files pretty easily.

@achirkin (Contributor, Author) commented:

Indeed they do! In general, the gbench CLI is rather flexible and, in conjunction with our extra CLI parameters, should cover all our needs, both standalone and as a backend for the Python scripts.
However, to answer the question you raised in #1727: the new executable does not produce the binary search-results data and the associated .txt files with the search parameters and other information. Although I could add a CLI parameter to dump the binary data and replicate the .txt files, this would still require some adjustments on the Python side.

I believe a better way forward is not to rush this PR and to take the time to adapt the Python scripts. I assume most of the work should be just deleting parts of the code: no need to select the right executable (ANN_BENCH does dynamic dispatch), adjust/modify configs at runtime, evaluate the recall, or parse the scattered model/search parameters. Just ask ANN_BENCH to produce a single JSON and read all relevant information from there. May I ask, @divyegala, as the author of the Python scripts, whether you have some cycles to help with this? We could do the changes directly in this PR or, better, in an immediate follow-up.

@divyegala (Member) commented Aug 11, 2023

@achirkin sure, I will help with updating the Python scripts.

Out of curiosity - I was going through your PR and couldn't find how --benchmark_filter works. Do you use it as a custom filter on index["algo"]? Can you point to where in your code this happens?

@achirkin (Contributor, Author) commented Aug 12, 2023

Thanks, @divyegala! All CLI arguments of the form --benchmark_xxx (including the --benchmark_filter) are handled by gbench directly. Under the hood, it uses regex to filter among the registered benchmarks. What I do in benchmark.hpp is go through the config file and register all benchmark cases depending on our custom CLI flags (--build/--search and --override_kv series).
Oh, and the registered benchmark names are derived from config[index][i][name].
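For example, one way to see which registered cases a filter regex would match, without running them, is gbench's standard listing flag (a sketch; the filter here is illustrative):

./cpp/build/ANN_BENCH --search \
  --benchmark_list_tests \
  --benchmark_filter="raft_ivf_pq\..*" \
  cpp/bench/ann/conf/bigann-100M.json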

@achirkin (Contributor, Author) commented:

Update

  • The ANN_BENCH executable no longer requires CUDA at compile time
  • With minimal internal changes, the Python scripts now run the end-to-end examples from the docs

That is, this PR now shouldn't break anything and should allow compiling hnswlib benchmarks without CUDA.

@achirkin (Contributor, Author) commented:

Update

  • The default behavior is now more similar to the main branch: build one executable per benchmark. The produced executables do not use dlopen. The single-exec behavior becomes an opt-in feature that is disabled by default (see the sketch below).
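For reference, a minimal sketch of what opting in might look like when configuring the build; the CMake option names below are hypothetical and not confirmed by this PR, so check cpp/bench/ann/CMakeLists.txt for the actual flags:

# BUILD_ANN_BENCH and RAFT_ANN_BENCH_SINGLE_EXE are hypothetical option names, for illustration only
cmake -S cpp -B cpp/build -DBUILD_ANN_BENCH=ON -DRAFT_ANN_BENCH_SINGLE_EXE=ON
cmake --build cpp/build --target ANN_BENCH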

@cjnolet (Member) left a comment

LGTM.

achirkin added a commit to cjnolet/raft that referenced this pull request Aug 30, 2023
@cjnolet (Member) commented Aug 30, 2023

/merge

@rapids-bot merged commit f6d35ae into rapidsai:branch-23.10 on Aug 30, 2023
rapids-bot pushed a commit that referenced this pull request on Aug 30, 2023
…entations. (#1769)

This is just fixing merge conflicts for #1661 to continue making progress on new self-contained Python packaging. 

Closes #1762

Authors:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Artem M. Chirkin (https://github.com/achirkin)
  - Divye Gala (https://github.com/divyegala)

Approvers:
  - Ray Douglass (https://github.com/raydouglass)
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Artem M. Chirkin (https://github.com/achirkin)

URL: #1769
Labels: 3 - Ready for Review, breaking (Breaking change), CMake, cpp, improvement (Improvement / enhancement to an existing function)