
ANN-benchmarks: switch to use gbench #1661

Merged: 20 commits into rapidsai:branch-23.10 on Aug 30, 2023

Conversation

@achirkin (Contributor) commented Jul 21, 2023

Make the ANN benchmarks use the same Google Benchmark (gbench) infrastructure as the prim benchmarks, while keeping the functional changes minimal.

Overview

  • The command-line API largely stays the same, but is enhanced with gbench-specific parameters, such as regex-based selection of algo configs, control over the minimum run time, and flexible reporting to console/files.
  • There is just one executable, ANN_BENCH; all of the algorithms are loaded as shared libraries. The CPU-only components (ANN_BENCH itself, hnswlib) do not require CUDA at runtime.
  • Some dependencies are linked statically, so it is possible to copy just the executable and the libs and run the benchmark on a Linux machine with very few packages installed.
  • Search benchmarks no longer produce any output files; they use ground-truth files to compute and report the recall in place.
  • Search/build parameters visible in the config files are passed as benchmark counters/labels/context.
  • Extra functionality:
    • --data_prefix to specify a custom path where the data sets are stored
    • --index_prefix to specify a custom path where the indices are stored
    • --override_kv=<key:value1:value2:...:valueN> to override one or more search/build parameters for parameter-sweep benchmarks

Breaking change: the behavior of the ANN benchmark executables changes (the library API is not touched). The executable CLI flags have changed, so the newer, adapted wrapper scripts won't work with the executables from the libraft-ann-bench-23.08 conda package.

A primer

# ANN_BENCH                     benchmark executable
# --data_prefix                 override (prefix) path to local data
# --benchmark_min_warmup_time   spend some minimal time warming up
# --benchmark_min_time          run a minimum of 3 seconds on each case
# --benchmark_out               duplicate the output to this file
# --benchmark_out_format        the file output should be in CSV format
# --benchmark_counters_tabular  make the console output tabular
# --benchmark_filter            use a regex to filter benchmarks
# --search                      'search' mode
# --override_kv (k, n_queries)  parameter-sweep over the top-k value and the search batch size
# --override_kv (smemLutDtype)  override a search parameter
# <config file>                 path to the config file
./cpp/build/ANN_BENCH \
  --data_prefix=/datastore/my/local/data/path \
  --benchmark_min_warmup_time=0.001 \
  --benchmark_min_time=3s \
  --benchmark_out=ivf_pq.csv \
  --benchmark_out_format=csv \
  --benchmark_counters_tabular \
  --benchmark_filter="raft_ivf_pq\..*" \
  --search \
  --override_kv=k:1:10:100:200:500 \
  --override_kv=n_queries:1:10:10000 \
  --override_kv=smemLutDtype:"fp8" \
  cpp/bench/ann/conf/bigann-100M.json

Motivation

Eliminate huge bug-prone configs

The current config fixes the batch size and k to one value per config, so the whole config needs to be copied to try multiple values. With this PR, both of these parameters can be overridden in the search parameters and/or via the command line (ANN_BENCH --override_kv=n_queries:1:100:1000 --override_kv=k:1:10:20:50:100:200:500:1000 would test all combinations in one go). Any of the build/search parameters can be overridden at the same time, as sketched below.
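For illustration, a sketch of such a combined sweep; the "raft_ivf_pq" filter and the nprobe parameter name are only examples and depend on which algorithms and parameters are defined in the config:

./cpp/build/ANN_BENCH --search \
  --benchmark_filter="raft_ivf_pq\..*" \
  --override_kv=n_queries:1:100:1000 \
  --override_kv=k:1:10:20:50:100:200:500:1000 \
  --override_kv=nprobe:10:50:100 \
  cpp/bench/ann/conf/bigann-100M.json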

Run the benchmarks and aggregate the data in a minimal environment

The new executable generates reports with QPS, recall, and other metrics via gbench. Hence there is no need to copy dozens of result files back and forth, and no need to install a Python environment for running or evaluating the benchmarks. A single CSV or JSON file can be produced for all algorithms and run configurations per dataset+hardware pair.
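For example, a minimal sketch of producing one JSON report for every algorithm defined in a config (the file names here are illustrative):

./cpp/build/ANN_BENCH --search \
  --benchmark_out=bigann-100M.search.json \
  --benchmark_out_format=json \
  cpp/bench/ann/conf/bigann-100M.json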

Speed up the benchmarks

The current benchmark framework is extremely slow due to two factors:

  • The dataset and the index need to be loaded for every test case; for large datasets this takes orders of magnitude longer than the search test itself. In my tests, the preparation phase for bigann-1B took ten minutes, while the search could take anywhere between a few seconds and a minute.
  • The benchmark always goes through the whole query dataset. That is, if the query set is 10K and the batch size is 1, the benchmark repeats 10K times (to produce the result file for evaluating the recall).

In the proposed solution, a user can set the desired run time or number of iterations; the data is loaded only once and the index is cached between the search test cases. My subjective, conservative estimate is an overall speedup of more than 100x for running a typical large-scale benchmark.

Better measurement of QPS

By default, the current benchmark reports the average execution time and does not run warm-up iterations. As a result, the first test case on most of our plots is distorted (e.g. the first iteration of the first case takes about a second or two to run, which significantly skews the average of the remaining 999 ~100us iterations). gbench provides the --benchmark_min_warmup_time parameter to skip the first one or few iterations, which solves the problem.

Extra context in the reports

The new benchmark executable uses the gbench context to augment the report with some essential information: the base and query set names, dimensionality and size, the distance metric, some CPU and GPU info, and the CUDA version. All of this is appended directly to the generated CSV/JSON files, which makes bookkeeping much easier.
In addition, a user may pass extra context via the command line with --benchmark_context=<key>=<value>; this could be e.g. the hostname, some environment variables, etc., as in the sketch below.
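For example (the key names here are arbitrary, chosen only for illustration):

./cpp/build/ANN_BENCH --search \
  --benchmark_context=hostname=$(hostname) \
  --benchmark_context=gpu_visible_devices="$CUDA_VISIBLE_DEVICES" \
  cpp/bench/ann/conf/bigann-100M.json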

Easier profiling

Thanks to flexible regex filtering and parameter overriding, it is now possible to specify a subset of cases and the exact number of times they should run. This makes profiling with tools such as nsys and ncu much easier.
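A sketch of narrowing a run down for profiling; it assumes nsys is installed and that the bundled gbench version supports the <N>x (iteration count) syntax for --benchmark_min_time, in addition to the <N>s syntax shown above:

nsys profile -o ivf_pq_search ./cpp/build/ANN_BENCH --search \
  --benchmark_filter="raft_ivf_pq\..*" \
  --benchmark_min_time=1x \
  --override_kv=k:10 \
  --override_kv=n_queries:10000 \
  cpp/bench/ann/conf/bigann-100M.json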

@achirkin added the '3 - Ready for Review', 'improvement' (Improvement / enhancement to an existing function), and 'non-breaking' (Non-breaking change) labels on Jul 21, 2023
@achirkin requested review from a team as code owners on July 21, 2023 12:40
@achirkin requested reviews from cjnolet and tfeher on July 21, 2023 12:41
@achirkin (Contributor, Author) commented Jul 21, 2023

CC @divyegala as I've heard you were planning to do some work on python-side benchmark scripts?

@divyegala (Member) commented:

@achirkin is recall/qps still reported as a CSV file, or just output to the terminal?

@achirkin (Contributor, Author) commented Jul 21, 2023

It all goes via gbench; e.g. I often use the following command to print the results both to the terminal (tabulated) and to a CSV file:

./cpp/build/RAFT_IVF_PQ_ANN_BENCH \
  --benchmark_out=ivf_pq.csv --benchmark_out_format=csv \
  --benchmark_counters_tabular --search --benchmark_filter="raft_ivf_pq\..*" \
  cpp/bench/ann/conf/bigann-100M.json

To calculate recalls in this branch, you need to add the path to the ground truth file in the config, alongside "base_file" and "query_file":

  "dataset": {
    "name": "sift-128-euclidean",
    "base_file": "data/sift-128-euclidean/base.fbin",
    "query_file": "data/sift-128-euclidean/query.fbin",
    "groundtruth_neighbors_file": "data/sift-128-euclidean/groundtruth.neighbors.ibin"
    "distance": "euclidean"
  },

Without this, the benchmark gives a warning that the recall is not available.

@divyegala (Member) commented:

@achirkin I would mark this as a breaking change. It changes default behavior, and changes how the executables work.

  1. Search not producing output files - previously, Search produced output files and we would use scripts/eval.pl to produce a CSV output
  2. The groundtruth file was an input to scripts/eval.pl - now it's part of the configuration file

Could you please also update the markdown file with these new instructions?

@achirkin added the 'breaking' (Breaking change) label and removed the 'non-breaking' (Non-breaking change) label on Jul 22, 2023
@achirkin (Contributor, Author) commented:

Sure, though I thought the breaking/non-breaking referred to the library components that could be used downstream.

@achirkin changed the base branch from branch-23.08 to branch-23.10 on July 31, 2023 07:46
@achirkin force-pushed the enh-google-benchmarks branch from 9ba9854 to 8cfd2ae on August 9, 2023 11:18
@achirkin force-pushed the enh-google-benchmarks branch from 8cfd2ae to bd738ec on August 9, 2023 11:19
@cjnolet (Member) commented Aug 10, 2023

@achirkin the good news is that it looks like the gbench executables allow you to specify the output file and output format as JSON or CSV, so I think you should be able to get the Python scripts loading those files pretty easily.

@achirkin (Contributor, Author) commented:

Indeed they do! In general, the gbench CLI is rather flexible and, in conjunction with our extra CLI parameters, should cover all our needs, both standalone and as a backend for the Python scripts.
However, to answer the question you raised in #1727: the new executable does not produce the binary search-results data and the associated .txt files with the search parameters and other information. Although I could add a CLI parameter to dump the binary data and replicate the .txt files, this would still require some adjustments on the Python side.

I believe a better way forward is not to rush this PR and to take the time to adapt the Python scripts. I assume most of the work should be just deleting parts of the code: no need to select the right executable (ANN_BENCH does dynamic dispatch), adjust/modify configs at runtime, evaluate the recall, or parse the scattered model/search parameters. Just ask ANN_BENCH to produce a single JSON and read all relevant information from there. May I ask, @divyegala, as the author of the Python scripts, whether you have some cycles to help with this? We could do the changes directly in this PR or, better, in an immediate follow-up.

@divyegala (Member) commented Aug 11, 2023

@achirkin sure, I will help with updating the Python scripts.

Out of curiosity - I was going through your PR and couldn't find how --benchmark_filter works. Do you use it as a custom filter on index["algo"]? Can you point to where in your code this happens?

@achirkin (Contributor, Author) commented Aug 12, 2023

Thanks, @divyegala! All CLI arguments of the form --benchmark_xxx (including the --benchmark_filter) are handled by gbench directly. Under the hood, it uses regex to filter among the registered benchmarks. What I do in benchmark.hpp is go through the config file and register all benchmark cases depending on our custom CLI flags (--build/--search and --override_kv series).
Oh, and the registered benchmark names are derived from config[index][i][name].
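For example, one way to see which registered cases a filter regex would match, without running them, is gbench's standard listing flag (a sketch; the filter here is illustrative):

./cpp/build/ANN_BENCH --search \
  --benchmark_list_tests \
  --benchmark_filter="raft_ivf_pq\..*" \
  cpp/bench/ann/conf/bigann-100M.json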

@achirkin (Contributor, Author) commented:

Update

  • The ANN_BENCH executable no longer requires CUDA at compile time
  • With minimal internal changes, the Python scripts now run the end-to-end examples from the docs

That is, this PR now shouldn't break anything and should allow compiling hnswlib benchmarks without CUDA.

@achirkin (Contributor, Author) commented:

Update

  • The default behavior is now more similar to the main branch: build one executable per benchmark. The produced executables do not use dlopen. The single-exec behavior becomes an opt-in feature that is disabled by default (see the sketch below).
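For reference, a minimal sketch of what opting in might look like when configuring the build; the CMake option names below are hypothetical and not confirmed by this PR, so check cpp/bench/ann/CMakeLists.txt for the actual flags:

# BUILD_ANN_BENCH and RAFT_ANN_BENCH_SINGLE_EXE are hypothetical option names, for illustration only
cmake -S cpp -B cpp/build -DBUILD_ANN_BENCH=ON -DRAFT_ANN_BENCH_SINGLE_EXE=ON
cmake --build cpp/build --target ANN_BENCH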

@cjnolet (Member) left a comment

LGTM.

achirkin added a commit to cjnolet/raft that referenced this pull request Aug 30, 2023
@cjnolet (Member) commented Aug 30, 2023

/merge

@rapids-bot merged commit f6d35ae into rapidsai:branch-23.10 on Aug 30, 2023
rapids-bot pushed a commit that referenced this pull request on Aug 30, 2023
…entations. (#1769)

This is just fixing merge conflicts for #1661 to continue making progress on new self-contained Python packaging. 

Closes #1762

Authors:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Artem M. Chirkin (https://github.com/achirkin)
  - Divye Gala (https://github.com/divyegala)

Approvers:
  - Ray Douglass (https://github.com/raydouglass)
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Artem M. Chirkin (https://github.com/achirkin)

URL: #1769
Labels: 3 - Ready for Review, breaking (Breaking change), CMake, cpp, improvement (Improvement / enhancement to an existing function)