ANN-benchmarks: switch to use gbench #1661
Conversation
CC @divyegala as I've heard you were planning to do some work on the python-side benchmark scripts?
@achirkin is recall/qps still reported as a CSV file, or just output to the terminal?
It all goes via gbench; e.g. I often use the following command to print the results both to the terminal (tabulated) and to a CSV file:

```sh
./cpp/build/RAFT_IVF_PQ_ANN_BENCH \
  --benchmark_out=ivf_pq.csv --benchmark_out_format=csv \
  --benchmark_counters_tabular --search --benchmark_filter="raft_ivf_pq\..*" \
  cpp/bench/ann/conf/bigann-100M.json
```

To calculate recalls in this branch, you need to add the path to the ground truth in the config, along with "base_file" and "query_file":

```json
"dataset": {
  "name": "sift-128-euclidean",
  "base_file": "data/sift-128-euclidean/base.fbin",
  "query_file": "data/sift-128-euclidean/query.fbin",
  "groundtruth_neighbors_file": "data/sift-128-euclidean/groundtruth.neighbors.ibin",
  "distance": "euclidean"
},
```

Without this, the benchmark gives a warning that the recall is not available.
@achirkin I would mark this as a breaking change. It changes the default behavior and how the executables work.
Could you please also update the markdown file with these new instructions?
Sure, though I thought the breaking/non-breaking referred to the library components that could be used downstream.
Indeed they allow! In general, the gbench CLI is rather flexible and, in conjunction with our extra CLI parameters, should cover all our needs both standalone and as a backend for the python scripts. I believe a better way forward is to not rush this PR and take the time to adapt the python scripts. I assume most of the work should be just deleting parts of the code: no need to select the right executable (ANN_BENCH does dynamic dispatch), to adjust/modify configs at runtime, to evaluate the recall, or to parse the scattered model/search parameters. Just ask ANN_BENCH to produce a single JSON and read all the relevant information from there. May I ask, @divyegala, as the author of the python scripts, if you have some cycles to help with this? We could do the changes directly in this PR, or better, in an immediate follow-up.
@achirkin sure, I will help with updating the Python scripts. Out of curiosity - I was going through your PR, and I couldn't find how …
Thanks, @divyegala! All CLI arguments of the form …
Update
That is, this PR now shouldn't break anything and should allow compiling the hnswlib benchmarks without CUDA.
LGTM.
/merge
…entations. (#1769)

This is just fixing merge conflicts for #1661 to continue making progress on the new self-contained Python packaging. Closes #1762

Authors:
- Corey J. Nolet (https://github.com/cjnolet)
- Artem M. Chirkin (https://github.com/achirkin)
- Divye Gala (https://github.com/divyegala)

Approvers:
- Ray Douglass (https://github.com/raydouglass)
- Dante Gama Dessavre (https://github.com/dantegd)
- Artem M. Chirkin (https://github.com/achirkin)

URL: #1769
Make the ANN benchmarks use the same google benchmark infrastructure as the prim benchmarks while keeping the functional changes minimal.
Overview
- ANN_BENCH: all of the algorithms are loaded as shared libraries. The CPU-only components do not require CUDA at runtime (ANN_BENCH itself, hnswlib).
- --data_prefix: specify a custom path where the data sets are stored.
- --index_prefix: specify a custom path where the index sets are stored.
- --override_kv=<key:value1:value2:...:valueN>: override one or more build/search parameters for parameter-sweep benchmarks.

Breaking change: the behavior of the ANN benchmark executables (the library API is not touched). The executable CLI flags have changed, so the newer, adapted wrapper scripts won't work with the executables from the libraft-ann-bench-23.08 conda package.
A primer
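As a quick sketch of the new workflow (hedged: the executable path, the --build flag, and the config/dataset names here are illustrative assumptions; --search, --data_prefix, and the positional config argument appear in the flags described above):

```sh
# Hypothetical end-to-end session; adjust paths and config names to your setup.
# 1) Build the indexes for one algorithm (the --build flag is an assumption
#    mirroring --search).
./cpp/build/ANN_BENCH --build --data_prefix=/datasets \
    --benchmark_filter="raft_ivf_pq\..*" \
    cpp/bench/ann/conf/sift-128-euclidean.json

# 2) Search with the built indexes; print a tabulated report and save a CSV.
./cpp/build/ANN_BENCH --search --data_prefix=/datasets \
    --benchmark_filter="raft_ivf_pq\..*" \
    --benchmark_counters_tabular \
    --benchmark_out=results.csv --benchmark_out_format=csv \
    cpp/bench/ann/conf/sift-128-euclidean.json
```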
Motivation
Eliminate huge bug-prone configs
The current config fixes the batch size and k to one value per config, so the whole config needs to be copied to try multiple values. In this PR, both of these parameters can be overridden in the search parameters and/or via the command line:

```sh
ANN_BENCH --override_kv=n_queries:1:100:1000 --override_kv=k:1:10:20:50:100:200:500:1000
```

would test all combinations in one go. Any of the build/search parameters can be overridden at the same time.

Run the benchmarks and aggregate the data in a minimal environment
The new executable generates reports with QPS, recall, and other metrics using gbench. Hence, there's no need to copy dozens of result files back and forth, and no need to install a python environment for running or evaluating. A single CSV or JSON can be produced for all algorithms and run configurations per dataset+hardware pair.
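For instance (a sketch; the config path is illustrative, while the output flags are the standard gbench ones shown in the conversation above), a single self-contained JSON report for a dataset can be produced like this:

```sh
# One JSON file captures all algorithms/configurations run from this config,
# including the gbench context (hardware, dataset, CUDA version).
./cpp/build/ANN_BENCH --search \
    --benchmark_out=sift-128-euclidean.json --benchmark_out_format=json \
    cpp/bench/ann/conf/sift-128-euclidean.json
```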
Speed up the benchmarks
The current benchmark framework is extremely slow due to two factors: the dataset and the index are reloaded for every test case, and every case runs a fixed, large number of iterations regardless of how long a single iteration takes. In the proposed solution, a user can set the desired time or number of iterations to run; the data is loaded only once, and the index is cached between the search test cases. My subjective, conservative estimate is an overall speedup of more than 100x for running a typical large-scale benchmark.
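For example (a hedged sketch: --benchmark_min_time is a standard gbench flag, but the exact accepted syntax depends on the gbench version, and the config path is illustrative):

```sh
# Run each selected search case for roughly 2 seconds instead of a fixed
# iteration count; newer gbench versions also accept an explicit iteration
# count such as --benchmark_min_time=100x.
./cpp/build/ANN_BENCH --search --benchmark_min_time=2 \
    --benchmark_filter="raft_ivf_pq\..*" \
    cpp/bench/ann/conf/sift-128-euclidean.json
```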
Better measurement of QPS
By default, the current benchmark reports the average execution time and does not do warm-up iterations. As a result, the first test case on most of our plots is distorted (e.g. the first iteration of the first case takes a second or two to run, and that significantly skews the average of the remaining 999 ~100us iterations). gbench provides the --benchmark_min_warmup_time parameter to skip the first one or few iterations, which solves the problem.
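A minimal sketch of using that flag (the config path is illustrative):

```sh
# Discard roughly the first second of iterations per case so that one-off
# startup costs don't distort the reported averages.
./cpp/build/ANN_BENCH --search --benchmark_min_warmup_time=1 \
    cpp/bench/ann/conf/sift-128-euclidean.json
```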
Extra context in the reports

The new benchmark executable uses the gbench context to augment the report with some essential information: the base and query set names, dimensionality and size, the distance metric, some CPU and GPU info, and the CUDA version. All of this is appended directly to the generated CSV/JSON files, which makes bookkeeping much easier.
In addition, a user may pass extra context via the command line with --benchmark_context=<key>=<value>; this could be e.g. the hostname, some ENV variables, etc.
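For example (the key names and values here are arbitrary illustrations):

```sh
# Attach the machine name and a free-form note to the report's context section.
./cpp/build/ANN_BENCH --search \
    --benchmark_context=hostname="$(hostname)" \
    --benchmark_context=notes="after-driver-update" \
    cpp/bench/ann/conf/sift-128-euclidean.json
```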
Easier profiling

Thanks to the flexible regex filtering and parameter overriding, it's now possible to specify a subset of cases and the exact number of times they should run. This makes profiling with tools such as nsys and ncu much easier.
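A hedged example of profiling a pinned-down case under Nsight Systems (the filter, override values, and config path are illustrative):

```sh
# Profile one algorithm's search cases: the regex filter selects the cases
# and --override_kv fixes the workload so the trace is reproducible.
nsys profile -o ivf_pq_search ./cpp/build/ANN_BENCH --search \
    --benchmark_filter="raft_ivf_pq\..*" \
    --override_kv=n_queries:10000 --override_kv=k:10 \
    cpp/bench/ann/conf/sift-128-euclidean.json
```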