forked from rapidsai/raft
Dev enh google benchmarks #1
Merged: cjnolet merged 6 commits into dantegd:dev-enh-google-benchmarks from cjnolet:dev-enh-google-benchmarks on Aug 30, 2023
…ation (rapidsai#1785)

Authors:
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Divye Gala (https://github.com/divyegala)

URL: rapidsai#1785
Make the ANN benchmarks use the same google benchmark infrastructure as the prim benchmarks while keeping the functional changes minimal.

### Overview

- The command-line API largely stays the same, but is enhanced with gbench-specific parameters, such as using regex to select algo configs, controlling the minimum run-time, and flexible reporting to console/files.
- There is just one executable, `ANN_BENCH`; all of the algorithms are loaded as shared libraries. The CPU-only components (ANN_BENCH itself, hnswlib) do not require CUDA at runtime.
- Some dependencies are linked statically, so it is possible to just copy the executable and the libs and run the benchmark on a Linux machine with very few packages installed.
- Search benchmarks no longer produce any output; they use ground truth files to compute and report the recall in-place.
- Search/build parameters visible in the config files are passed as benchmark counters/labels/context.
- Extra functionality:
  - `--data_prefix` to specify a custom path where the data sets are stored
  - `--index_prefix` to specify a custom path where the index sets are stored
  - `--override_kv=<key:value1:value2:...:valueN>` to override one or more parameters of search/build for parameter-sweep benchmarks

__Breaking change__: the behavior of the ANN benchmark executables changes (the library API is not touched). The executable CLI flags have changed, so the newer, adapted wrapper scripts won't work with the executables from the libraft-ann-bench-23.08 conda package.
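To illustrate what a parameter sweep over several `--override_kv=<key:value1:...:valueN>` arguments amounts to, here is a minimal Python sketch that expands such specs into every combination of values. It is not the ANN_BENCH implementation (which is C++); the helper names `parse_override` and `expand_overrides` are hypothetical, and values are kept as plain strings.

```python
from itertools import product
from typing import Dict, List, Tuple


def parse_override(spec: str) -> Tuple[str, List[str]]:
    """Split 'key:v1:v2:...' into (key, [v1, v2, ...])."""
    key, *values = spec.split(":")
    if not key or not values:
        raise ValueError(f"expected key:value1[:value2...], got {spec!r}")
    return key, values


def expand_overrides(specs: List[str]) -> List[Dict[str, str]]:
    """Return one parameter dict per combination of the overridden values."""
    parsed = [parse_override(s) for s in specs]
    keys = [key for key, _ in parsed]
    value_lists = [values for _, values in parsed]
    # itertools.product yields the Cartesian product; the last key varies fastest.
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


combos = expand_overrides(["k:1:10:100", "n_queries:1:10000"])
for combo in combos:
    print(combo)
```

Three values of `k` and two values of `n_queries` give six combinations, which is why a sweep of this kind can stand in for several near-identical config files.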
### A primer

```bash
# Flags are collected in an array so that each one can carry a comment.
args=(
  --data_prefix=/datastore/my/local/data/path  # override (prefix) path to local data
  --benchmark_min_warmup_time=0.001            # spend some minimal time warming up
  --benchmark_min_time=3s                      # run minimum 3 seconds on each case
  --benchmark_out=ivf_pq.csv                   # duplicate output to this file
  --benchmark_out_format=csv                   # the file output should be in CSV format
  --benchmark_counters_tabular                 # the console output should be tabular
  --benchmark_filter="raft_ivf_pq\..*"         # use regex to filter benchmarks
  --search                                     # 'search' mode
  --override_kv=k:1:10:100:200:500             # parameter-sweep over the top-k value
  --override_kv=n_queries:1:10:10000           # and the search batch size
  --override_kv=smemLutDtype:"fp8"             # override a search parameter
  cpp/bench/ann/conf/bigann-100M.json          # specify the path to the config file
)
./cpp/build/ANN_BENCH "${args[@]}"             # benchmark executable
```

### Motivation

#### Eliminate huge bug-prone configs

The current config fixes the batch size and k to one value per config, so the whole config needs to be copied to try multiple values. With this PR, both of these parameters can be overridden in the search parameters and/or via the command line (`ANN_BENCH --override_kv=n_queries:1:100:1000 --override_kv=k:1:10:20:50:100:200:500:1000` would test all combinations in one go). Any of the build/search parameters can be overridden at the same time.

#### Run the benchmarks and aggregate the data in the minimal environment

The new executable generates reports with QPS, Recall, and other metrics using gbench. Hence there is no need to copy dozens of result files back and forth, and no need to install a Python environment for running or evaluating. A single CSV or JSON can be produced for all algorithms and run configurations per dataset+hardware pair.
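Because everything ends up in a single report file, simple post-processing is enough to compare configurations. The sketch below picks the fastest configuration that reaches a given recall from a CSV report. The sample rows, the benchmark names, and the column names `Recall` and `items_per_second` are assumptions made for illustration; check the header of an actual report before relying on them.

```python
import csv
import io
from typing import Dict, Optional

# Hypothetical excerpt of a report written with --benchmark_out_format=csv.
# Names, counter columns and numbers are invented for this example.
SAMPLE_CSV = """\
name,iterations,real_time,cpu_time,time_unit,items_per_second,Recall
raft_ivf_pq.example_a/0,1000,105.2,104.9,us,95000,0.85
raft_ivf_pq.example_a/1,800,130.4,129.8,us,76000,0.93
raft_ivf_pq.example_b/0,500,210.0,209.1,us,47000,0.97
"""


def best_qps_at_recall(csv_text: str, min_recall: float) -> Optional[Dict[str, str]]:
    """Return the row with the highest throughput among rows meeting min_recall."""
    rows = csv.DictReader(io.StringIO(csv_text))
    eligible = [row for row in rows if float(row["Recall"]) >= min_recall]
    if not eligible:
        return None
    return max(eligible, key=lambda row: float(row["items_per_second"]))


best = best_qps_at_recall(SAMPLE_CSV, min_recall=0.9)
if best is not None:
    print(best["name"], best["items_per_second"], best["Recall"])
```

For a real report, the inline string would be replaced by the contents of the file passed to `--benchmark_out`.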
#### Speed up the benchmarks

The current benchmark framework is extremely slow due to two factors:

- The dataset and the index need to be loaded for every test case; for large datasets this takes orders of magnitude longer than the search test itself. In my tests, the preparation phase for bigann-1B took ten minutes and the search could take anywhere between a few seconds and a minute.
- The benchmark always goes through the whole query dataset. That is, if the query set is 10K and the batch size is 1, the benchmark repeats 10K times (to produce the result file for evaluating the recall).

In the proposed solution, a user can set the desired time or number of iterations to run; the data is loaded only once and the index is cached between the search test cases. My subjective conservative estimate is an overall speedup of more than 100x for running a typical large-scale benchmark.

#### Better measurement of QPS

By default, the current benchmark reports the average execution time and does not run warm-up iterations. As a result, the first test case on most of our plots is distorted (e.g. the first iteration of the first case takes about a second or two to run, and that significantly affects the average over the remaining 999 iterations of ~100us each). `gbench` provides the `--benchmark_min_warmup_time` parameter to skip the first one or few iterations, which solves the problem.

#### Extra context in the reports

The new benchmark executable uses gbench context to augment the report with some essential information: base and query set name, dimensionality, and size; distance metric; some CPU and GPU info; CUDA version. All this is appended directly to the generated CSV/JSON files, which makes the bookkeeping much easier. In addition, a user may pass extra context via the command line with `--benchmark_context=<key>=<value>`; this could be e.g. the hostname, some ENV variables, etc.
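As a sketch of how this context can be used for bookkeeping, the Python snippet below reads a Google Benchmark JSON report, which keeps run-wide information under a top-level `context` object and the individual results under `benchmarks`, and copies selected context fields onto every result row. The sample report and the context keys `dataset`, `distance` and `my_custom_key` are hypothetical; the keys written by ANN_BENCH may be named differently.

```python
import json
from typing import Any, Dict, List

# Hypothetical, heavily trimmed JSON report; all keys inside "context" other
# than "date" and "host_name" are invented for this example.
SAMPLE_JSON = """
{
  "context": {
    "date": "2023-08-30T12:00:00+00:00",
    "host_name": "example-host",
    "dataset": "example-dataset",
    "distance": "euclidean",
    "my_custom_key": "my_custom_value"
  },
  "benchmarks": [
    {"name": "raft_ivf_pq.example/0", "real_time": 105.2, "time_unit": "us"},
    {"name": "raft_ivf_pq.example/1", "real_time": 130.4, "time_unit": "us"}
  ]
}
"""


def attach_context(report_text: str, keys: List[str]) -> List[Dict[str, Any]]:
    """Return one flat dict per benchmark, prefixed with the chosen context keys."""
    report = json.loads(report_text)
    context = report.get("context", {})
    picked = {key: context[key] for key in keys if key in context}
    return [{**picked, **bench} for bench in report.get("benchmarks", [])]


rows = attach_context(SAMPLE_JSON, ["dataset", "distance", "my_custom_key"])
for row in rows:
    print(row)
```

Flattening the context this way makes it straightforward to concatenate reports from different datasets or machines into one table without losing track of where each row came from.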
#### Easier profiling

Thanks to flexible regex filtering and parameter overriding, it is now possible to specify a subset of cases and the exact number of times they should run. This makes profiling with tools such as `nsys` and `ncu` much easier.

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#1661
An error occurs when using the CAGRA multi-CTA implementation with topk>32. This PR fixes the bug.

Authors:
  - tsuki (https://github.com/enp1s0)

Approvers:
  - Artem M. Chirkin (https://github.com/achirkin)
  - Divye Gala (https://github.com/divyegala)
  - Micka (https://github.com/lowener)

URL: rapidsai#1784
This PR adds the citation information for the CAGRA paper preprint to README.md.

Authors:
  - tsuki (https://github.com/enp1s0)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#1787
…entations. (rapidsai#1769)

This is just fixing merge conflicts for rapidsai#1661 to continue making progress on new self-contained Python packaging.

Closes rapidsai#1762

Authors:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Artem M. Chirkin (https://github.com/achirkin)
  - Divye Gala (https://github.com/divyegala)

Approvers:
  - Ray Douglass (https://github.com/raydouglass)
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Artem M. Chirkin (https://github.com/achirkin)

URL: rapidsai#1769
dantegd pushed a commit that referenced this pull request on Jul 23, 2024
RAPIDS repos are using the `main` branch of https://github.com/actions/labeler which recently introduced [breaking changes](https://github.com/actions/labeler/releases/tag/v5.0.0). This PR pins to the latest v4 release of the labeler action until we can evaluate the changes required for v5.

Authors:
  - Ray Douglass (https://github.com/raydouglass)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
No description provided.