diff --git a/docs/source/raft_ann_benchmarks.md b/docs/source/raft_ann_benchmarks.md
index 865e056710..4b3aef5600 100644
--- a/docs/source/raft_ann_benchmarks.md
+++ b/docs/source/raft_ann_benchmarks.md
@@ -96,7 +96,7 @@ We provide a collection of lightweight Python scripts to run the benchmarks. The
 4. Plot Results

 ### Step 1: Prepare Dataset
-The script `raft-ann-bench.get_dataset` will download and unpack the dataset in directory
+The script `raft_ann_bench.get_dataset` will download and unpack the dataset in directory
 that the user provides. As of now, only million-scale datasets are supported by this script.
 For more information on [datasets and formats](ann_benchmarks_dataset.md).

@@ -117,10 +117,10 @@ will be normalized to inner product. So, for example, the dataset `glove-100-ang
 will be written at location `datasets/glove-100-inner/`.

 ### Step 2: Build and Search Index
-The script `raft-ann-bench.run` will build and search indices for a given dataset and its
+The script `raft_ann_bench.run` will build and search indices for a given dataset and its
 specified configuration.

-The usage of the script `raft-ann-bench.run` is:
+The usage of the script `raft_ann_bench.run` is:
 ```bash
 usage: __main__.py [-h] [--subset-size SUBSET_SIZE] [-k COUNT] [-bs BATCH_SIZE] [--dataset-configuration DATASET_CONFIGURATION] [--configuration CONFIGURATION] [--dataset DATASET] [--dataset-path DATASET_PATH] [--build] [--search] [--algorithms ALGORITHMS] [--groups GROUPS] [--algo-groups ALGO_GROUPS] [-f] [-m SEARCH_MODE]
@@ -186,8 +186,8 @@ it is assumed both are `True`.
 is available in `algos.yaml` and not disabled, as well as having an associated executable.

 ### Step 3: Data Export
-The script `raft-ann-bench.data_export` will convert the intermediate JSON outputs produced by `raft-ann-bench.run` to more
-easily readable CSV files, which are needed to build charts made by `raft-ann-bench.plot`.
+The script `raft_ann_bench.data_export` will convert the intermediate JSON outputs produced by `raft_ann_bench.run` to more
+easily readable CSV files, which are needed to build charts made by `raft_ann_bench.plot`.

 ```bash
 usage: data_export.py [-h] [--dataset DATASET] [--dataset-path DATASET_PATH]
@@ -206,7 +206,7 @@ and index search statistics CSV file in `/result/search/<

 ### Step 4: Plot Results
-The script `raft-ann-bench.plot` will plot results for all algorithms found in index search statistics
+The script `raft_ann_bench.plot` will plot results for all algorithms found in index search statistics
 CSV files `/result/search/*.csv`.

 The usage of this script is:
@@ -277,7 +277,7 @@ python -m raft_ann_bench.data_export --dataset deep-image-96-inner
 python -m raft_ann_bench.plot --dataset deep-image-96-inner
 ```

-Configuration files already exist for the following list of the million-scale datasets. Please refer to [ann-benchmarks datasets](https://github.com/erikbern/ann-benchmarks/#data-sets) for more information, including actual train and sizes. These all work out-of-the-box with the `--dataset` argument. Other million-scale datasets from `ann-benchmarks.com` will work, but will require a json configuration file to be created in `$CONDA_PREFIX/lib/python3.xx/site-packages/raft-ann-bench/run/conf`, or you can specify the `--configuration` option to use a specific file.
+Configuration files already exist for the following list of the million-scale datasets. Please refer to [ann-benchmarks datasets](https://github.com/erikbern/ann-benchmarks/#data-sets) for more information, including actual train and sizes. These all work out-of-the-box with the `--dataset` argument. Other million-scale datasets from `ann-benchmarks.com` will work, but will require a json configuration file to be created in `$CONDA_PREFIX/lib/python3.xx/site-packages/raft_ann_bench/run/conf`, or you can specify the `--configuration` option to use a specific file.

 | Dataset Name | Train Rows | Columns | Test Rows | Distance |
 |-----|------------|----|----------------|------------|
@@ -293,7 +293,7 @@ All of the datasets above contain ground test datasets with 100 neighbors. Thus

 ### End to end: large-scale benchmarks (>10M vectors)

-`raft-ann-bench.get_dataset` cannot be used to download the [billion-scale datasets](ann_benchmarks_dataset.md#billion-scale)
+`raft_ann_bench.get_dataset` cannot be used to download the [billion-scale datasets](ann_benchmarks_dataset.md#billion-scale)
 due to their size. You should instead use our billion-scale datasets guide to download and prepare them.
 All other python commands mentioned below work as intended once the billion-scale dataset has been downloaded.
diff --git a/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/hnswlib.yaml b/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/hnswlib.yaml
index 9268c4cb08..e7a4e6b506 100644
--- a/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/hnswlib.yaml
+++ b/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/hnswlib.yaml
@@ -1,6 +1,6 @@
 name: hnswlib
 constraints:
-  search: raft-ann-bench.constraints.hnswlib_search_constraints
+  search: raft_ann_bench.constraints.hnswlib_search_constraints
 groups:
   base:
     build:
diff --git a/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/raft_cagra.yaml b/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/raft_cagra.yaml
index 374458989a..bb66b4b232 100644
--- a/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/raft_cagra.yaml
+++ b/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/raft_cagra.yaml
@@ -1,7 +1,7 @@
 name: raft_cagra
 constraints:
-  build: raft-ann-bench.constraints.raft_cagra_build_constraints
-  search: raft-ann-bench.constraints.raft_cagra_search_constraints
+  build: raft_ann_bench.constraints.raft_cagra_build_constraints
+  search: raft_ann_bench.constraints.raft_cagra_search_constraints
 groups:
   base:
     build:
diff --git a/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/raft_cagra_hnswlib.yaml b/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/raft_cagra_hnswlib.yaml
index 787675d65d..3ac2d16b68 100644
--- a/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/raft_cagra_hnswlib.yaml
+++ b/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/raft_cagra_hnswlib.yaml
@@ -1,6 +1,6 @@
 name: raft_cagra_hnswlib
 constraints:
-  search: raft-ann-bench.constraints.hnswlib_search_constraints
+  search: raft_ann_bench.constraints.hnswlib_search_constraints
 groups:
   base:
     build:
diff --git a/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/raft_ivf_pq.yaml b/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/raft_ivf_pq.yaml
index fac383119a..7eaec2b77b 100644
--- a/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/raft_ivf_pq.yaml
+++ b/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/raft_ivf_pq.yaml
@@ -1,7 +1,7 @@
 name: raft_ivf_pq
 constraints:
-  build: raft-ann-bench.constraints.raft_ivf_pq_build_constraints
-  search: raft-ann-bench.constraints.raft_ivf_pq_search_constraints
+  build: raft_ann_bench.constraints.raft_ivf_pq_build_constraints
+  search: raft_ann_bench.constraints.raft_ivf_pq_search_constraints
 groups:
   base:
     build:
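
Note on the `constraints:` values changed above: they read as dotted Python paths (`package.module.function`). The diff does not show how the runner loads them, so the following is only a minimal sketch, assuming a loader built on `importlib`. It illustrates why such a path would need to match the importable package name `raft_ann_bench` rather than the distribution name `raft-ann-bench`. The names `resolve_dotted`, `demo_bench`, and `example_search_constraints` are hypothetical stand-ins so the sketch runs without RAFT installed.

```python
import importlib
import sys
import types


def resolve_dotted(path):
    """Import the module part of 'pkg.module.attr' and return the attribute."""
    module_name, _, attr_name = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Stand-in package so the sketch runs without raft_ann_bench installed.
# `demo_bench` and `example_search_constraints` are hypothetical names.
pkg = types.ModuleType("demo_bench")
constraints = types.ModuleType("demo_bench.constraints")
constraints.example_search_constraints = lambda params, build_params: True
pkg.constraints = constraints
sys.modules["demo_bench"] = pkg
sys.modules["demo_bench.constraints"] = constraints

# The underscore form matches the registered module name and resolves.
check = resolve_dotted("demo_bench.constraints.example_search_constraints")
print(check({}, {}))

# The hyphenated form names a module that does not exist, so the import fails.
try:
    resolve_dotted("demo-bench.constraints.example_search_constraints")
except ModuleNotFoundError as err:
    print("import failed:", err)
```

Under that assumption, a hyphenated path would only surface as an error when the constraint is first looked up, which would explain why the YAML files and the docs are updated together in this change.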