diff --git a/docs/source/ann_benchmarks_low_level.md b/docs/source/ann_benchmarks_low_level.md
index cb583b119b..7ba13dec8d 100644
--- a/docs/source/ann_benchmarks_low_level.md
+++ b/docs/source/ann_benchmarks_low_level.md
@@ -18,7 +18,7 @@ $CONDA_PREFIX/bin/ann/RAFT_IVF_FLAT_ANN_BENCH \
   --data_prefix=datasets \
   --build \
   --benchmark_filter="raft_ivf_flat\..*" \
-  python/raft-ann-bench/src/raft-ann-bench/run/conf/glove-100-inner.json
+  python/raft-ann-bench/src/raft_ann_bench/run/conf/glove-100-inner.json
 # (3) search
 $CONDA_PREFIX/bin/ann/RAFT_IVF_FLAT_ANN_BENCH\
@@ -29,7 +29,7 @@ $CONDA_PREFIX/bin/ann/RAFT_IVF_FLAT_ANN_BENCH\
   --benchmark_counters_tabular \
   --search \
   --benchmark_filter="raft_ivf_flat\..*" \
-  python/raft-ann-bench/src/raft-ann-bench/run/conf/glove-100-inner.json
+  python/raft-ann-bench/src/raft_ann_bench/run/conf/glove-100-inner.json
 # optional step: plot QPS-Recall figure using data in ivf_flat_search.csv with your favorite tool
@@ -43,12 +43,12 @@ A dataset usually has 4 binary files containing database vectors, query vectors,
 The file suffixes `.fbin`, `.f16bin`, `.ibin`, `.u8bin`, and `.i8bin` denote that the data type of vectors stored in the file are `float32`, `float16`(a.k.a `half`), `int`, `uint8`, and `int8`, respectively. These binary files are little-endian and the format is: the first 8 bytes are `num_vectors` (`uint32_t`) and `num_dimensions` (`uint32_t`), and the following `num_vectors * num_dimensions * sizeof(type)` bytes are vectors stored in row-major order.
-Some implementation can take `float16` database and query vectors as inputs and will have better performance. Use `python/raft-ann-bench/src/raft-ann-bench/get_dataset/fbin_to_f16bin.py` to transform dataset from `float32` to `float16` type.
+Some implementation can take `float16` database and query vectors as inputs and will have better performance. Use `python/raft-ann-bench/src/raft_ann_bench/get_dataset/fbin_to_f16bin.py` to transform dataset from `float32` to `float16` type.
 Commonly used datasets can be downloaded from two websites:
 1. Million-scale datasets can be found at the [Data sets](https://github.com/erikbern/ann-benchmarks#data-sets) section of [`ann-benchmarks`](https://github.com/erikbern/ann-benchmarks).
-    However, these datasets are in HDF5 format. Use `python/raft-ann-bench/src/raft-ann-bench/get_dataset/fbin_to_f16bin.py/hdf5_to_fbin.py` to transform the format. A few Python packages are required to run it:
+    However, these datasets are in HDF5 format. Use `python/raft-ann-bench/src/raft_ann_bench/get_dataset/fbin_to_f16bin.py/hdf5_to_fbin.py` to transform the format. A few Python packages are required to run it:
     ```bash
     pip3 install numpy h5py
     ```
@@ -68,7 +68,7 @@ Commonly used datasets can be downloaded from two websites:
 2. Billion-scale datasets can be found at [`big-ann-benchmarks`](http://big-ann-benchmarks.com). The ground truth file contains both neighbors and distances, thus should be split. A script is provided for this:
     ```bash
-    $ python/raft-ann-bench/src/raft-ann-bench/split_groundtruth/split_groundtruth.pl
+    $ python/raft-ann-bench/src/raft_ann_bench/split_groundtruth/split_groundtruth.pl
     usage: split_groundtruth.pl input output_prefix
     ```
     Take Deep-1B dataset as an example:
@@ -78,7 +78,7 @@ Commonly used datasets can be downloaded from two websites:
     mkdir -p data/deep-1B && cd data/deep-1B
     # download manually "Ground Truth" file of "Yandex DEEP"
     # suppose the file name is deep_new_groundtruth.public.10K.bin
-    /path/to/raft/python/raft-ann-bench/src/raft-ann-bench/split_groundtruth/split_groundtruth.pl deep_new_groundtruth.public.10K.bin groundtruth
+    /path/to/raft/python/raft-ann-bench/src/raft_ann_bench/split_groundtruth/split_groundtruth.pl deep_new_groundtruth.public.10K.bin groundtruth
     # two files 'groundtruth.neighbors.ibin' and 'groundtruth.distances.fbin' should be produced
     popd
     ```
diff --git a/docs/source/raft_ann_benchmarks.md b/docs/source/raft_ann_benchmarks.md
index 146cc104d1..4b3aef5600 100644
--- a/docs/source/raft_ann_benchmarks.md
+++ b/docs/source/raft_ann_benchmarks.md
@@ -96,7 +96,7 @@ We provide a collection of lightweight Python scripts to run the benchmarks. The
 4. Plot Results
 ### Step 1: Prepare Dataset
-The script `raft-ann-bench.get_dataset` will download and unpack the dataset in directory
+The script `raft_ann_bench.get_dataset` will download and unpack the dataset in directory
 that the user provides. As of now, only million-scale datasets are supported by this script. For more information on [datasets and formats](ann_benchmarks_dataset.md).
@@ -117,10 +117,10 @@ will be normalized to inner product. So, for example, the dataset `glove-100-ang
 will be written at location `datasets/glove-100-inner/`.
 ### Step 2: Build and Search Index
-The script `raft-ann-bench.run` will build and search indices for a given dataset and its
+The script `raft_ann_bench.run` will build and search indices for a given dataset and its
 specified configuration.
-The usage of the script `raft-ann-bench.run` is:
+The usage of the script `raft_ann_bench.run` is:
 ```bash
 usage: __main__.py [-h] [--subset-size SUBSET_SIZE] [-k COUNT] [-bs BATCH_SIZE] [--dataset-configuration DATASET_CONFIGURATION] [--configuration CONFIGURATION] [--dataset DATASET] [--dataset-path DATASET_PATH] [--build] [--search] [--algorithms ALGORITHMS] [--groups GROUPS] [--algo-groups ALGO_GROUPS] [-f] [-m SEARCH_MODE]
@@ -186,8 +186,8 @@ it is assumed both are `True`.
 is available in `algos.yaml` and not disabled, as well as having an associated executable.
 ### Step 3: Data Export
-The script `raft-ann-bench.data_export` will convert the intermediate JSON outputs produced by `raft-ann-bench.run` to more
-easily readable CSV files, which are needed to build charts made by `raft-ann-bench.plot`.
+The script `raft_ann_bench.data_export` will convert the intermediate JSON outputs produced by `raft_ann_bench.run` to more
+easily readable CSV files, which are needed to build charts made by `raft_ann_bench.plot`.
 ```bash
 usage: data_export.py [-h] [--dataset DATASET] [--dataset-path DATASET_PATH]
@@ -206,7 +206,7 @@ and index search statistics CSV file in `/result/search/<
 ### Step 4: Plot Results
-The script `raft-ann-bench.plot` will plot results for all algorithms found in index search statistics
+The script `raft_ann_bench.plot` will plot results for all algorithms found in index search statistics
 CSV files `/result/search/*.csv`.
 The usage of this script is:
@@ -277,7 +277,7 @@ python -m raft_ann_bench.data_export --dataset deep-image-96-inner
 python -m raft_ann_bench.plot --dataset deep-image-96-inner
 ```
-Configuration files already exist for the following list of the million-scale datasets. Please refer to [ann-benchmarks datasets](https://github.com/erikbern/ann-benchmarks/#data-sets) for more information, including actual train and sizes. These all work out-of-the-box with the `--dataset` argument. Other million-scale datasets from `ann-benchmarks.com` will work, but will require a json configuration file to be created in `$CONDA_PREFIX/lib/python3.xx/site-packages/raft-ann-bench/run/conf`, or you can specify the `--configuration` option to use a specific file.
+Configuration files already exist for the following list of the million-scale datasets. Please refer to [ann-benchmarks datasets](https://github.com/erikbern/ann-benchmarks/#data-sets) for more information, including actual train and sizes. These all work out-of-the-box with the `--dataset` argument. Other million-scale datasets from `ann-benchmarks.com` will work, but will require a json configuration file to be created in `$CONDA_PREFIX/lib/python3.xx/site-packages/raft_ann_bench/run/conf`, or you can specify the `--configuration` option to use a specific file.
 | Dataset Name | Train Rows | Columns | Test Rows | Distance |
 |-----|------------|----|----------------|------------|
@@ -293,7 +293,7 @@ All of the datasets above contain ground test datasets with 100 neighbors. Thus
 ### End to end: large-scale benchmarks (>10M vectors)
-`raft-ann-bench.get_dataset` cannot be used to download the [billion-scale datasets](ann_benchmarks_dataset.md#billion-scale)
+`raft_ann_bench.get_dataset` cannot be used to download the [billion-scale datasets](ann_benchmarks_dataset.md#billion-scale)
 due to their size. You should instead use our billion-scale datasets guide to download and prepare them. All other python commands mentioned below work as intended once the billion-scale dataset has been downloaded.
@@ -441,7 +441,7 @@ Note the following:
 A single configuration will often define a set of algorithms, with associated index and search parameters, that can be generalize across datasets. We use YAML to define dataset specific and algorithm specific configurations.
-A default `datasets.yaml` is provided by RAFT in `${RAFT_HOME}/python/raft-ann-bench/src/raft-ann-bench/run/conf` with configurations available for several datasets. Here's a simple example entry for the `sift-128-euclidean` dataset:
+A default `datasets.yaml` is provided by RAFT in `${RAFT_HOME}/python/raft-ann-bench/src/raft_ann_bench/run/conf` with configurations available for several datasets. Here's a simple example entry for the `sift-128-euclidean` dataset:
 ```yaml
 - name: sift-128-euclidean
@@ -452,7 +452,7 @@ A single configuration will often define a set of algorithms, with associated in
   distance: euclidean
 ```
-Configuration files for ANN algorithms supported by `raft-ann-bench` are provided in `${RAFT_HOME}/python/raft-ann-bench/src/raft-ann-bench/run/conf`. `raft_cagra` algorithm configuration looks like:
+Configuration files for ANN algorithms supported by `raft-ann-bench` are provided in `${RAFT_HOME}/python/raft-ann-bench/src/raft_ann_bench/run/conf`. `raft_cagra` algorithm configuration looks like:
 ```yaml
 name: raft_cagra
 groups:
diff --git a/python/raft-ann-bench/src/raft_ann_bench/_version.py b/python/raft-ann-bench/src/raft_ann_bench/_version.py
index 6dbb8e81b0..394acd755d 100644
--- a/python/raft-ann-bench/src/raft_ann_bench/_version.py
+++ b/python/raft-ann-bench/src/raft_ann_bench/_version.py
@@ -17,7 +17,7 @@
 import importlib.resources
 __version__ = (
-    importlib.resources.files("raft-ann-bench")
+    importlib.resources.files("raft_ann_bench")
     .joinpath("VERSION")
     .read_text()
     .strip()
diff --git a/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/hnswlib.yaml b/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/hnswlib.yaml
index 9268c4cb08..e7a4e6b506 100644
--- a/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/hnswlib.yaml
+++ b/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/hnswlib.yaml
@@ -1,6 +1,6 @@
 name: hnswlib
 constraints:
-  search: raft-ann-bench.constraints.hnswlib_search_constraints
+  search: raft_ann_bench.constraints.hnswlib_search_constraints
 groups:
   base:
     build:
diff --git a/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/raft_cagra.yaml b/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/raft_cagra.yaml
index 374458989a..bb66b4b232 100644
--- a/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/raft_cagra.yaml
+++ b/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/raft_cagra.yaml
@@ -1,7 +1,7 @@
 name: raft_cagra
 constraints:
-  build: raft-ann-bench.constraints.raft_cagra_build_constraints
-  search: raft-ann-bench.constraints.raft_cagra_search_constraints
+  build: raft_ann_bench.constraints.raft_cagra_build_constraints
+  search: raft_ann_bench.constraints.raft_cagra_search_constraints
 groups:
   base:
     build:
diff --git a/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/raft_cagra_hnswlib.yaml b/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/raft_cagra_hnswlib.yaml
index 787675d65d..3ac2d16b68 100644
--- a/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/raft_cagra_hnswlib.yaml
+++ b/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/raft_cagra_hnswlib.yaml
@@ -1,6 +1,6 @@
 name: raft_cagra_hnswlib
 constraints:
-  search: raft-ann-bench.constraints.hnswlib_search_constraints
+  search: raft_ann_bench.constraints.hnswlib_search_constraints
 groups:
   base:
     build:
diff --git a/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/raft_ivf_pq.yaml b/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/raft_ivf_pq.yaml
index fac383119a..7eaec2b77b 100644
--- a/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/raft_ivf_pq.yaml
+++ b/python/raft-ann-bench/src/raft_ann_bench/run/conf/algos/raft_ivf_pq.yaml
@@ -1,7 +1,7 @@
 name: raft_ivf_pq
 constraints:
-  build: raft-ann-bench.constraints.raft_ivf_pq_build_constraints
-  search: raft-ann-bench.constraints.raft_ivf_pq_search_constraints
+  build: raft_ann_bench.constraints.raft_ivf_pq_build_constraints
+  search: raft_ann_bench.constraints.raft_ivf_pq_search_constraints
 groups:
   base:
     build:
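
Note on the `constraints:` entries changed above: they are dotted paths such as `raft_ann_bench.constraints.hnswlib_search_constraints`, so the leading component has to match the importable package directory (`raft_ann_bench`) rather than the hyphenated distribution name (`raft-ann-bench`). The sketch below illustrates the general idea with the standard library only. The helper names `resolve_dotted_path` and `is_importable_name` are hypothetical and are not taken from this repository, and the way `raft_ann_bench` actually resolves these strings is an assumption here, not something this diff shows.

```python
import importlib


def resolve_dotted_path(dotted_path):
    """Import the module part of 'pkg.mod.attr' and return the attribute.

    Hypothetical helper for illustration; not the project's own loader.
    """
    module_name, _, attr_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.attribute', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


def is_importable_name(dotted_path):
    """Return True if every dot-separated component is a valid identifier."""
    return all(part.isidentifier() for part in dotted_path.split("."))


# Standard-library stand-in so the sketch runs without RAFT installed.
dumps = resolve_dotted_path("json.dumps")
print(dumps([1, 2]))

# The hyphenated form cannot name a package in an import statement.
print(is_importable_name("raft-ann-bench.constraints.hnswlib_search_constraints"))  # False
print(is_importable_name("raft_ann_bench.constraints.hnswlib_search_constraints"))  # True
```

A check like `is_importable_name` could be run over the YAML files in a test to catch a hyphenated module path before it reaches a benchmark run.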