Fix import of VERSION file in raft-ann-bench #2338

Merged
12 changes: 6 additions & 6 deletions docs/source/ann_benchmarks_low_level.md
@@ -18,7 +18,7 @@ $CONDA_PREFIX/bin/ann/RAFT_IVF_FLAT_ANN_BENCH \
--data_prefix=datasets \
--build \
--benchmark_filter="raft_ivf_flat\..*" \
-python/raft-ann-bench/src/raft-ann-bench/run/conf/glove-100-inner.json
+python/raft-ann-bench/src/raft_ann_bench/run/conf/glove-100-inner.json

# (3) search
$CONDA_PREFIX/bin/ann/RAFT_IVF_FLAT_ANN_BENCH\
@@ -29,7 +29,7 @@ $CONDA_PREFIX/bin/ann/RAFT_IVF_FLAT_ANN_BENCH\
--benchmark_counters_tabular \
--search \
--benchmark_filter="raft_ivf_flat\..*" \
-python/raft-ann-bench/src/raft-ann-bench/run/conf/glove-100-inner.json
+python/raft-ann-bench/src/raft_ann_bench/run/conf/glove-100-inner.json


# optional step: plot QPS-Recall figure using data in ivf_flat_search.csv with your favorite tool
@@ -43,12 +43,12 @@ A dataset usually has 4 binary files containing database vectors, query vectors,
The file suffixes `.fbin`, `.f16bin`, `.ibin`, `.u8bin`, and `.i8bin` denote that the data type of vectors stored in the file are `float32`, `float16`(a.k.a `half`), `int`, `uint8`, and `int8`, respectively.
These binary files are little-endian and the format is: the first 8 bytes are `num_vectors` (`uint32_t`) and `num_dimensions` (`uint32_t`), and the following `num_vectors * num_dimensions * sizeof(type)` bytes are vectors stored in row-major order.

-Some implementation can take `float16` database and query vectors as inputs and will have better performance. Use `python/raft-ann-bench/src/raft-ann-bench/get_dataset/fbin_to_f16bin.py` to transform dataset from `float32` to `float16` type.
+Some implementation can take `float16` database and query vectors as inputs and will have better performance. Use `python/raft-ann-bench/src/raft_ann_bench/get_dataset/fbin_to_f16bin.py` to transform dataset from `float32` to `float16` type.

Commonly used datasets can be downloaded from two websites:
1. Million-scale datasets can be found at the [Data sets](https://github.com/erikbern/ann-benchmarks#data-sets) section of [`ann-benchmarks`](https://github.com/erikbern/ann-benchmarks).

-However, these datasets are in HDF5 format. Use `python/raft-ann-bench/src/raft-ann-bench/get_dataset/fbin_to_f16bin.py/hdf5_to_fbin.py` to transform the format. A few Python packages are required to run it:
+However, these datasets are in HDF5 format. Use `python/raft-ann-bench/src/raft_ann_bench/get_dataset/fbin_to_f16bin.py/hdf5_to_fbin.py` to transform the format. A few Python packages are required to run it:
```bash
pip3 install numpy h5py
```
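
Reviewer note: the binary layout described in the hunk above (two little-endian `uint32_t` header fields, then row-major vectors) can be sketched with the standard library alone. This is an illustrative reader/writer, not the repository's `fbin_to_f16bin.py`; the helper names `write_bin` and `read_bin` and the suffix-to-format table are assumptions made for this example.

```python
import struct
from pathlib import Path

# struct format character for each file suffix named in the docs.
# The mapping itself is an assumption made for this sketch.
SUFFIX_TO_FMT = {
    ".fbin": "f",    # float32
    ".f16bin": "e",  # float16 (half)
    ".ibin": "i",    # 32-bit int
    ".u8bin": "B",   # uint8
    ".i8bin": "b",   # int8
}


def write_bin(path, rows):
    """Write equally sized rows as: uint32 count, uint32 dim, row-major data."""
    path = Path(path)
    fmt = SUFFIX_TO_FMT[path.suffix]
    num_vectors = len(rows)
    num_dimensions = len(rows[0]) if rows else 0
    with open(path, "wb") as f:
        # 8-byte header: num_vectors and num_dimensions, little-endian
        f.write(struct.pack("<II", num_vectors, num_dimensions))
        for row in rows:
            f.write(struct.pack(f"<{num_dimensions}{fmt}", *row))


def read_bin(path):
    """Read a file written in the layout above and return a list of rows."""
    path = Path(path)
    fmt = SUFFIX_TO_FMT[path.suffix]
    with open(path, "rb") as f:
        num_vectors, num_dimensions = struct.unpack("<II", f.read(8))
        row_bytes = num_dimensions * struct.calcsize("<" + fmt)
        return [
            list(struct.unpack(f"<{num_dimensions}{fmt}", f.read(row_bytes)))
            for _ in range(num_vectors)
        ]
```

Under these assumptions, a `float32` to `float16` conversion is `write_bin("base.f16bin", read_bin("base.fbin"))`. For real datasets a NumPy-based approach would be far faster; the loop here only makes the byte layout explicit.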
@@ -68,7 +68,7 @@ Commonly used datasets can be downloaded from two websites:

2. Billion-scale datasets can be found at [`big-ann-benchmarks`](http://big-ann-benchmarks.com). The ground truth file contains both neighbors and distances, thus should be split. A script is provided for this:
```bash
-$ python/raft-ann-bench/src/raft-ann-bench/split_groundtruth/split_groundtruth.pl
+$ python/raft-ann-bench/src/raft_ann_bench/split_groundtruth/split_groundtruth.pl
usage: split_groundtruth.pl input output_prefix
```
Take Deep-1B dataset as an example:
@@ -78,7 +78,7 @@ Commonly used datasets can be downloaded from two websites:
mkdir -p data/deep-1B && cd data/deep-1B
# download manually "Ground Truth" file of "Yandex DEEP"
# suppose the file name is deep_new_groundtruth.public.10K.bin
-/path/to/raft/python/raft-ann-bench/src/raft-ann-bench/split_groundtruth/split_groundtruth.pl deep_new_groundtruth.public.10K.bin groundtruth
+/path/to/raft/python/raft-ann-bench/src/raft_ann_bench/split_groundtruth/split_groundtruth.pl deep_new_groundtruth.public.10K.bin groundtruth
# two files 'groundtruth.neighbors.ibin' and 'groundtruth.distances.fbin' should be produced
popd
```
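
Reviewer note: the split performed by `split_groundtruth.pl` can be pictured with a short Python sketch. It is not a port of the Perl script. It assumes the combined ground-truth file holds a little-endian header (`uint32` query count, `uint32` neighbors per query), followed by all neighbor ids as 32-bit integers and then all distances as `float32`; that layout is an assumption not stated in the diff. The function name `split_groundtruth` is hypothetical; the two output names follow the comment in the hunk above.

```python
import struct
from pathlib import Path


def split_groundtruth(src, output_prefix):
    """Split a combined ground-truth file into neighbors and distances files.

    Assumed input layout (not confirmed by the diff):
      uint32 num_queries, uint32 k,
      num_queries * k 32-bit neighbor ids,
      num_queries * k float32 distances.
    """
    data = Path(src).read_bytes()
    num_queries, k = struct.unpack_from("<II", data, 0)
    block = num_queries * k * 4  # both payloads use 4-byte elements
    if len(data) != 8 + 2 * block:
        raise ValueError("file size does not match the assumed layout")

    header = data[:8]
    neighbors = data[8 : 8 + block]
    distances = data[8 + block : 8 + 2 * block]

    # Each output repeats the 8-byte header so it is a valid .ibin / .fbin file.
    neighbors_path = Path(f"{output_prefix}.neighbors.ibin")
    distances_path = Path(f"{output_prefix}.distances.fbin")
    neighbors_path.write_bytes(header + neighbors)
    distances_path.write_bytes(header + distances)
    return neighbors_path, distances_path
```

Called as `split_groundtruth("deep_new_groundtruth.public.10K.bin", "groundtruth")`, this would produce the two files named in the example. The sketch reads the whole input into memory, which is only reasonable for ground-truth files of modest size.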
20 changes: 10 additions & 10 deletions docs/source/raft_ann_benchmarks.md
@@ -96,7 +96,7 @@ We provide a collection of lightweight Python scripts to run the benchmarks. The
4. Plot Results

### Step 1: Prepare Dataset
-The script `raft-ann-bench.get_dataset` will download and unpack the dataset in directory
+The script `raft_ann_bench.get_dataset` will download and unpack the dataset in directory
that the user provides. As of now, only million-scale datasets are supported by this
script. For more information on [datasets and formats](ann_benchmarks_dataset.md).

@@ -117,10 +117,10 @@ will be normalized to inner product. So, for example, the dataset `glove-100-ang
will be written at location `datasets/glove-100-inner/`.

### Step 2: Build and Search Index
-The script `raft-ann-bench.run` will build and search indices for a given dataset and its
+The script `raft_ann_bench.run` will build and search indices for a given dataset and its
specified configuration.

-The usage of the script `raft-ann-bench.run` is:
+The usage of the script `raft_ann_bench.run` is:
```bash
usage: __main__.py [-h] [--subset-size SUBSET_SIZE] [-k COUNT] [-bs BATCH_SIZE] [--dataset-configuration DATASET_CONFIGURATION] [--configuration CONFIGURATION] [--dataset DATASET]
[--dataset-path DATASET_PATH] [--build] [--search] [--algorithms ALGORITHMS] [--groups GROUPS] [--algo-groups ALGO_GROUPS] [-f] [-m SEARCH_MODE]
@@ -186,8 +186,8 @@ it is assumed both are `True`.
is available in `algos.yaml` and not disabled, as well as having an associated executable.

### Step 3: Data Export
-The script `raft-ann-bench.data_export` will convert the intermediate JSON outputs produced by `raft-ann-bench.run` to more
-easily readable CSV files, which are needed to build charts made by `raft-ann-bench.plot`.
+The script `raft_ann_bench.data_export` will convert the intermediate JSON outputs produced by `raft_ann_bench.run` to more
+easily readable CSV files, which are needed to build charts made by `raft_ann_bench.plot`.

```bash
usage: data_export.py [-h] [--dataset DATASET] [--dataset-path DATASET_PATH]
@@ -206,7 +206,7 @@ and index search statistics CSV file in `<dataset-path/<dataset>/result/search/<


### Step 4: Plot Results
-The script `raft-ann-bench.plot` will plot results for all algorithms found in index search statistics
+The script `raft_ann_bench.plot` will plot results for all algorithms found in index search statistics
CSV files `<dataset-path/<dataset>/result/search/*.csv`.

The usage of this script is:
@@ -277,7 +277,7 @@ python -m raft_ann_bench.data_export --dataset deep-image-96-inner
python -m raft_ann_bench.plot --dataset deep-image-96-inner
```

-Configuration files already exist for the following list of the million-scale datasets. Please refer to [ann-benchmarks datasets](https://github.com/erikbern/ann-benchmarks/#data-sets) for more information, including actual train and sizes. These all work out-of-the-box with the `--dataset` argument. Other million-scale datasets from `ann-benchmarks.com` will work, but will require a json configuration file to be created in `$CONDA_PREFIX/lib/python3.xx/site-packages/raft-ann-bench/run/conf`, or you can specify the `--configuration` option to use a specific file.
+Configuration files already exist for the following list of the million-scale datasets. Please refer to [ann-benchmarks datasets](https://github.com/erikbern/ann-benchmarks/#data-sets) for more information, including actual train and sizes. These all work out-of-the-box with the `--dataset` argument. Other million-scale datasets from `ann-benchmarks.com` will work, but will require a json configuration file to be created in `$CONDA_PREFIX/lib/python3.xx/site-packages/raft_ann_bench/run/conf`, or you can specify the `--configuration` option to use a specific file.

| Dataset Name | Train Rows | Columns | Test Rows | Distance |
|-----|------------|----|----------------|------------|
@@ -293,7 +293,7 @@ All of the datasets above contain ground test datasets with 100 neighbors. Thus

### End to end: large-scale benchmarks (>10M vectors)

-`raft-ann-bench.get_dataset` cannot be used to download the [billion-scale datasets](ann_benchmarks_dataset.md#billion-scale)
+`raft_ann_bench.get_dataset` cannot be used to download the [billion-scale datasets](ann_benchmarks_dataset.md#billion-scale)
due to their size. You should instead use our billion-scale datasets guide to download and prepare them.
All other python commands mentioned below work as intended once the
billion-scale dataset has been downloaded.
@@ -441,7 +441,7 @@ Note the following:

A single configuration will often define a set of algorithms, with associated index and search parameters, that can be generalize across datasets. We use YAML to define dataset specific and algorithm specific configurations.

-<a id='yaml-dataset-config'></a>A default `datasets.yaml` is provided by RAFT in `${RAFT_HOME}/python/raft-ann-bench/src/raft-ann-bench/run/conf` with configurations available for several datasets. Here's a simple example entry for the `sift-128-euclidean` dataset:
+<a id='yaml-dataset-config'></a>A default `datasets.yaml` is provided by RAFT in `${RAFT_HOME}/python/raft-ann-bench/src/raft_ann_bench/run/conf` with configurations available for several datasets. Here's a simple example entry for the `sift-128-euclidean` dataset:

```yaml
- name: sift-128-euclidean
@@ -452,7 +452,7 @@ A single configuration will often define a set of algorithms, with associated in
distance: euclidean
```

-<a id='yaml-algo-config'></a>Configuration files for ANN algorithms supported by `raft-ann-bench` are provided in `${RAFT_HOME}/python/raft-ann-bench/src/raft-ann-bench/run/conf`. `raft_cagra` algorithm configuration looks like:
+<a id='yaml-algo-config'></a>Configuration files for ANN algorithms supported by `raft-ann-bench` are provided in `${RAFT_HOME}/python/raft-ann-bench/src/raft_ann_bench/run/conf`. `raft_cagra` algorithm configuration looks like:
```yaml
name: raft_cagra
groups:
2 changes: 1 addition & 1 deletion python/raft-ann-bench/src/raft_ann_bench/_version.py
@@ -17,7 +17,7 @@
import importlib.resources

__version__ = (
-importlib.resources.files("raft-ann-bench")
+importlib.resources.files("raft_ann_bench")
.joinpath("VERSION")
.read_text()
.strip()
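
Reviewer note: `importlib.resources.files()` takes an importable package name, so the anchor has to be the underscore form used in `import` statements rather than the hyphenated distribution name. A hyphenated name such as `raft-ann-bench` is not an importable package name, which is presumably why the lookup was changed. The self-contained sketch below reproduces the pattern with a throwaway package; `demo_version_pkg`, `make_demo_package`, and `read_version` are hypothetical names created only for this example (Python 3.9+).

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path


def make_demo_package(root, name="demo_version_pkg", version="1.2.3"):
    """Create a minimal package directory containing a VERSION data file."""
    pkg = Path(root) / name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "VERSION").write_text(version + "\n")
    return name


def read_version(import_name):
    """Same call chain as in _version.py, parameterized by package name."""
    return (
        importlib.resources.files(import_name)
        .joinpath("VERSION")
        .read_text()
        .strip()
    )


root = tempfile.mkdtemp()
name = make_demo_package(root)
sys.path.insert(0, root)       # make the throwaway package importable
importlib.invalidate_caches()  # ensure the new directory is picked up

version = read_version(name)   # underscore name: resolves the package

try:
    # Hyphenated spelling: no such importable package exists.
    read_version(name.replace("_", "-"))
    hyphen_lookup_failed = False
except ImportError:
    hyphen_lookup_failed = True
```

Here `version` holds the stripped contents of the `VERSION` file, and the hyphenated lookup raises `ModuleNotFoundError` (a subclass of `ImportError`).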
@@ -1,6 +1,6 @@
name: hnswlib
constraints:
-search: raft-ann-bench.constraints.hnswlib_search_constraints
+search: raft_ann_bench.constraints.hnswlib_search_constraints
groups:
base:
build:
@@ -1,7 +1,7 @@
name: raft_cagra
constraints:
-build: raft-ann-bench.constraints.raft_cagra_build_constraints
-search: raft-ann-bench.constraints.raft_cagra_search_constraints
+build: raft_ann_bench.constraints.raft_cagra_build_constraints
+search: raft_ann_bench.constraints.raft_cagra_search_constraints
groups:
base:
build:
@@ -1,6 +1,6 @@
name: raft_cagra_hnswlib
constraints:
-search: raft-ann-bench.constraints.hnswlib_search_constraints
+search: raft_ann_bench.constraints.hnswlib_search_constraints
groups:
base:
build:
@@ -1,7 +1,7 @@
name: raft_ivf_pq
constraints:
-build: raft-ann-bench.constraints.raft_ivf_pq_build_constraints
-search: raft-ann-bench.constraints.raft_ivf_pq_search_constraints
+build: raft_ann_bench.constraints.raft_ivf_pq_build_constraints
+search: raft_ann_bench.constraints.raft_ivf_pq_search_constraints
groups:
base:
build:
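
Reviewer note: the `constraints` values in the YAML hunks above are dotted paths of the form `package.module.function`. The sketch below assumes such a string is resolved by importing the module part and looking up the final attribute; the diff does not show how the benchmark runner actually does this, and `resolve_dotted` is a hypothetical helper. Under that assumption, a hyphenated prefix cannot be imported, so the underscore form is required.

```python
import importlib


def resolve_dotted(path):
    """Resolve 'package.module.attr' to the attribute it names."""
    module_name, _, attr = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Standard-library stand-in for a constraint function.
sqrt = resolve_dotted("math.sqrt")
```

With the package installed, `resolve_dotted("raft_ann_bench.constraints.hnswlib_search_constraints")` would return the function under the same assumption, while a path whose package part contains hyphens would raise `ModuleNotFoundError`.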