Fix import of VERSION file in raft-ann-bench (#2338)
Change the imported package name to reflect the new name as of #2333.

Authors:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Divye Gala (https://github.com/divyegala)

URL: #2338
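
The rename matters because `importlib.resources.files` expects an importable package name, and a hyphenated name such as `raft-ann-bench` cannot be imported as a module. The sketch below illustrates this with a throwaway package; `demo_pkg`, `read_version`, and the `0.0.0` version string are hypothetical and exist only for this illustration, they are not part of RAFT.

```python
import importlib
import importlib.resources
import pathlib
import sys
import tempfile


def read_version(package: str) -> str:
    """Read a VERSION file shipped inside an importable package."""
    return (
        importlib.resources.files(package)
        .joinpath("VERSION")
        .read_text()
        .strip()
    )


# Build a throwaway package on disk (hypothetical name: demo_pkg).
_tmp = tempfile.TemporaryDirectory()
_pkg_dir = pathlib.Path(_tmp.name) / "demo_pkg"
_pkg_dir.mkdir()
(_pkg_dir / "__init__.py").write_text("")
(_pkg_dir / "VERSION").write_text("0.0.0\n")
sys.path.insert(0, _tmp.name)
importlib.invalidate_caches()

version = read_version("demo_pkg")  # the underscore name resolves

try:
    read_version("demo-pkg")  # the hyphenated name is not an importable module
    hyphen_failed = False
except ImportError:
    hyphen_failed = True

print(version, hyphen_failed)
```

The hyphenated lookup fails at import time, before the `VERSION` file is ever looked up, which is consistent with the one-line fix to `_version.py` in this commit.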
KyleFromNVIDIA authored May 24, 2024
1 parent 9c8d111 commit 5c6cd92
Showing 7 changed files with 23 additions and 23 deletions.
12 changes: 6 additions & 6 deletions docs/source/ann_benchmarks_low_level.md
@@ -18,7 +18,7 @@ $CONDA_PREFIX/bin/ann/RAFT_IVF_FLAT_ANN_BENCH \
--data_prefix=datasets \
--build \
--benchmark_filter="raft_ivf_flat\..*" \
-python/raft-ann-bench/src/raft-ann-bench/run/conf/glove-100-inner.json
+python/raft-ann-bench/src/raft_ann_bench/run/conf/glove-100-inner.json

# (3) search
$CONDA_PREFIX/bin/ann/RAFT_IVF_FLAT_ANN_BENCH\
@@ -29,7 +29,7 @@ $CONDA_PREFIX/bin/ann/RAFT_IVF_FLAT_ANN_BENCH\
--benchmark_counters_tabular \
--search \
--benchmark_filter="raft_ivf_flat\..*" \
-python/raft-ann-bench/src/raft-ann-bench/run/conf/glove-100-inner.json
+python/raft-ann-bench/src/raft_ann_bench/run/conf/glove-100-inner.json


# optional step: plot QPS-Recall figure using data in ivf_flat_search.csv with your favorite tool
@@ -43,12 +43,12 @@ A dataset usually has 4 binary files containing database vectors, query vectors,
The file suffixes `.fbin`, `.f16bin`, `.ibin`, `.u8bin`, and `.i8bin` denote that the data type of vectors stored in the file are `float32`, `float16`(a.k.a `half`), `int`, `uint8`, and `int8`, respectively.
These binary files are little-endian and the format is: the first 8 bytes are `num_vectors` (`uint32_t`) and `num_dimensions` (`uint32_t`), and the following `num_vectors * num_dimensions * sizeof(type)` bytes are vectors stored in row-major order.
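
The layout described above (an 8-byte header of two little-endian `uint32_t` values followed by row-major vector data) can be sketched with the standard library alone. This is an illustrative reader and writer for the `float32` (`.fbin`) case, not code from the RAFT repository; `write_fbin`, `read_fbin`, and the file name `example.fbin` are hypothetical.

```python
import os
import struct
import tempfile


def write_fbin(path, rows):
    """Write float32 vectors: 8-byte header, then row-major data."""
    num_vectors = len(rows)
    num_dimensions = len(rows[0]) if rows else 0
    with open(path, "wb") as f:
        # "<II": two little-endian uint32 values (num_vectors, num_dimensions)
        f.write(struct.pack("<II", num_vectors, num_dimensions))
        for row in rows:
            f.write(struct.pack(f"<{num_dimensions}f", *row))


def read_fbin(path):
    """Read the header, then one row of float32 values at a time."""
    with open(path, "rb") as f:
        num_vectors, num_dimensions = struct.unpack("<II", f.read(8))
        rows = []
        for _ in range(num_vectors):
            raw = f.read(4 * num_dimensions)  # sizeof(float32) == 4
            rows.append(list(struct.unpack(f"<{num_dimensions}f", raw)))
    return num_vectors, num_dimensions, rows


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.fbin")
    write_fbin(path, [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    file_size = os.path.getsize(path)
    shape_and_rows = read_fbin(path)

print(file_size, shape_and_rows)
```

For two 3-dimensional vectors the file size is `8 + 2 * 3 * 4` bytes, matching the `num_vectors * num_dimensions * sizeof(type)` expression in the text. For real datasets a bulk reader (for example via NumPy) would be the more practical choice.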

-Some implementation can take `float16` database and query vectors as inputs and will have better performance. Use `python/raft-ann-bench/src/raft-ann-bench/get_dataset/fbin_to_f16bin.py` to transform dataset from `float32` to `float16` type.
+Some implementation can take `float16` database and query vectors as inputs and will have better performance. Use `python/raft-ann-bench/src/raft_ann_bench/get_dataset/fbin_to_f16bin.py` to transform dataset from `float32` to `float16` type.
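
As a rough idea of what a `float32` to `float16` conversion involves, the sketch below converts an in-memory payload while keeping the same 8-byte header. It is an assumption-laden illustration and not the implementation of `fbin_to_f16bin.py`; the function name `fbin_bytes_to_f16bin_bytes` is hypothetical.

```python
import struct


def fbin_bytes_to_f16bin_bytes(data: bytes) -> bytes:
    """Convert an in-memory .fbin payload to the .f16bin layout."""
    num_vectors, num_dimensions = struct.unpack_from("<II", data, 0)
    count = num_vectors * num_dimensions
    values = struct.unpack_from(f"<{count}f", data, 8)
    # "e" is IEEE 754 half precision; each value is rounded to the nearest half.
    return struct.pack("<II", num_vectors, num_dimensions) + struct.pack(
        f"<{count}e", *values
    )


# Two 2-dimensional float32 vectors as a tiny test payload.
fbin = struct.pack("<II", 2, 2) + struct.pack("<4f", 0.5, 1.0, -2.0, 0.1)
f16bin = fbin_bytes_to_f16bin_bytes(fbin)
halves = struct.unpack_from("<4e", f16bin, 8)
print(len(fbin), len(f16bin), halves)
```

The vector data shrinks to half its size while the header is unchanged. Values that are not exactly representable in half precision (such as `0.1`) are rounded, and `struct` raises `OverflowError` for magnitudes beyond the half-precision range.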

Commonly used datasets can be downloaded from two websites:
1. Million-scale datasets can be found at the [Data sets](https://github.com/erikbern/ann-benchmarks#data-sets) section of [`ann-benchmarks`](https://github.com/erikbern/ann-benchmarks).

-However, these datasets are in HDF5 format. Use `python/raft-ann-bench/src/raft-ann-bench/get_dataset/fbin_to_f16bin.py/hdf5_to_fbin.py` to transform the format. A few Python packages are required to run it:
+However, these datasets are in HDF5 format. Use `python/raft-ann-bench/src/raft_ann_bench/get_dataset/fbin_to_f16bin.py/hdf5_to_fbin.py` to transform the format. A few Python packages are required to run it:
```bash
pip3 install numpy h5py
```
@@ -68,7 +68,7 @@ Commonly used datasets can be downloaded from two websites:
2. Billion-scale datasets can be found at [`big-ann-benchmarks`](http://big-ann-benchmarks.com). The ground truth file contains both neighbors and distances, thus should be split. A script is provided for this:
```bash
-$ python/raft-ann-bench/src/raft-ann-bench/split_groundtruth/split_groundtruth.pl
+$ python/raft-ann-bench/src/raft_ann_bench/split_groundtruth/split_groundtruth.pl
usage: split_groundtruth.pl input output_prefix
```
Take Deep-1B dataset as an example:
@@ -78,7 +78,7 @@ Commonly used datasets can be downloaded from two websites:
mkdir -p data/deep-1B && cd data/deep-1B
# download manually "Ground Truth" file of "Yandex DEEP"
# suppose the file name is deep_new_groundtruth.public.10K.bin
-/path/to/raft/python/raft-ann-bench/src/raft-ann-bench/split_groundtruth/split_groundtruth.pl deep_new_groundtruth.public.10K.bin groundtruth
+/path/to/raft/python/raft-ann-bench/src/raft_ann_bench/split_groundtruth/split_groundtruth.pl deep_new_groundtruth.public.10K.bin groundtruth
# two files 'groundtruth.neighbors.ibin' and 'groundtruth.distances.fbin' should be produced
popd
```
20 changes: 10 additions & 10 deletions docs/source/raft_ann_benchmarks.md
@@ -96,7 +96,7 @@ We provide a collection of lightweight Python scripts to run the benchmarks. The
4. Plot Results

### Step 1: Prepare Dataset
-The script `raft-ann-bench.get_dataset` will download and unpack the dataset in directory
+The script `raft_ann_bench.get_dataset` will download and unpack the dataset in directory
that the user provides. As of now, only million-scale datasets are supported by this
script. For more information on [datasets and formats](ann_benchmarks_dataset.md).

@@ -117,10 +117,10 @@ will be normalized to inner product. So, for example, the dataset `glove-100-ang
will be written at location `datasets/glove-100-inner/`.

### Step 2: Build and Search Index
-The script `raft-ann-bench.run` will build and search indices for a given dataset and its
+The script `raft_ann_bench.run` will build and search indices for a given dataset and its
specified configuration.

-The usage of the script `raft-ann-bench.run` is:
+The usage of the script `raft_ann_bench.run` is:
```bash
usage: __main__.py [-h] [--subset-size SUBSET_SIZE] [-k COUNT] [-bs BATCH_SIZE] [--dataset-configuration DATASET_CONFIGURATION] [--configuration CONFIGURATION] [--dataset DATASET]
[--dataset-path DATASET_PATH] [--build] [--search] [--algorithms ALGORITHMS] [--groups GROUPS] [--algo-groups ALGO_GROUPS] [-f] [-m SEARCH_MODE]
@@ -186,8 +186,8 @@ it is assumed both are `True`.
is available in `algos.yaml` and not disabled, as well as having an associated executable.
### Step 3: Data Export
-The script `raft-ann-bench.data_export` will convert the intermediate JSON outputs produced by `raft-ann-bench.run` to more
-easily readable CSV files, which are needed to build charts made by `raft-ann-bench.plot`.
+The script `raft_ann_bench.data_export` will convert the intermediate JSON outputs produced by `raft_ann_bench.run` to more
+easily readable CSV files, which are needed to build charts made by `raft_ann_bench.plot`.
```bash
usage: data_export.py [-h] [--dataset DATASET] [--dataset-path DATASET_PATH]
@@ -206,7 +206,7 @@ and index search statistics CSV file in `<dataset-path/<dataset>/result/search/<
### Step 4: Plot Results
-The script `raft-ann-bench.plot` will plot results for all algorithms found in index search statistics
+The script `raft_ann_bench.plot` will plot results for all algorithms found in index search statistics
CSV files `<dataset-path/<dataset>/result/search/*.csv`.
The usage of this script is:
@@ -277,7 +277,7 @@ python -m raft_ann_bench.data_export --dataset deep-image-96-inner
python -m raft_ann_bench.plot --dataset deep-image-96-inner
```
-Configuration files already exist for the following list of the million-scale datasets. Please refer to [ann-benchmarks datasets](https://github.com/erikbern/ann-benchmarks/#data-sets) for more information, including actual train and sizes. These all work out-of-the-box with the `--dataset` argument. Other million-scale datasets from `ann-benchmarks.com` will work, but will require a json configuration file to be created in `$CONDA_PREFIX/lib/python3.xx/site-packages/raft-ann-bench/run/conf`, or you can specify the `--configuration` option to use a specific file.
+Configuration files already exist for the following list of the million-scale datasets. Please refer to [ann-benchmarks datasets](https://github.com/erikbern/ann-benchmarks/#data-sets) for more information, including actual train and sizes. These all work out-of-the-box with the `--dataset` argument. Other million-scale datasets from `ann-benchmarks.com` will work, but will require a json configuration file to be created in `$CONDA_PREFIX/lib/python3.xx/site-packages/raft_ann_bench/run/conf`, or you can specify the `--configuration` option to use a specific file.
| Dataset Name | Train Rows | Columns | Test Rows | Distance |
|-----|------------|----|----------------|------------|
@@ -293,7 +293,7 @@ All of the datasets above contain ground test datasets with 100 neighbors. Thus
### End to end: large-scale benchmarks (>10M vectors)
-`raft-ann-bench.get_dataset` cannot be used to download the [billion-scale datasets](ann_benchmarks_dataset.md#billion-scale)
+`raft_ann_bench.get_dataset` cannot be used to download the [billion-scale datasets](ann_benchmarks_dataset.md#billion-scale)
due to their size. You should instead use our billion-scale datasets guide to download and prepare them.
All other python commands mentioned below work as intended once the
billion-scale dataset has been downloaded.
@@ -441,7 +441,7 @@ Note the following:
A single configuration will often define a set of algorithms, with associated index and search parameters, that can be generalize across datasets. We use YAML to define dataset specific and algorithm specific configurations.
-<a id='yaml-dataset-config'></a>A default `datasets.yaml` is provided by RAFT in `${RAFT_HOME}/python/raft-ann-bench/src/raft-ann-bench/run/conf` with configurations available for several datasets. Here's a simple example entry for the `sift-128-euclidean` dataset:
+<a id='yaml-dataset-config'></a>A default `datasets.yaml` is provided by RAFT in `${RAFT_HOME}/python/raft-ann-bench/src/raft_ann_bench/run/conf` with configurations available for several datasets. Here's a simple example entry for the `sift-128-euclidean` dataset:
```yaml
- name: sift-128-euclidean
@@ -452,7 +452,7 @@ A single configuration will often define a set of algorithms, with associated in
distance: euclidean
```
-<a id='yaml-algo-config'></a>Configuration files for ANN algorithms supported by `raft-ann-bench` are provided in `${RAFT_HOME}/python/raft-ann-bench/src/raft-ann-bench/run/conf`. `raft_cagra` algorithm configuration looks like:
+<a id='yaml-algo-config'></a>Configuration files for ANN algorithms supported by `raft-ann-bench` are provided in `${RAFT_HOME}/python/raft-ann-bench/src/raft_ann_bench/run/conf`. `raft_cagra` algorithm configuration looks like:
```yaml
name: raft_cagra
groups:
2 changes: 1 addition & 1 deletion python/raft-ann-bench/src/raft_ann_bench/_version.py
@@ -17,7 +17,7 @@
import importlib.resources

__version__ = (
-importlib.resources.files("raft-ann-bench")
+importlib.resources.files("raft_ann_bench")
.joinpath("VERSION")
.read_text()
.strip()
@@ -1,6 +1,6 @@
name: hnswlib
constraints:
-search: raft-ann-bench.constraints.hnswlib_search_constraints
+search: raft_ann_bench.constraints.hnswlib_search_constraints
groups:
base:
build:
@@ -1,7 +1,7 @@
name: raft_cagra
constraints:
-build: raft-ann-bench.constraints.raft_cagra_build_constraints
-search: raft-ann-bench.constraints.raft_cagra_search_constraints
+build: raft_ann_bench.constraints.raft_cagra_build_constraints
+search: raft_ann_bench.constraints.raft_cagra_search_constraints
groups:
base:
build:
@@ -1,6 +1,6 @@
name: raft_cagra_hnswlib
constraints:
-search: raft-ann-bench.constraints.hnswlib_search_constraints
+search: raft_ann_bench.constraints.hnswlib_search_constraints
groups:
base:
build:
@@ -1,7 +1,7 @@
name: raft_ivf_pq
constraints:
-build: raft-ann-bench.constraints.raft_ivf_pq_build_constraints
-search: raft-ann-bench.constraints.raft_ivf_pq_search_constraints
+build: raft_ann_bench.constraints.raft_ivf_pq_build_constraints
+search: raft_ann_bench.constraints.raft_ivf_pq_search_constraints
groups:
base:
build: