rapidsai · rapids-bot · May 24, 2024 · May 23, 2024 · May 23, 2024 · May 23, 2024
diff --git a/docs/source/ann_benchmarks_low_level.md b/docs/source/ann_benchmarks_low_level.md
@@ -18,7 +18,7 @@ $CONDA_PREFIX/bin/ann/RAFT_IVF_FLAT_ANN_BENCH \
   --data_prefix=datasets \
   --build \
   --benchmark_filter="raft_ivf_flat\..*" \
-  python/raft-ann-bench/src/raft-ann-bench/run/conf/glove-100-inner.json 
+  python/raft-ann-bench/src/raft_ann_bench/run/conf/glove-100-inner.json 
 
 # (3) search
 $CONDA_PREFIX/bin/ann/RAFT_IVF_FLAT_ANN_BENCH\
@@ -29,7 +29,7 @@ $CONDA_PREFIX/bin/ann/RAFT_IVF_FLAT_ANN_BENCH\
   --benchmark_counters_tabular \
   --search \
   --benchmark_filter="raft_ivf_flat\..*" \
-    python/raft-ann-bench/src/raft-ann-bench/run/conf/glove-100-inner.json 
+    python/raft-ann-bench/src/raft_ann_bench/run/conf/glove-100-inner.json 
 
 
 # optional step: plot QPS-Recall figure using data in ivf_flat_search.csv with your favorite tool
@@ -43,12 +43,12 @@ A dataset usually has 4 binary files containing database vectors, query vectors,
 The file suffixes `.fbin`, `.f16bin`, `.ibin`, `.u8bin`, and `.i8bin` denote that the data type of vectors stored in the file are `float32`, `float16`(a.k.a `half`), `int`, `uint8`, and `int8`, respectively.
 These binary files are little-endian and the format is: the first 8 bytes are `num_vectors` (`uint32_t`) and `num_dimensions` (`uint32_t`), and the following `num_vectors * num_dimensions * sizeof(type)` bytes are vectors stored in row-major order.
 
-Some implementation can take `float16` database and query vectors as inputs and will have better performance. Use `python/raft-ann-bench/src/raft-ann-bench/get_dataset/fbin_to_f16bin.py` to transform dataset from `float32` to `float16` type.
+Some implementations accept `float16` database and query vectors as inputs and can achieve better performance with them. Use `python/raft-ann-bench/src/raft_ann_bench/get_dataset/fbin_to_f16bin.py` to convert a dataset from `float32` to `float16`.
 
 Commonly used datasets can be downloaded from two websites:
 1. Million-scale datasets can be found at the [Data sets](https://github.com/erikbern/ann-benchmarks#data-sets) section of [`ann-benchmarks`](https://github.com/erikbern/ann-benchmarks).
 
-    However, these datasets are in HDF5 format. Use `python/raft-ann-bench/src/raft-ann-bench/get_dataset/fbin_to_f16bin.py/hdf5_to_fbin.py` to transform the format. A few Python packages are required to run it:
+    However, these datasets are in HDF5 format. Use `python/raft-ann-bench/src/raft_ann_bench/get_dataset/hdf5_to_fbin.py` to convert them. A few Python packages are required to run it:
     ```bash
     pip3 install numpy h5py
     ```
@@ -68,7 +68,7 @@ Commonly used datasets can be downloaded from two websites:
 
 2. Billion-scale datasets can be found at [`big-ann-benchmarks`](http://big-ann-benchmarks.com). The ground truth file contains both neighbors and distances, thus should be split. A script is provided for this:
     ```bash
-    $ python/raft-ann-bench/src/raft-ann-bench/split_groundtruth/split_groundtruth.pl
+    $ python/raft-ann-bench/src/raft_ann_bench/split_groundtruth/split_groundtruth.pl
     usage: split_groundtruth.pl input output_prefix
     ```
     Take Deep-1B dataset as an example:
@@ -78,7 +78,7 @@ Commonly used datasets can be downloaded from two websites:
     mkdir -p data/deep-1B && cd data/deep-1B
     # download manually "Ground Truth" file of "Yandex DEEP"
     # suppose the file name is deep_new_groundtruth.public.10K.bin
-    /path/to/raft/python/raft-ann-bench/src/raft-ann-bench/split_groundtruth/split_groundtruth.pl deep_new_groundtruth.public.10K.bin groundtruth
+    /path/to/raft/python/raft-ann-bench/src/raft_ann_bench/split_groundtruth/split_groundtruth.pl deep_new_groundtruth.public.10K.bin groundtruth
     # two files 'groundtruth.neighbors.ibin' and 'groundtruth.distances.fbin' should be produced
     popd
     ```
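
The binary layout described in this file (an 8-byte `num_vectors`/`num_dimensions` header followed by row-major little-endian data) and the ground-truth split can be sketched with Python's standard `struct` module. This is a minimal, hypothetical reader/writer, not the scripts under `raft_ann_bench/get_dataset` or `split_groundtruth`; the helper names `write_bin`, `read_bin`, `to_float16`, and `split_groundtruth` are illustrative. The ground-truth layout (a `num_queries`/`k` header, then `num_queries * k` `uint32` ids, then the same number of `float32` distances) is an assumption based on the big-ann-benchmarks description, not something stated in this diff.

```python
import os
import struct
import tempfile

# File suffix -> struct format character; "<" below forces little-endian, standard sizes.
FORMATS = {".fbin": "f", ".f16bin": "e", ".ibin": "i", ".u8bin": "B", ".i8bin": "b"}


def _format_for(path):
    return FORMATS[os.path.splitext(path)[1]]


def write_bin(path, rows):
    """Write rows as: uint32 num_vectors, uint32 num_dimensions, then row-major data."""
    num_vectors = len(rows)
    num_dimensions = len(rows[0]) if rows else 0
    flat = [x for row in rows for x in row]
    with open(path, "wb") as f:
        f.write(struct.pack("<II", num_vectors, num_dimensions))
        f.write(struct.pack(f"<{len(flat)}{_format_for(path)}", *flat))


def read_bin(path):
    """Return (num_vectors, num_dimensions, rows), decoding the element type from the suffix."""
    fmt = _format_for(path)
    with open(path, "rb") as f:
        num_vectors, num_dimensions = struct.unpack("<II", f.read(8))
        count = num_vectors * num_dimensions
        flat = struct.unpack(f"<{count}{fmt}", f.read(count * struct.calcsize("<" + fmt)))
    rows = [list(flat[i * num_dimensions:(i + 1) * num_dimensions]) for i in range(num_vectors)]
    return num_vectors, num_dimensions, rows


def to_float16(src, dst):
    """Re-encode a .fbin file as .f16bin; values outside float16's range make struct raise."""
    _, _, rows = read_bin(src)
    write_bin(dst, rows)


def split_groundtruth(input_path, output_prefix):
    """Split an (assumed) big-ann-benchmarks ground-truth file into .neighbors.ibin and .distances.fbin."""
    with open(input_path, "rb") as f:
        header = f.read(8)
        num_queries, k = struct.unpack("<II", header)
        size = num_queries * k * 4  # uint32 ids and float32 distances are both 4 bytes
        ids, dists = f.read(size), f.read(size)
    if len(ids) != size or len(dists) != size:
        raise ValueError("truncated ground-truth file")
    # The (num_queries, k) header is already a valid (num_vectors, num_dimensions) header
    with open(output_prefix + ".neighbors.ibin", "wb") as f:
        f.write(header + ids)
    with open(output_prefix + ".distances.fbin", "wb") as f:
        f.write(header + dists)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.join(tmp, "base.fbin")
        write_bin(base, [[1.0, 2.0], [3.0, 4.5]])
        to_float16(base, os.path.join(tmp, "base.f16bin"))
        print(read_bin(os.path.join(tmp, "base.f16bin")))  # (2, 2, [[1.0, 2.0], [3.0, 4.5]])
```

Copying the id and distance bytes verbatim is enough for the split because both sections use 4-byte elements, so no per-element decoding is needed.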

diff --git a/docs/source/raft_ann_benchmarks.md b/docs/source/raft_ann_benchmarks.md
@@ -96,7 +96,7 @@ We provide a collection of lightweight Python scripts to run the benchmarks. The
 4. Plot Results
 
 ### Step 1: Prepare Dataset
-The script `raft-ann-bench.get_dataset` will download and unpack the dataset in directory
+The script `raft_ann_bench.get_dataset` downloads and unpacks the dataset into the directory
 that the user provides. As of now, only million-scale datasets are supported by this
 script. For more information on [datasets and formats](ann_benchmarks_dataset.md).
 
@@ -117,10 +117,10 @@ will be normalized to inner product. So, for example, the dataset `glove-100-ang
 will be written at location `datasets/glove-100-inner/`.
 
 ### Step 2: Build and Search Index
-The script `raft-ann-bench.run` will build and search indices for a given dataset and its
+The script `raft_ann_bench.run` will build and search indices for a given dataset and its
 specified configuration.
 
-The usage of the script `raft-ann-bench.run` is:
+The usage of the script `raft_ann_bench.run` is:
 ```bash
 usage: __main__.py [-h] [--subset-size SUBSET_SIZE] [-k COUNT] [-bs BATCH_SIZE] [--dataset-configuration DATASET_CONFIGURATION] [--configuration CONFIGURATION] [--dataset DATASET]
                    [--dataset-path DATASET_PATH] [--build] [--search] [--algorithms ALGORITHMS] [--groups GROUPS] [--algo-groups ALGO_GROUPS] [-f] [-m SEARCH_MODE]
@@ -186,8 +186,8 @@ it is assumed both are `True`.
 is available in `algos.yaml` and not disabled, as well as having an associated executable.
 
 ### Step 3: Data Export
-The script `raft-ann-bench.data_export` will convert the intermediate JSON outputs produced by `raft-ann-bench.run` to more
-easily readable CSV files, which are needed to build charts made by `raft-ann-bench.plot`.
+The script `raft_ann_bench.data_export` converts the intermediate JSON outputs produced by `raft_ann_bench.run` into more
+readable CSV files, which `raft_ann_bench.plot` needs to build its charts.
 
 ```bash
 usage: data_export.py [-h] [--dataset DATASET] [--dataset-path DATASET_PATH]
@@ -206,7 +206,7 @@ and index search statistics CSV file in `<dataset-path/<dataset>/result/search/<
 
 
 ### Step 4: Plot Results
-The script `raft-ann-bench.plot` will plot results for all algorithms found in index search statistics
+The script `raft_ann_bench.plot` will plot results for all algorithms found in index search statistics
 CSV files `<dataset-path/<dataset>/result/search/*.csv`.
 
 The usage of this script is:
@@ -277,7 +277,7 @@ python -m raft_ann_bench.data_export --dataset deep-image-96-inner
 python -m raft_ann_bench.plot --dataset deep-image-96-inner
 ```
 
-Configuration files already exist for the following list of the million-scale datasets. Please refer to [ann-benchmarks datasets](https://github.com/erikbern/ann-benchmarks/#data-sets) for more information, including actual train and sizes. These all work out-of-the-box with the `--dataset` argument. Other million-scale datasets from `ann-benchmarks.com` will work, but will require a json configuration file to be created in `$CONDA_PREFIX/lib/python3.xx/site-packages/raft-ann-bench/run/conf`, or you can specify the `--configuration` option to use a specific file.
+Configuration files already exist for the following million-scale datasets. Please refer to [ann-benchmarks datasets](https://github.com/erikbern/ann-benchmarks/#data-sets) for more information, including the actual train and test sizes. These all work out of the box with the `--dataset` argument. Other million-scale datasets from `ann-benchmarks.com` will work, but require a JSON configuration file to be created in `$CONDA_PREFIX/lib/python3.xx/site-packages/raft_ann_bench/run/conf`; alternatively, use the `--configuration` option to point to a specific file.
 
 | Dataset Name | Train Rows | Columns | Test Rows      | Distance   | 
 |-----|------------|----|----------------|------------|
@@ -293,7 +293,7 @@ All of the datasets above contain ground test datasets with 100 neighbors. Thus
 
 ### End to end: large-scale benchmarks (>10M vectors)
 
-`raft-ann-bench.get_dataset` cannot be used to download the [billion-scale datasets](ann_benchmarks_dataset.md#billion-scale)
+`raft_ann_bench.get_dataset` cannot be used to download the [billion-scale datasets](ann_benchmarks_dataset.md#billion-scale)
 due to their size. You should instead use our billion-scale datasets guide to download and prepare them.
 All other python commands mentioned below work as intended once the
 billion-scale dataset has been downloaded.
@@ -441,7 +441,7 @@ Note the following:
 
 A single configuration will often define a set of algorithms, with associated index and search parameters, that can be generalize across datasets. We use YAML to define dataset specific and algorithm specific configurations.
 
-<a id='yaml-dataset-config'></a>A default `datasets.yaml` is provided by RAFT in `${RAFT_HOME}/python/raft-ann-bench/src/raft-ann-bench/run/conf` with configurations available for several datasets. Here's a simple example entry for the `sift-128-euclidean` dataset:
+<a id='yaml-dataset-config'></a>A default `datasets.yaml` is provided by RAFT in `${RAFT_HOME}/python/raft-ann-bench/src/raft_ann_bench/run/conf` with configurations available for several datasets. Here's a simple example entry for the `sift-128-euclidean` dataset:
 
 ```yaml
 - name: sift-128-euclidean
@@ -452,7 +452,7 @@ A single configuration will often define a set of algorithms, with associated in
   distance: euclidean
 ```
 
-<a id='yaml-algo-config'></a>Configuration files for ANN algorithms supported by `raft-ann-bench` are provided in `${RAFT_HOME}/python/raft-ann-bench/src/raft-ann-bench/run/conf`. `raft_cagra` algorithm configuration looks like:
+<a id='yaml-algo-config'></a>Configuration files for ANN algorithms supported by `raft-ann-bench` are provided in `${RAFT_HOME}/python/raft-ann-bench/src/raft_ann_bench/run/conf`. The `raft_cagra` algorithm configuration looks like this:
 ```yaml
 name: raft_cagra
 groups:
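
With the modules renamed, each step in this document is invoked as `python -m raft_ann_bench.<step>`. A minimal sketch of driving steps 2 to 4 from Python, assuming only the flags that appear in the usage strings and examples in this diff (`--dataset`, `--dataset-path`, `-k`, `-bs`, `--algorithms`); `pipeline_commands` and `run_pipeline` are hypothetical helpers with illustrative defaults, and step 1 is left out because its flags are not shown here. Neither `--build` nor `--search` is passed, so per the text above both are assumed to be `True`.

```python
import shlex
import subprocess
import sys


def pipeline_commands(dataset, dataset_path="datasets", algorithms=None, k=10, batch_size=10000):
    """Build argv lists for steps 2-4 using the underscore module names (hypothetical helper)."""
    common = ["--dataset", dataset, "--dataset-path", dataset_path]
    run = [sys.executable, "-m", "raft_ann_bench.run", *common, "-k", str(k), "-bs", str(batch_size)]
    if algorithms:
        run += ["--algorithms", algorithms]
    export = [sys.executable, "-m", "raft_ann_bench.data_export", *common]
    plot = [sys.executable, "-m", "raft_ann_bench.plot", "--dataset", dataset]
    return [run, export, plot]


def run_pipeline(commands):
    """Echo and execute each step, stopping at the first non-zero exit code."""
    for argv in commands:
        print("+", shlex.join(argv))
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    for argv in pipeline_commands("glove-100-inner", algorithms="raft_cagra"):
        print(shlex.join(argv))
```

Building the argv lists separately from executing them keeps the command construction easy to inspect before anything is run.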

@@ -17,7 +17,7 @@
 import importlib.resources
 
 __version__ = (
-    importlib.resources.files("raft-ann-bench")
+    importlib.resources.files("raft_ann_bench")
     .joinpath("VERSION")
     .read_text()
     .strip()
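
The `__version__` lookup only works when the string passed to `importlib.resources.files` is the importable package name. A self-contained sketch, using a throwaway package `demo_ann_bench` as a stand-in for `raft_ann_bench` (the package name, its `VERSION` contents, and the helpers `read_version` and `demo` are hypothetical), shows the underscore name resolving and the hyphenated name failing with `ModuleNotFoundError`:

```python
import importlib
import importlib.resources
import os
import sys
import tempfile


def read_version(package):
    """Same chain as the __version__ expression: files() -> joinpath -> read_text."""
    return importlib.resources.files(package).joinpath("VERSION").read_text().strip()


def demo():
    """Return (version read via the import name, error raised for the hyphenated name)."""
    with tempfile.TemporaryDirectory() as tmp:
        # Lay out a hypothetical package like src/raft_ann_bench/{__init__.py, VERSION}
        pkg_dir = os.path.join(tmp, "demo_ann_bench")
        os.makedirs(pkg_dir)
        with open(os.path.join(pkg_dir, "__init__.py"), "w"):
            pass
        with open(os.path.join(pkg_dir, "VERSION"), "w") as f:
            f.write("1.2.3\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            version = read_version("demo_ann_bench")
            error = None
            try:
                read_version("demo-ann-bench")  # no module by this name exists on sys.path
            except ModuleNotFoundError as exc:
                error = exc
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_ann_bench", None)
    return version, error


if __name__ == "__main__":
    version, error = demo()
    print(version)  # 1.2.3
    print(type(error).__name__)  # ModuleNotFoundError
```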

@@ -1,6 +1,6 @@
 name: hnswlib
 constraints:
-  search: raft-ann-bench.constraints.hnswlib_search_constraints
+  search: raft_ann_bench.constraints.hnswlib_search_constraints
 groups:
   base:
     build:
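
Each `constraints` value names a Python function by its dotted import path, so it has to use the importable name `raft_ann_bench`. How the benchmark resolves these strings is not shown in this diff; a common pattern, sketched here with a hypothetical `resolve_callable` helper, splits off the last component, imports the rest with `importlib.import_module`, and looks the attribute up with `getattr`. The stdlib path `json.dumps` stands in for a real constraint function.

```python
import importlib


def resolve_callable(dotted_path):
    """Import 'package.module' and return its attribute, e.g. a constraint function."""
    module_name, _, attr = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.attribute', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


if __name__ == "__main__":
    dumps = resolve_callable("json.dumps")  # stdlib stand-in for a constraint path
    print(dumps([1, 2]))  # [1, 2]
    try:
        resolve_callable("raft-ann-bench.constraints.hnswlib_search_constraints")
    except ModuleNotFoundError as exc:
        print("cannot resolve:", exc)
```

`rpartition` keeps everything before the last dot as the module path, so nested packages such as `raft_ann_bench.constraints` are imported in a single `import_module` call.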

@@ -1,7 +1,7 @@
 name: raft_cagra
 constraints:
-  build: raft-ann-bench.constraints.raft_cagra_build_constraints
-  search: raft-ann-bench.constraints.raft_cagra_search_constraints
+  build: raft_ann_bench.constraints.raft_cagra_build_constraints
+  search: raft_ann_bench.constraints.raft_cagra_search_constraints
 groups:
   base:
     build:
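
A `groups` entry typically lists several values per parameter, and the `build`/`search` constraints can prune combinations that make no sense. The exact mechanics are not part of this diff; a minimal sketch, assuming every combination is expanded as a cartesian product and then filtered by the constraint callables, follows. `expand_grid`, `keep_valid`, and both example rules are hypothetical stand-ins, not the real `raft_cagra_build_constraints`/`raft_cagra_search_constraints`, and the parameter names `graph_degree`, `intermediate_graph_degree`, and `itopk` are assumed for illustration.

```python
import itertools


def expand_grid(param_lists):
    """Turn {'name': [v1, v2], ...} into one dict per combination (cartesian product)."""
    names = list(param_lists)
    return [dict(zip(names, values)) for values in itertools.product(*param_lists.values())]


def keep_valid(configs, constraint, **context):
    """Drop configurations that the constraint callable rejects."""
    return [cfg for cfg in configs if constraint(cfg, **context)]


def example_build_constraint(params):
    """Hypothetical rule: the final graph degree may not exceed the intermediate one."""
    return params["graph_degree"] <= params["intermediate_graph_degree"]


def example_search_constraint(params, k):
    """Hypothetical rule: the internal top-k buffer must hold at least k results."""
    return params["itopk"] >= k


if __name__ == "__main__":
    builds = expand_grid({"graph_degree": [32, 64], "intermediate_graph_degree": [32, 64]})
    print(keep_valid(builds, example_build_constraint))
    searches = expand_grid({"itopk": [8, 32, 64]})
    print(keep_valid(searches, example_search_constraint, k=10))
```

Filtering after expansion keeps the YAML simple: each parameter lists its candidate values independently, and invalid pairings are discarded in one place.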

@@ -1,6 +1,6 @@
 name: raft_cagra_hnswlib
 constraints:
-  search: raft-ann-bench.constraints.hnswlib_search_constraints
+  search: raft_ann_bench.constraints.hnswlib_search_constraints
 groups:
   base:
     build:

@@ -1,7 +1,7 @@
 name: raft_ivf_pq
 constraints:
-  build: raft-ann-bench.constraints.raft_ivf_pq_build_constraints
-  search: raft-ann-bench.constraints.raft_ivf_pq_search_constraints
+  build: raft_ann_bench.constraints.raft_ivf_pq_build_constraints
+  search: raft_ann_bench.constraints.raft_ivf_pq_search_constraints
 groups:
   base:
     build: