Rename raft-ann-bench module to raft_ann_bench (#2333)
Replace hyphens with underscores in `raft-ann-bench` to make it a valid Python identifier. Also add a Python 3.11 tag to `raft-ann-bench`, and use the `VERSION` file instead of an attribute.
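As a quick illustration of the rename (a sketch, not part of this commit): a name containing hyphens is not a valid Python identifier, so it cannot appear in an `import` statement, while the underscore form can. The helper name `to_module_name` is hypothetical and used only for this example.

```python
def to_module_name(dist_name: str) -> str:
    """Hypothetical helper: turn a hyphenated name into an importable module name."""
    return dist_name.replace("-", "_")


old_name = "raft-ann-bench"
new_name = to_module_name(old_name)

# str.isidentifier() reports whether a string is a valid Python identifier.
print(old_name.isidentifier())  # False: hyphens are not allowed in identifiers
print(new_name.isidentifier())  # True
```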

Authors:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
  - Divye Gala (https://github.com/divyegala)
  - Mike Sarahan (https://github.com/msarahan)

URL: #2333
KyleFromNVIDIA authored May 23, 2024
1 parent 64827fc commit 9c8d111
Showing 48 changed files with 19 additions and 18 deletions.
6 changes: 3 additions & 3 deletions docs/source/ann_benchmarks_dataset.md
@@ -52,12 +52,12 @@ If you have a dataset, but no corresponding ground truth file, then you can gene
```bash
# With existing query file
-python -m raft-ann-bench.generate_groundtruth --dataset /dataset/base.fbin --output=groundtruth_dir --queries=/dataset/query.public.10K.fbin
+python -m raft_ann_bench.generate_groundtruth --dataset /dataset/base.fbin --output=groundtruth_dir --queries=/dataset/query.public.10K.fbin
# With randomly generated queries
-python -m raft-ann-bench.generate_groundtruth --dataset /dataset/base.fbin --output=groundtruth_dir --queries=random --n_queries=10000
+python -m raft_ann_bench.generate_groundtruth --dataset /dataset/base.fbin --output=groundtruth_dir --queries=random --n_queries=10000
# Using only a subset of the dataset. Define queries by randomly
# selecting vectors from the (subset of the) dataset.
-python -m raft-ann-bench.generate_groundtruth --dataset /dataset/base.fbin --nrows=2000000 --output=groundtruth_dir --queries=random-choice --n_queries=10000
+python -m raft_ann_bench.generate_groundtruth --dataset /dataset/base.fbin --nrows=2000000 --output=groundtruth_dir --queries=random-choice --n_queries=10000
```
2 changes: 1 addition & 1 deletion docs/source/ann_benchmarks_low_level.md
@@ -8,7 +8,7 @@ cd raft

# (1) prepare a dataset
export PYTHONPATH=python/raft-ann-bench/src:$PYTHONPATH
-python -m raft-ann-bench.get_dataset --dataset glove-100-angular --normalize
+python -m raft_ann_bench.get_dataset --dataset glove-100-angular --normalize

# option --normalize is used here to normalize vectors so cosine distance is converted
# to inner product; don't use -n for l2 distance
20 changes: 10 additions & 10 deletions docs/source/raft_ann_benchmarks.md
@@ -265,16 +265,16 @@ The steps below demonstrate how to download, install, and run benchmarks on a su
```bash
# (1) prepare dataset.
-python -m raft-ann-bench.get_dataset --dataset deep-image-96-angular --normalize
+python -m raft_ann_bench.get_dataset --dataset deep-image-96-angular --normalize
# (2) build and search index
-python -m raft-ann-bench.run --dataset deep-image-96-inner --algorithms raft_cagra --batch-size 10 -k 10
+python -m raft_ann_bench.run --dataset deep-image-96-inner --algorithms raft_cagra --batch-size 10 -k 10
# (3) export data
-python -m raft-ann-bench.data_export --dataset deep-image-96-inner
+python -m raft_ann_bench.data_export --dataset deep-image-96-inner
# (4) plot results
-python -m raft-ann-bench.plot --dataset deep-image-96-inner
+python -m raft_ann_bench.plot --dataset deep-image-96-inner
```
Configuration files already exist for the following list of the million-scale datasets. Please refer to [ann-benchmarks datasets](https://github.com/erikbern/ann-benchmarks/#data-sets) for more information, including actual train and sizes. These all work out-of-the-box with the `--dataset` argument. Other million-scale datasets from `ann-benchmarks.com` will work, but will require a json configuration file to be created in `$CONDA_PREFIX/lib/python3.xx/site-packages/raft-ann-bench/run/conf`, or you can specify the `--configuration` option to use a specific file.
@@ -308,20 +308,20 @@ mkdir -p datasets/deep-1B
# (1) prepare dataset
# download manually "Ground Truth" file of "Yandex DEEP"
# suppose the file name is deep_new_groundtruth.public.10K.bin
-python -m raft-ann-bench.split_groundtruth --groundtruth datasets/deep-1B/deep_new_groundtruth.public.10K.bin
+python -m raft_ann_bench.split_groundtruth --groundtruth datasets/deep-1B/deep_new_groundtruth.public.10K.bin
# two files 'groundtruth.neighbors.ibin' and 'groundtruth.distances.fbin' should be produced
# (2) build and search index
-python -m raft-ann-bench.run --dataset deep-1B --algorithms raft_cagra --batch-size 10 -k 10
+python -m raft_ann_bench.run --dataset deep-1B --algorithms raft_cagra --batch-size 10 -k 10
# (3) export data
-python -m raft-ann-bench.data_export --dataset deep-1B
+python -m raft_ann_bench.data_export --dataset deep-1B
# (4) plot results
-python -m raft-ann-bench.plot --dataset deep-1B
+python -m raft_ann_bench.plot --dataset deep-1B
```
-The usage of `python -m raft-ann-bench.split_groundtruth` is:
+The usage of `python -m raft_ann_bench.split_groundtruth` is:
```bash
usage: split_groundtruth.py [-h] --groundtruth GROUNDTRUTH
@@ -395,7 +395,7 @@ docker run --gpus all --rm -it -u $(id -u) \
This will drop you into a command line in the container, with the `raft-ann-bench` python package ready to use, as described in the [Running the benchmarks](#running-the-benchmarks) section above:
```
-(base) root@00b068fbb862:/data/benchmarks# python -m raft-ann-bench.get_dataset --dataset deep-image-96-angular --normalize
+(base) root@00b068fbb862:/data/benchmarks# python -m raft_ann_bench.get_dataset --dataset deep-image-96-angular --normalize
```
Additionally, the containers can be run in detached mode without any issue.
3 changes: 2 additions & 1 deletion python/raft-ann-bench/pyproject.toml
@@ -26,6 +26,7 @@ classifiers = [
"Programming Language :: Python",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
+"Programming Language :: Python :: 3.11",
]

[project.urls]
@@ -59,4 +60,4 @@ skip = [
]

[tool.setuptools.dynamic]
-version = { attr = "raft-ann-bench.__version__" }
+version = { file = "raft_ann_bench/VERSION" }
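With the `file` directive, setuptools takes the version string from the contents of a plain-text file instead of resolving a `__version__` attribute on the package. The sketch below illustrates reading such a file; it is not code from this commit. The function name `read_version` is hypothetical, and `24.06.00` is a placeholder value written to a temporary directory purely for the demonstration.

```python
import tempfile
from pathlib import Path


def read_version(package_dir: Path) -> str:
    """Hypothetical helper: return the stripped contents of <package_dir>/VERSION."""
    return (package_dir / "VERSION").read_text(encoding="utf-8").strip()


# Build a throwaway package directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "raft_ann_bench"
    pkg.mkdir()
    # Placeholder version string, not the project's real version.
    (pkg / "VERSION").write_text("24.06.00\n", encoding="utf-8")
    version = read_version(pkg)

print(version)
```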
@@ -96,16 +96,16 @@ def main():
"The input and output files are in big-ann-benchmark's binary format.",
epilog="""Example usage
# With existing query file
-python -m raft-ann-bench.generate_groundtruth --dataset /dataset/base.\
+python -m raft_ann_bench.generate_groundtruth --dataset /dataset/base.\
fbin --output=groundtruth_dir --queries=/dataset/query.public.10K.fbin
# With randomly generated queries
-python -m raft-ann-bench.generate_groundtruth --dataset /dataset/base.\
+python -m raft_ann_bench.generate_groundtruth --dataset /dataset/base.\
fbin --output=groundtruth_dir --queries=random --n_queries=10000
# Using only a subset of the dataset. Define queries by randomly
# selecting vectors from the (subset of the) dataset.
-python -m raft-ann-bench.generate_groundtruth --dataset /dataset/base.\
+python -m raft_ann_bench.generate_groundtruth --dataset /dataset/base.\
fbin --nrows=2000000 --cols=128 --output=groundtruth_dir \
--queries=random-choice --n_queries=10000
""",
