AdANNS-DiskANN is a variant of DiskANN, a web-scale graph-based ANNS index capable of serving queries from both RAM and Disk (cheap SSDs).
We provide a self-contained pipeline in adanns-diskann.ipynb. It requires a build of DiskANN; the build steps from the original DiskANN codebase are summarized below:
    sudo apt install make cmake g++ libaio-dev libgoogle-perftools-dev clang-format \
        libboost-all-dev libmkl-full-dev
    mkdir build && cd build && cmake -DCMAKE_BUILD_TYPE=Release .. && make -j
adanns-diskann.ipynb is broadly organized as:
- Data preprocessing: convert MR or RR embeddings (fp32 np.ndarray) to binary format
- Generate exact-search "ground truth" used for k-recall@N
- Build In-Memory or SSD DiskANN index on MR or RR
- Search the built indices to generate k-NN arrays
- Evaluate the k-NN arrays with and without reranking
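
The preprocessing, ground-truth, and evaluation steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the notebook's actual code. It assumes the binary layout is two int32 header values (number of points, dimensionality) followed by row-major fp32 data, that distances are L2, and that k-recall@N means the fraction of the true top-k neighbors found among the top-N retrieved ids. The helper names (`write_bin`, `read_bin`, `exact_knn`, `rerank`, `k_recall_at_n`) are hypothetical and do not come from DiskANN or the notebook.

```python
import os
import tempfile

import numpy as np


def write_bin(path, vectors):
    """Write a 2-D array as: int32 n_points, int32 n_dims, row-major fp32 data.

    The layout is an assumption about the expected binary format.
    """
    vectors = np.ascontiguousarray(vectors, dtype=np.float32)
    n_points, n_dims = vectors.shape
    with open(path, "wb") as f:
        np.array([n_points, n_dims], dtype=np.int32).tofile(f)
        vectors.tofile(f)


def read_bin(path):
    """Read a file written by write_bin back into an (n_points, n_dims) array."""
    with open(path, "rb") as f:
        n_points, n_dims = (int(x) for x in np.fromfile(f, dtype=np.int32, count=2))
        data = np.fromfile(f, dtype=np.float32, count=n_points * n_dims)
    return data.reshape(n_points, n_dims)


def exact_knn(base, queries, k):
    """Brute-force ground truth: ids of the k nearest base vectors (L2 assumed)."""
    # Squared L2 distance expanded as ||q||^2 - 2 q.b + ||b||^2.
    d2 = (queries ** 2).sum(axis=1, keepdims=True) - 2.0 * queries @ base.T
    d2 = d2 + (base ** 2).sum(axis=1)
    return np.argsort(d2, axis=1, kind="stable")[:, :k]


def rerank(candidates, base, queries, n):
    """Re-sort each query's candidate ids by exact L2 distance and keep the top n.

    `base` and `queries` may be higher-dimensional embeddings than the ones
    used to build the index (an assumption about how reranking is applied).
    """
    out = np.empty((len(candidates), n), dtype=candidates.dtype)
    for i, ids in enumerate(candidates):
        d2 = ((base[ids] - queries[i]) ** 2).sum(axis=1)
        out[i] = ids[np.argsort(d2, kind="stable")[:n]]
    return out


def k_recall_at_n(ground_truth, retrieved, k, n):
    """Fraction of the true top-k ids that appear in the top-n retrieved ids."""
    hits = sum(
        len(set(gt[:k].tolist()) & set(r[:n].tolist()))
        for gt, r in zip(ground_truth, retrieved)
    )
    return hits / (k * len(ground_truth))
```

A typical use would be to call `write_bin` on the base and query embeddings, compute `exact_knn` once as the ground truth, and then pass the k-NN arrays returned by the index search (optionally after `rerank`) to `k_recall_at_n`.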