Update RAFT documentation (#1717)
- Various updates to the C++ and Python documentation, mainly for raft::neighbors
- Add QPS vs Recall plot

Authors:
  - Micka (https://github.com/lowener)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1717
lowener authored Aug 10, 2023
1 parent 17335e7 commit f49d8a2
Showing 13 changed files with 293 additions and 357 deletions.
10 changes: 10 additions & 0 deletions docs/source/cpp_api/neighbors_cagra.rst
@@ -19,3 +19,13 @@ namespace *raft::neighbors::cagra*
    :content-only:


Serializer Methods
------------------
``#include <raft/neighbors/cagra_serialize.cuh>``

namespace *raft::neighbors::cagra*

.. doxygengroup:: cagra_serialize
    :project: RAFT
    :members:
    :content-only:
11 changes: 11 additions & 0 deletions docs/source/cpp_api/neighbors_ivf_pq.rst
@@ -21,6 +21,17 @@ Serializer Methods
namespace *raft::neighbors::ivf_pq*

.. doxygengroup:: ivf_pq_serialize
    :project: RAFT
    :members:
    :content-only:

Candidate Refinement
--------------------
``#include <raft/neighbors/refine.cuh>``

namespace *raft::neighbors*

.. doxygengroup:: ann_refine
    :project: RAFT
    :members:
    :content-only:
9 changes: 4 additions & 5 deletions docs/source/pylibraft_api/cluster.rst
@@ -7,15 +7,14 @@ This page provides pylibraft class references for the publicly-exposed elements
   :language: python
   :class: highlight

KMeans
######

.. autoclass:: pylibraft.cluster.kmeans.KMeansParams
   :members:

.. autofunction:: pylibraft.cluster.kmeans.fit

.. autofunction:: pylibraft.cluster.kmeans.cluster_cost

-.. autofunction:: pylibraft.cluster.compute_new_centroids
+.. autofunction:: pylibraft.cluster.kmeans.compute_new_centroids
19 changes: 18 additions & 1 deletion docs/source/pylibraft_api/neighbors.rst
@@ -27,6 +27,11 @@ CAGRA

.. autofunction:: pylibraft.neighbors.cagra.search

Serializer Methods
------------------
.. autofunction:: pylibraft.neighbors.cagra.save

.. autofunction:: pylibraft.neighbors.cagra.load

IVF-Flat
########
@@ -43,6 +48,12 @@ IVF-Flat

.. autofunction:: pylibraft.neighbors.ivf_flat.search

Serializer Methods
------------------

.. autofunction:: pylibraft.neighbors.ivf_flat.save

.. autofunction:: pylibraft.neighbors.ivf_flat.load

IVF-PQ
######
@@ -59,8 +70,14 @@ IVF-PQ

.. autofunction:: pylibraft.neighbors.ivf_pq.search

Serializer Methods
------------------

.. autofunction:: pylibraft.neighbors.ivf_pq.save

.. autofunction:: pylibraft.neighbors.ivf_pq.load

Candidate Refinement
-####################
+--------------------

.. autofunction:: pylibraft.neighbors.refine
4 changes: 4 additions & 0 deletions docs/source/raft_ann_benchmarks.md
@@ -182,6 +182,10 @@ options:
All algorithms present in the CSV file supplied to this script with parameter `result_csv`
will appear in the plot.
The figure below is the resulting plot of running our benchmarks as of August 2023 for a batch size of 10, on an NVIDIA H100 GPU and an Intel Xeon Platinum 8480CL CPU. It presents the throughput (in Queries-Per-Second) performance for every level of recall.
![Throughput vs recall plot comparing popular ANN algorithms with RAFT's at batch size 10](../../img/raft-vector-search-batch-10.png)
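
For readers unfamiliar with the two axes of the plot, here is a minimal sketch of how recall and queries-per-second are commonly computed for an ANN search run. It assumes the usual definitions (recall as the fraction of true nearest neighbors that were returned, throughput as queries divided by wall-clock search time). The function names `recall_at_k` and `queries_per_second` and the sample data are hypothetical and are not taken from the RAFT benchmark scripts.

```python
from typing import Sequence


def recall_at_k(found: Sequence[Sequence[int]],
                truth: Sequence[Sequence[int]],
                k: int) -> float:
    """Fraction of the true top-k neighbors that the search returned."""
    if len(found) != len(truth):
        raise ValueError("found and truth must cover the same queries")
    hits = 0
    for found_row, truth_row in zip(found, truth):
        # Count neighbors present in both the result and the ground truth.
        hits += len(set(found_row[:k]) & set(truth_row[:k]))
    return hits / (len(truth) * k)


def queries_per_second(n_queries: int, elapsed_seconds: float) -> float:
    """Throughput of a search run, in queries per second."""
    return n_queries / elapsed_seconds


# Hypothetical results for a batch of 2 queries with k = 3.
found = [[0, 5, 9], [2, 3, 4]]
truth = [[0, 5, 7], [2, 3, 4]]
recall = recall_at_k(found, truth, k=3)
qps = queries_per_second(n_queries=2, elapsed_seconds=0.5)
```

Each point on a throughput-versus-recall curve would then correspond to one `(recall, qps)` pair measured for a particular search configuration.
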
## Adding a new ANN algorithm
### Implementation and Configuration
Implementation of a new algorithm should be a C++ class that inherits `class ANN` (defined in `cpp/bench/ann/src/ann.h`) and implements all the pure virtual functions.
Binary file added img/raft-vector-search-batch-10.png
21 changes: 2 additions & 19 deletions python/pylibraft/pylibraft/cluster/kmeans.pyx
@@ -85,33 +85,26 @@ def compute_new_centroids(X,
--------
>>> import cupy as cp
>>> from pylibraft.common import Handle
>>> from pylibraft.cluster.kmeans import compute_new_centroids
>>> # A single RAFT handle can optionally be reused across
>>> # pylibraft functions.
>>> handle = Handle()
>>> n_samples = 5000
>>> n_features = 50
>>> n_clusters = 3
>>> X = cp.random.random_sample((n_samples, n_features),
... dtype=cp.float32)
>>> centroids = cp.random.random_sample((n_clusters, n_features),
... dtype=cp.float32)
...
>>> labels = cp.random.randint(0, high=n_clusters, size=n_samples,
... dtype=cp.int32)
->>> new_centroids = cp.empty((n_clusters, n_features), dtype=cp.float32)
+>>> new_centroids = cp.empty((n_clusters, n_features),
+... dtype=cp.float32)
>>> compute_new_centroids(
... X, centroids, labels, new_centroids, handle=handle
... )
>>> # pylibraft functions are often asynchronous so the
>>> # handle needs to be explicitly synchronized
>>> handle.sync()
@@ -221,11 +214,9 @@ def init_plus_plus(X, n_clusters=None, seed=None, handle=None, centroids=None):
>>> import cupy as cp
>>> from pylibraft.cluster.kmeans import init_plus_plus
>>> n_samples = 5000
>>> n_features = 50
>>> n_clusters = 3
>>> X = cp.random.random_sample((n_samples, n_features),
... dtype=cp.float32)
@@ -301,19 +292,14 @@ def cluster_cost(X, centroids, handle=None):
--------
>>> import cupy as cp
>>>
>>> from pylibraft.cluster.kmeans import cluster_cost
>>>
>>> n_samples = 5000
>>> n_features = 50
>>> n_clusters = 3
>>>
>>> X = cp.random.random_sample((n_samples, n_features),
... dtype=cp.float32)
>>> centroids = cp.random.random_sample((n_clusters, n_features),
... dtype=cp.float32)
>>> inertia = cluster_cost(X, centroids)
"""
x_cai = X.__cuda_array_interface__
@@ -524,13 +510,10 @@ def fit(
--------
>>> import cupy as cp
>>>
>>> from pylibraft.cluster.kmeans import fit, KMeansParams
>>>
>>> n_samples = 5000
>>> n_features = 50
>>> n_clusters = 3
>>>
>>> X = cp.random.random_sample((n_samples, n_features),
... dtype=cp.float32)
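
As a reading aid for the docstring examples in this file, the following is a plain-Python sketch of the arithmetic that k-means centroid updates and cluster cost (inertia) conventionally perform. It is not pylibraft's GPU implementation. The names `reference_new_centroids` and `reference_cluster_cost` are hypothetical, sample weights are ignored, and keeping the old centroid for an empty cluster is an assumption of this sketch.

```python
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def reference_new_centroids(X: Sequence[Point],
                            labels: Sequence[int],
                            centroids: Sequence[Point]) -> List[List[float]]:
    """Mean of the points assigned to each cluster.

    Assumption of this sketch: an empty cluster keeps its old centroid.
    """
    n_features = len(centroids[0])
    sums = [[0.0] * n_features for _ in centroids]
    counts = [0] * len(centroids)
    for point, label in zip(X, labels):
        counts[label] += 1
        for j, value in enumerate(point):
            sums[label][j] += value
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else list(centroids[c])
        for c in range(len(centroids))
    ]


def reference_cluster_cost(X: Sequence[Point],
                           centroids: Sequence[Point]) -> float:
    """Inertia: sum of squared distances to the nearest centroid."""
    return sum(min(squared_distance(p, c) for c in centroids) for p in X)


# Tiny hypothetical dataset: 3 samples, 2 features, 2 clusters.
X = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0]]
centroids = [[0.0, 0.0], [4.0, 3.0]]
labels = [0, 0, 1]
new_centroids = reference_new_centroids(X, labels, centroids)
inertia = reference_cluster_cost(X, centroids)
```

A reference like this can be handy for checking the shape and rough magnitude of GPU results on a small input before scaling up.
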
4 changes: 2 additions & 2 deletions python/pylibraft/pylibraft/common/handle.pyx
@@ -197,8 +197,8 @@ cdef class Handle(DeviceResources):


_HANDLE_PARAM_DOCSTRING = """
-handle : Optional RAFT resource handle for reusing expensive CUDA
-resources. If a handle isn't supplied, CUDA resources will be
+handle : Optional RAFT resource handle for reusing CUDA resources.
+If a handle isn't supplied, CUDA resources will be
allocated inside this function and synchronized before the
function exits. If a handle is supplied, you will need to
explicitly synchronize yourself by calling `handle.sync()`
5 changes: 0 additions & 5 deletions python/pylibraft/pylibraft/neighbors/brute_force.pyx
@@ -95,7 +95,6 @@ def knn(dataset, queries, k=None, indices=None, distances=None,
distances : Optional array interface compliant matrix shape
(n_queries, k), dtype float. If supplied, neighbor
distances will be written here in-place. (default None)
{handle_docstring}
Returns
@@ -108,16 +107,12 @@ def knn(dataset, queries, k=None, indices=None, distances=None,
Examples
--------
>>> import cupy as cp
>>> from pylibraft.common import DeviceResources
>>> from pylibraft.neighbors.brute_force import knn
>>> n_samples = 50000
>>> n_features = 50
>>> n_queries = 1000
>>> dataset = cp.random.random_sample((n_samples, n_features),
... dtype=cp.float32)
>>> # Search using the built index
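
To make the documented output shapes concrete, here is a plain-Python sketch of exhaustive k-nearest-neighbor search that produces `(n_queries, k)` distance and index results. It is only an illustration of the technique, not pylibraft's implementation. The function name `reference_knn` is hypothetical, and the use of squared Euclidean distance is an assumption of this sketch.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def reference_knn(dataset: Sequence[Point],
                  queries: Sequence[Point],
                  k: int) -> Tuple[List[List[float]], List[List[int]]]:
    """Exhaustive k-NN search; returns (distances, indices), each n_queries x k."""
    all_distances: List[List[float]] = []
    all_indices: List[List[int]] = []
    for query in queries:
        # Score every dataset row against the query, then keep the k closest.
        scored = sorted(
            (sum((a - b) ** 2 for a, b in zip(row, query)), i)
            for i, row in enumerate(dataset)
        )[:k]
        all_distances.append([d for d, _ in scored])
        all_indices.append([i for _, i in scored])
    return all_distances, all_indices


# Tiny hypothetical dataset: 3 samples, 2 features, 1 query.
dataset = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
queries = [[0.9, 0.0]]
distances, indices = reference_knn(dataset, queries, k=2)
```

Results from an exhaustive search like this are also what approximate methods are typically scored against when computing recall.
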