rapidsai · cjnolet · Dec 13, 2022
diff --git a/README.md b/README.md
@@ -37,7 +37,7 @@ While not exhaustive, the following general categories help summarize the accele
 All of RAFT's C++ APIs can be accessed header-only and optional pre-compiled shared libraries can 1) speed up compile times and 2) enable the APIs to be used without CUDA-enabled compilers.
 
 In addition to the C++ library, RAFT also provides 2 Python libraries:
-- `pylibraft` - lightweight low-level Python wrappers around RAFT's host-accessible APIs.
+- `pylibraft` - lightweight low-level Python wrappers around RAFT's host-accessible "runtime" APIs.
 - `raft-dask` - multi-node multi-GPU communicator infrastructure for building distributed algorithms on the GPU with Dask.
 
 ## Getting started
@@ -142,7 +142,7 @@ in2 = cp.random.random_sample((n_samples, n_features), dtype=cp.float32)
 output = pairwise_distance(in1, in2, metric="euclidean")
 ```
 
-The `output` array supports [__cuda_array_interface__](https://numba.pydata.org/numba-doc/dev/cuda/cuda_array_interface.html#cuda-array-interface-version-2) so it is interoperable with other libraries like CuPy, Numba, and PyTorch that also support it. 
+The `output` array in the above example is of type `pylibraft.common.device_ndarray`, which supports [__cuda_array_interface__](https://numba.pydata.org/numba-doc/dev/cuda/cuda_array_interface.html#cuda-array-interface-version-2), making it interoperable with other libraries like CuPy, Numba, and PyTorch that also support it. CuPy supports DLPack, which also enables zero-copy conversion from `pylibraft.common.device_ndarray` to JAX and TensorFlow.
 
 Below is an example of converting the output `pylibraft.device_ndarray` to a CuPy array:
 ```python

diff --git a/docs/source/quick_start.md b/docs/source/quick_start.md
@@ -8,9 +8,9 @@ RAFT relies heavily on the [RMM](https://github.com/rapidsai/rmm) library which
 
 ## Multi-dimensional Spans and Arrays
 
-The APIs in RAFT currently accept raw pointers to device memory and we are in the process of simplifying the APIs with the [mdspan](https://arxiv.org/abs/2010.06474) multi-dimensional array view for representing data in higher dimensions similar to the `ndarray` in the Numpy Python library. RAFT also contains the corresponding owning `mdarray` structure, which simplifies the allocation and management of multi-dimensional data in both host and device (GPU) memory.
+Most of the APIs in RAFT accept the [mdspan](https://arxiv.org/abs/2010.06474) multi-dimensional array view for representing data in higher dimensions, similar to the `ndarray` in the NumPy Python library. RAFT also contains the corresponding owning `mdarray` structure, which simplifies the allocation and management of multi-dimensional data in both host and device (GPU) memory.
 
-The `mdarray` forms a convenience layer over RMM and can be constructed in RAFT using a number of different helper functions:
+The `mdarray` is an owning object that forms a convenience layer over RMM and can be constructed in RAFT using a number of different helper functions:
 
 ```c++
 #include <raft/core/device_mdarray.hpp>
@@ -118,11 +118,11 @@ auto metric = raft::distance::DistanceType::L2SqrtExpanded;
 raft::distance::pairwise_distance(handle, input.view(), input.view(), output.view(), metric);
 ```
 
-## Python Example
+### Python Example
 
-The `pylibraft` package contains a Python API for RAFT algorithms and primitives. `pylibraft` integrates nicely into other libraries by being very lightweight with minimal dependencies and accepting any object that supports the `__cuda_array_interface__`, such as [CuPy's ndarray](https://docs.cupy.dev/en/stable/user_guide/interoperability.html#rmm). The package is currently limited to pairwise distances and RMAT graph generation, but we will continue adding more in future releases.
+The `pylibraft` package contains a Python API for RAFT algorithms and primitives. `pylibraft` integrates nicely into other libraries by being very lightweight with minimal dependencies and accepting any object that supports the `__cuda_array_interface__`, such as [CuPy's ndarray](https://docs.cupy.dev/en/stable/user_guide/interoperability.html#rmm). The number of RAFT algorithms exposed in this package is continuing to grow from release to release.
 
-The example below demonstrates computing the pairwise Euclidean distances between CuPy arrays. `pylibraft` is a low-level API that prioritizes efficiency and simplicity over being pythonic, which is shown here by pre-allocating the output memory before invoking the `pairwise_distance` function. Note that CuPy is not a required dependency for `pylibraft`.
+The example below demonstrates computing the pairwise Euclidean distances between CuPy arrays. Note that CuPy is not a required dependency for `pylibraft`.
 
 ```python
 import cupy as cp
@@ -137,3 +137,34 @@ in2 = cp.random.random_sample((n_samples, n_features), dtype=cp.float32)
 
 output = pairwise_distance(in1, in2, metric="euclidean")
 ```
+
+The `output` array in the above example is of type `pylibraft.common.device_ndarray`, which supports [__cuda_array_interface__](https://numba.pydata.org/numba-doc/dev/cuda/cuda_array_interface.html#cuda-array-interface-version-2), making it interoperable with other libraries like CuPy, Numba, and PyTorch that also support it. CuPy supports DLPack, which also enables zero-copy conversion from `pylibraft.common.device_ndarray` to JAX and TensorFlow.
+
+Below is an example of converting the output `pylibraft.common.device_ndarray` to a CuPy array:
+```python
+cupy_array = cp.asarray(output)
+```
+
+And converting to a PyTorch tensor:
+```python
+import torch
+
+torch_tensor = torch.as_tensor(output, device='cuda')
+```
+
+`pylibraft` also supports writing to a pre-allocated output array so any `__cuda_array_interface__` supported array can be written to in-place:
+
+```python
+import cupy as cp
+
+from pylibraft.distance import pairwise_distance
+
+n_samples = 5000
+n_features = 50
+
+in1 = cp.random.random_sample((n_samples, n_features), dtype=cp.float32)
+in2 = cp.random.random_sample((n_samples, n_features), dtype=cp.float32)
+output = cp.empty((n_samples, n_samples), dtype=cp.float32)
+
+pairwise_distance(in1, in2, out=output, metric="euclidean")
+```
@@ -14,6 +14,7 @@
 #
 
 from pylibraft._version import get_versions
+from pylibraft.config import config
 
 __version__ = get_versions()["version"]
 del get_versions
@@ -45,8 +45,11 @@ from pylibraft.common.cpp.mdspan cimport *
 from pylibraft.common.cpp.optional cimport optional
 from pylibraft.common.handle cimport handle_t
 
+from pylibraft.common import auto_convert_output
+
 
 @auto_sync_handle
+@auto_convert_output
 def compute_new_centroids(X,
                           centroids,
                           labels,
@@ -197,6 +200,7 @@ def compute_new_centroids(X,
 
 
 @auto_sync_handle
+@auto_convert_output
 def cluster_cost(X, centroids, handle=None):
     """
     Compute cluster cost given an input matrix and existing centroids
@@ -403,6 +407,7 @@ FitOutput = namedtuple("FitOutput", "centroids inertia n_iter")
 
 
 @auto_sync_handle
+@auto_convert_output
 def fit(
     KMeansParams params, X, centroids=None, sample_weights=None, handle=None
 ):

@@ -17,3 +17,4 @@
 from .cuda import Stream
 from .device_ndarray import device_ndarray
 from .handle import Handle
+from .outputs import auto_convert_output
@@ -0,0 +1,93 @@
+# Copyright (c) 2022, NVIDIA CORPORATION.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import functools
+import warnings
+
+import pylibraft.config
+
+
+def import_warn_(lib):
+    warnings.warn(
+        "%s is not available and output cannot be converted. "
+        "Returning original output instead." % lib
+    )
+
+
+def convert_to_torch(device_ndarray):
+    try:
+        import torch
+
+        return torch.as_tensor(device_ndarray, device="cuda")
+    except ImportError:
+        import_warn_("PyTorch")
+        return device_ndarray
+
+
+def convert_to_cupy(device_ndarray):
+    try:
+        import cupy
+
+        return cupy.asarray(device_ndarray)
+    except ImportError:
+        import_warn_("CuPy")
+        return device_ndarray
+
+
+def no_conversion(device_ndarray):
+    return device_ndarray
+
+
+def convert_to_cai_type(device_ndarray):
+    output_as_ = pylibraft.config.output_as_
+    if callable(output_as_):
+        return output_as_(device_ndarray)
+    elif output_as_ == "raft":
+        return device_ndarray
+    elif output_as_ == "torch":
+        return convert_to_torch(device_ndarray)
+    elif output_as_ == "cupy":
+        return convert_to_cupy(device_ndarray)
+    else:
+        raise ValueError("No valid type conversion found for %s" % output_as_)
+
+
+def conv(ret):
+    for i in ret:
+        if isinstance(i, pylibraft.common.device_ndarray):
+            yield convert_to_cai_type(i)
+        else:
+            yield i
+
+
+def auto_convert_output(f):
+    """Decorator to automatically convert an output device_ndarray
+    (or list or tuple of device_ndarray) into the configured
+    `__cuda_array_interface__` compliant type.
+    """
+
+    @functools.wraps(f)
+    def wrapper(*args, **kwargs):
+        ret_value = f(*args, **kwargs)
+        if isinstance(ret_value, pylibraft.common.device_ndarray):
+            return convert_to_cai_type(ret_value)
+        elif isinstance(ret_value, tuple):
+            return tuple(conv(ret_value))
+        elif isinstance(ret_value, list):
+            return list(conv(ret_value))
+        else:
+            return ret_value
+
+    return wrapper
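The dispatch performed by `auto_convert_output` can be tried without a GPU. The following is a minimal sketch, not the pylibraft implementation: `FakeDeviceArray`, `to_configured_type`, `convert_each`, and `auto_convert_output_sketch` are hypothetical stand-ins for `device_ndarray`, `convert_to_cai_type`, `conv`, and the decorator in the diff, and the "conversion" simply unwraps a Python list.

```python
import functools
from collections import namedtuple


class FakeDeviceArray:
    """Hypothetical stand-in for pylibraft.common.device_ndarray."""

    def __init__(self, values):
        self.values = list(values)


def to_configured_type(arr):
    # Stand-in for convert_to_cai_type: unwrap to a plain Python list.
    return arr.values


def convert_each(items):
    # Convert device arrays and pass every other element through untouched.
    for item in items:
        if isinstance(item, FakeDeviceArray):
            yield to_configured_type(item)
        else:
            yield item


def auto_convert_output_sketch(f):
    # Same branching as the decorator in the diff: single array, tuple, list.
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        ret = f(*args, **kwargs)
        if isinstance(ret, FakeDeviceArray):
            return to_configured_type(ret)
        if isinstance(ret, tuple):
            return tuple(convert_each(ret))
        if isinstance(ret, list):
            return list(convert_each(ret))
        return ret

    return wrapper


@auto_convert_output_sketch
def single():
    return FakeDeviceArray([1.0, 2.0])


@auto_convert_output_sketch
def mixed():
    # Non-array elements (here an int) are returned unchanged.
    return FakeDeviceArray([3.0]), 7


Result = namedtuple("Result", "data count")


@auto_convert_output_sketch
def named():
    return Result(FakeDeviceArray([4.0]), 1)
```

Because a namedtuple is an instance of `tuple`, the tuple branch in this sketch rebuilds it as a plain `tuple`, so `named()` no longer exposes the `data` and `count` fields. Whether that matters for decorated functions that return namedtuples depends on how callers access the result.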
@@ -0,0 +1,38 @@
+# Copyright (c) 2022, NVIDIA CORPORATION.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+SUPPORTED_OUTPUT_TYPES = ["torch", "cupy", "raft"]
+
+
+class config:
+    output_as_ = "raft"  # By default, return device_ndarray from functions
+
+    @classmethod
+    def set_output_as(cls, output):
+        """
+        RAFT functions which normally return outputs with memory on device will
+        instead automatically convert the output to the specified output type,
+        depending on availability of the requested type.
+
+        Parameters
+        ----------
+        output : str or callable. str can be either
+                 { "raft", "cupy", or "torch" }.
+                 default = "raft". callable should accept
+                 pylibraft.common.device_ndarray
+                 as a single argument and return the converted type.
+        """
+        if output not in SUPPORTED_OUTPUT_TYPES and not callable(output):
+            raise ValueError("Unsupported output option %s" % output)
+        config.output_as_ = output
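The validation rule in `set_output_as` (accept one of the supported strings or any callable, reject everything else) can be exercised in isolation. This is a hedged sketch: `ConfigSketch` and `describe` are hypothetical names that mirror the class above for illustration and are not part of pylibraft.

```python
SUPPORTED_OUTPUT_TYPES = ["torch", "cupy", "raft"]


class ConfigSketch:
    """Hypothetical mirror of the `config` class, for illustration only."""

    output_as_ = "raft"  # default: leave outputs as device_ndarray

    @classmethod
    def set_output_as(cls, output):
        # Accept a supported string or any callable; reject everything else.
        if output not in SUPPORTED_OUTPUT_TYPES and not callable(output):
            raise ValueError("Unsupported output option %s" % output)
        cls.output_as_ = output


def describe(arr):
    # Hypothetical custom converter: takes the output array as its only
    # argument and returns whatever type the caller wants.
    return ("converted", arr)


ConfigSketch.set_output_as("cupy")
after_string = ConfigSketch.output_as_

ConfigSketch.set_output_as(describe)
after_callable = ConfigSketch.output_as_

try:
    ConfigSketch.set_output_as("numpy")
except ValueError as exc:
    rejected_message = str(exc)
else:
    rejected_message = None
```

The setting is stored on the class, so it acts as a process-wide default: every decorated function called afterwards sees the most recently configured value.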
@@ -26,7 +26,12 @@ from libcpp cimport bool
 
 from .distance_type cimport DistanceType
 
-from pylibraft.common import Handle, cai_wrapper, device_ndarray
+from pylibraft.common import (
+    Handle,
+    auto_convert_output,
+    cai_wrapper,
+    device_ndarray,
+)
 from pylibraft.common.handle import auto_sync_handle
 
 from pylibraft.common.handle cimport handle_t
@@ -57,6 +62,7 @@ cdef extern from "raft_runtime/distance/fused_l2_nn.hpp" \
 
 
 @auto_sync_handle
+@auto_convert_output
 def fused_l2_nn_argmin(X, Y, out=None, sqrt=True, handle=None):
     """
     Compute the 1-nearest neighbors between X and Y using the L2 distance

@@ -31,7 +31,7 @@ from pylibraft.common.handle import auto_sync_handle
 
 from pylibraft.common.handle cimport handle_t
 
-from pylibraft.common import cai_wrapper, device_ndarray
+from pylibraft.common import auto_convert_output, cai_wrapper, device_ndarray
 
 
 cdef extern from "raft_runtime/distance/pairwise_distance.hpp" \
@@ -89,6 +89,7 @@ SUPPORTED_DISTANCES = ["euclidean", "l1", "cityblock", "l2", "inner_product",
 
 
 @auto_sync_handle
+@auto_convert_output
 def distance(X, Y, out=None, metric="euclidean", p=2.0, handle=None):
     """
     Compute pairwise distances between X and Y

@@ -33,7 +33,12 @@ from libcpp cimport bool, nullptr
 
 from pylibraft.distance.distance_type cimport DistanceType
 
-from pylibraft.common import Handle, cai_wrapper, device_ndarray
+from pylibraft.common import (
+    Handle,
+    auto_convert_output,
+    cai_wrapper,
+    device_ndarray,
+)
 from pylibraft.common.interruptible import cuda_interruptible
 
 from pylibraft.common.handle cimport handle_t
@@ -302,6 +307,7 @@ cdef class Index:
 
 
 @auto_sync_handle
+@auto_convert_output
 def build(IndexParams index_params, dataset, handle=None):
     """
     Builds an IVF-PQ index that can be later used for nearest neighbor search.
@@ -401,6 +407,7 @@ def build(IndexParams index_params, dataset, handle=None):
 
 
 @auto_sync_handle
+@auto_convert_output
 def extend(Index index, new_vectors, new_indices, handle=None):
     """
     Extend an existing index with new vectors.
@@ -565,6 +572,7 @@ cdef class SearchParams:
 
 
 @auto_sync_handle
+@auto_convert_output
 def search(SearchParams search_params,
            Index index,
            queries,

@@ -33,7 +33,12 @@ from libcpp cimport bool, nullptr
 
 from pylibraft.distance.distance_type cimport DistanceType
 
-from pylibraft.common import Handle, cai_wrapper, device_ndarray
+from pylibraft.common import (
+    Handle,
+    auto_convert_output,
+    cai_wrapper,
+    device_ndarray,
+)
 
 from pylibraft.common.handle cimport handle_t
 
@@ -208,6 +213,7 @@ cdef host_matrix_view[int8_t, uint64_t, row_major] \
 
 
 @auto_sync_handle
+@auto_convert_output
 def refine(dataset, queries, candidates, k=None, indices=None, distances=None,
            metric="l2_expanded", handle=None):
     """