More README updates [skip-ci] #467

Merged
merged 11 commits into from
Feb 5, 2022
111 changes: 96 additions & 15 deletions BUILD.md
Original file line number Diff line number Diff line change
@@ -1,63 +1,144 @@
# RAFT Build and Development Guide

- [Building and installing RAFT](#build_install)
- [CUDA/GPU Requirements](#cuda_gpu_req)
- [Header-only C++](#install_header_only_cpp)
- [C++ Shared Libraries](#shared_cpp_libs)
- [Googletests](#gtests)
- [C++ Using Cmake](#cpp_using_cmake)
- [Python](#python)
- [Using RAFT in downstream projects](#use_raft)
- [C++ Integration](#cxx_integration)
- [Cmake Header-only Integration](#cxx_integration)
- [Using Shared Libraries in Cmake](#use_shared_libs)
- [Building RAFT C++ from source](#build_cxx_source)
- [Python/Cython Integration](#py_integration)

## <a id="build_install"></a>Building and installing RAFT

### CUDA/GPU Requirements
### <a id="cuda_gpu_req"></a>CUDA/GPU Requirements
- CUDA 11.0+
- NVIDIA driver 450.80.02+
- Pascal architecture or better (Compute capability >= 6.0)
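
The driver requirement above can be checked before starting a build. The following is a minimal sketch, not part of RAFT: the `version_ge` helper and the `MIN_DRIVER` variable are hypothetical names introduced here, and the comparison assumes a `sort` that supports `-V` (GNU coreutils does). The check is skipped when `nvidia-smi` is not on the `PATH`.

```bash
# Minimum driver version, taken from the requirements list above.
MIN_DRIVER="450.80.02"

# Hypothetical helper: succeeds when version $1 >= version $2.
# Relies on `sort -V` (version sort) placing the smaller version first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # Query the installed driver version; use the first GPU's entry.
  driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
  if version_ge "$driver" "$MIN_DRIVER"; then
    echo "driver $driver satisfies >= $MIN_DRIVER"
  else
    echo "driver '$driver' is older than $MIN_DRIVER" >&2
  fi
else
  echo "nvidia-smi not found; skipping driver check" >&2
fi
```

The sketch only reports the result; it does not stop the build, so it can be sourced from other scripts without side effects.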

C++ RAFT is a header-only library but provides the option of building shared libraries with template instantiations for common types to speed up compile times for larger projects. The recommended way to build and install RAFT is to use the `build.sh` script in the root of the repository. This script can build both the C++ and Python code and provides options for building and installing the shared libraries.
C++ RAFT is a header-only library but provides the option of building shared libraries with template instantiations for common types to speed up compile times for larger projects.

To run C++ tests:
The recommended way to build and install RAFT is to use the `build.sh` script in the root of the repository. This script can build both the C++ and Python code and provides options for building and installing the headers, Googletests, and individual shared libraries.

### <a id="install_header_only_cpp"></a>Header-only C++

RAFT depends on many different core libraries such as `thrust`, `cub`, `cucollections`, and `rmm`, which will be downloaded automatically by `cmake` even when only installing the headers. Note that while all the headers will be installed and available, some parts of the RAFT API depend on libraries like `FAISS`. The RAFT build can also download these, but only when explicitly told to do so.

The following example builds and installs raft in header-only mode:
```bash
./test_raft
./build.sh libraft --nogtest
```

To run Python tests, if `install` setup.py target is not run:
### <a id="shared_cpp_libs"></a>C++ Shared Libraries (optional)

Shared libraries are provided to speed up compile times for larger libraries which may heavily utilize some of the APIs. These shared libraries can also significantly improve re-compile times while developing against the APIs.

Build all the shared libraries by passing the `--compile-libs` flag to `build.sh`:

```bash
cd python
python -m pytest raft
./build.sh libraft --compile-libs --nogtest
```

To remain flexible, the individual shared libraries have their own flags and multiple can be used (though currently only the `nn` and `distance` packages contain shared libraries):
```bash
./build.sh libraft --compile-nn --compile-dist --nogtest
```

To build manually, you can also use `CMake` and setup.py directly.
### <a id="gtests"></a>Googletests

For C++, the `RAFT_COMPILE_LIBRARIES` option can be used to compile the shared libraries. Shared libraries are provided for the `nn` and `distance` packages currently. The `nn` package requires FAISS, which will be built from source if it is not already installed. [FAISS](https://github.com/facebookresearch/faiss) can optionally be statically compiled into the `nn` shared library with the `RAFT_USE_FAISS_STATIC` option.
Compile the Googletests by removing the `--nogtest` flag from `build.sh`:
```bash
./build.sh libraft --compile-nn --compile-dist
```

To run C++ tests:

```bash
./test_raft
```

### <a id="cpp_using_cmake"></a>C++ Using Cmake

To install RAFT into a specific location, use `CMAKE_INSTALL_PREFIX`. The snippet below will install it into the current conda environment.
```bash
cd cpp
mkdir build
cd build
cmake -DRAFT_COMPILE_LIBRARIES=ON -DRAFT_USE_FAISS_STATIC=OFF -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX ../
cmake -D BUILD_TESTS=ON -DRAFT_COMPILE_LIBRARIES=ON -DRAFT_ENABLE_NN_DEPENDENCIES=ON -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX ../
make install
```

For python:

RAFT's cmake has the following configurable flags available:

| Flag | Possible Values | Default Value | Behavior |
| --- | --- | --- | --- |
| BUILD_TESTS | ON, OFF | ON | Compile Googletests |
| RAFT_COMPILE_LIBRARIES | ON, OFF | OFF | Compiles all `libraft` shared libraries (these are required for Googletests) |
| RAFT_COMPILE_NN_LIBRARY | ON, OFF | ON | Compiles the `libraft-nn` shared library |
| RAFT_COMPILE_DIST_LIBRARY | ON, OFF | ON | Compiles the `libraft-distance` shared library |
| RAFT_ENABLE_NN_DEPENDENCIES | ON, OFF | OFF | Searches for dependencies of nearest neighbors API, such as FAISS, and compiles them if not found. |
| RAFT_USE_FAISS_STATIC | ON, OFF | OFF | Statically link FAISS into `libraft-nn` |
| DETECT_CONDA_ENV | ON, OFF | ON | Enable detection of conda environment for dependencies |
| NVTX | ON, OFF | OFF | Enable NVTX Markers |
| CUDA_ENABLE_KERNELINFO | ON, OFF | OFF | Enables `kernelinfo` in nvcc. This is useful for `compute-sanitizer` |
| CUDA_ENABLE_LINEINFO | ON, OFF | OFF | Enable the -lineinfo option for nvcc |
| CUDA_STATIC_RUNTIME | ON, OFF | OFF | Statically link the CUDA runtime |

Shared libraries are currently provided for the `libraft-nn` and `libraft-distance` components. The `libraft-nn` component depends upon [FAISS](https://github.com/facebookresearch/faiss), and the `RAFT_ENABLE_NN_DEPENDENCIES` option will build it from source if it is not already installed.
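
As an illustration of combining flags from the table, the sketch below assembles a configure command for a profiling-friendly build with NVTX markers and nvcc line info enabled. Only the `-D` flags come from the table; the `raft_profiling_args` function name and the `/opt/raft` prefix are assumptions made for this example. The command is printed rather than executed so it can be reviewed first.

```bash
# Hypothetical helper: print cmake arguments for a profiling-friendly build.
# $1 is the install prefix; it falls back to $CONDA_PREFIX, then /usr/local
# (the fallback order is an assumption of this sketch).
raft_profiling_args() {
  prefix="${1:-${CONDA_PREFIX:-/usr/local}}"
  echo "-DNVTX=ON -DCUDA_ENABLE_LINEINFO=ON -DCMAKE_INSTALL_PREFIX=${prefix}"
}

# Intended to be run from cpp/build, as in the snippet earlier in this section.
echo "cmake $(raft_profiling_args /opt/raft) ../"
```

Because the arguments are returned as a single string, prefixes containing spaces are not handled; quote them by hand if needed.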



### <a id="python"></a>Python

Conda environment scripts are provided for installing the necessary dependencies for building and using the Python APIs. It is preferred to use `mamba`, as it provides significant speedup over `conda`. The following example creates a CUDA 11.5 conda environment and installs its dependencies:

```bash
conda env create --name raft_env -f conda/environments/raft_dev_cuda11.5.yml
```
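
Since `mamba` is preferred when available, the choice can be scripted. This is a minimal sketch under the assumption that `mamba` accepts the same `env create` arguments as `conda`; the `pick_conda_frontend` function name is hypothetical. The resulting command is printed rather than executed.

```bash
# Hypothetical helper: print "mamba" when it is on PATH, otherwise "conda".
pick_conda_frontend() {
  if command -v mamba >/dev/null 2>&1; then
    echo mamba
  else
    echo conda
  fi
}

frontend="$(pick_conda_frontend)"
# Printed rather than executed so the command can be reviewed first.
echo "$frontend env create --name raft_env -f conda/environments/raft_dev_cuda11.5.yml"
```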

The Python API can be built using the `build.sh` script:

```bash
./build.sh pyraft
```

`setup.py` can also be used to build the Python API manually:
```bash
cd python
python setup.py build_ext --inplace
python setup.py install
```

To run the Python tests:
```bash
cd python
python -m pytest raft
```

## <a id="use_raft"></a>Using RAFT in downstream projects

### <a id="cxx_integration"></a>C++ Integration
### <a id="cxx_integration"></a>C++ header-only integration using cmake

Use RAFT in cmake projects with `find_package(raft)` for header-only operation. The `raft::raft` target will be available for configuring linking, and `RAFT_INCLUDE_DIR` will be available for includes. Note that if any packages are used which require downstream dependencies, such as the `libraft-nn` package requiring FAISS, these dependencies will have to be installed and configured in cmake independently.

Use RAFT in cmake projects with `find_package(raft)` for header-only operation and the `raft::raft` target will be available for configuring linking and `RAFT_INCLUDE_DIR` will be available for includes. Note that if any packages are used which require downstream dependencies, such as the `nn` package requiring FAISS, these dependencies will have to be installed and configured in cmake independently.
### <a id="use_shared_libs"></a>Using pre-compiled shared libraries

Use `find_package(raft COMPONENTS nn distance)` to enable the shared libraries and pass dependencies through separate targets for each component. In this example, the `raft::distance` and `raft::nn` targets will be available for configuring linking paths. These targets will also pass through any transitive dependencies (such as FAISS in the case of the `nn` package).

### <a id="build_cxx_source"></a>Building RAFT C++ from source
The pre-compiled libraries contain template specializations for commonly used types and require the additional include of header files with `extern template` declarations that tell the compiler not to instantiate templates that are already contained in the shared libraries. By convention, these header files are named `specializations.hpp` and are located in the base directory of the packages that contain specializations.

The following example shows how to use the `libraft-distance` API with the pre-compiled specializations:
```c++
#include <raft/distance/distance.hpp>
#include <raft/distance/specializations.hpp>
```

### <a id="build_cxx_source"></a>Building RAFT C++ from source in cmake

RAFT uses the [RAPIDS cmake](https://github.com/rapidsai/rapids-cmake) library, so it can be easily included in downstream projects. RAPIDS cmake provides a convenience layer around the [Cmake Package Manager (CPM)](https://github.com/cpm-cmake/CPM.cmake). The following example is similar to building RAFT itself from source but allows it to be done in cmake, providing the `raft::raft` link target and `RAFT_INCLUDE_DIR` for includes. The `COMPILE_LIBRARIES` option enables the building of the shared libraries.

26 changes: 16 additions & 10 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,28 +1,33 @@
# <div align="left"><img src="https://rapids.ai/assets/images/rapids_logo.png" width="90px"/>&nbsp;RAFT: RAPIDS Analytics Framework Toolkit</div>

RAFT is a [Scipy-like](https://scipy.org/) library for scientific computing, containing CUDA-accelerated building-blocks for rapidly composing analytics in the [RAPIDS](https://rapids.ai) ecosystem. These building-blocks include infrastructure as well as mathematical computational primitives, which accelerate the development of algorithms for data science applications.
RAFT contains fundamental widely-used algorithms and primitives for data science, graph and machine learning. The algorithms are CUDA-accelerated and form building-blocks for rapidly composing analytics in the [RAPIDS](https://rapids.ai) ecosystem.

By taking a primitives-based approach to algorithm development, RAFT
1. accelerates algorithm construction time
2. reduces the maintenance burden by maximizing reuse across projects, and
3. centralizes the core computations, allowing future optimizations to benefit all algorithms that use them.

RAFT provides a header-only C++ API (with optional shared libraries to accelerate build time) that cover the following general categories:
At its core, RAFT is a header-only C++ library with optional shared libraries that span the following categories:

#####
| Category | Description / Examples |
| Category | Examples |
| --- | --- |
| **Data Formats** | sparse & dense, conversions, and data generations |
| **Data Formats** | sparse & dense, conversions, data generation |
| **Data Generation** | sparse, spatial, machine learning datasets |
| **Dense Linear Algebra** | matrix arithmetic, norms, factorization |
| **Dense Linear Algebra** | matrix arithmetic, norms, factorization, least squares, svd & eigenvalue problems |
| **Spatial** | pairwise distances, nearest neighbors, neighborhood graph construction |
| **Sparse Operations** | linear algebra, slicing, symmetrization, norms, spectral embedding, msf |
| **Sparse Operations** | linear algebra, eigenvalue problems, slicing, symmetrization, connected component labeling |
| **Basic Clustering** | spectral clustering, hierarchical clustering, k-means |
| **Optimizers** | eigenvalue decomposition, least squares, and lanczos |
| **Statistics** | sampling, moments, metrics |
| **Combinatorial Optimization** | linear assignment problem, minimum spanning forest |
| **Iterative Solvers** | lanczos |
| **Statistics** | sampling, moments and summary statistics, metrics |
| **Distributed Tools** | multi-node multi-gpu infrastructure |

RAFT also provides a Python API that enables the building of multi-node multi-GPU algorithms in the [Dask](https://dask.org/) ecosystem. We are continuing to improve the coverage of the Python API to expose the building-blocks from the categories above.
RAFT also provides a Python library that includes
1. a Python wrapper around the `raft::handle_t` for managing CUDA library resources
2. tools for building multi-node multi-GPU algorithms that leverage [Dask](https://dask.org/)

We are continuing to improve the Python API by exposing the core algorithms and primitives from the categories above.

## Getting started

@@ -71,9 +76,10 @@ The folder structure mirrors other RAPIDS repos (cuDF, cuML, cuGraph...), with t
- `ci`: Scripts for running CI in PRs
- `conda`: Conda recipes and development conda environments
- `cpp`: Source code for all C++ code.
- `docs`: Doxygen configuration
- `include`: The C++ API is fully-contained here
- `src`: Compiled template specializations for the shared libraries
- `docs`: Source code and scripts for building library documentation
- `docs`: Source code and scripts for building library documentation (doxygen + pydocs)
- `python`: Source code for all Python code.

## Contributing