Updates to Python and C++ Docs #442

Merged
merged 12 commits into from
Jan 23, 2022
Changes from 11 commits
158 changes: 55 additions & 103 deletions BUILD.md
@@ -1,78 +1,17 @@
# RAFT Build and Development Guide

- [Building and running tests](#building-and-running-tests)
- [Usage of RAFT by downstream projects](#usage-of-raft-by-downstream-projects)
- [C++ Integration](#c-integration)
- [Python/Cython Integration](#pythoncython-integration)
- [Building and running tests](#building-and-running-tests)
- [CI Process](#ci-process)
- [Developer Guide](#developer-guide)
- [Local Development](#local-development)
- [Submitting PRs](#submitting-prs)

## Building and installing RAFT


## Usage of RAFT by downstream projects

### C++ Integration

C++ RAFT is a header-only library, so it can be easily configured using CMake by consuming libraries. Since this repo is intended to be included by downstream repos, the recommended way of accomplishing that is using CMake's git cloning functionality (`ExternalProject`):


```cmake
include(ExternalProject)

if(DEFINED ENV{RAFT_PATH})
  message(STATUS "RAFT_PATH environment variable detected.")
  message(STATUS "RAFT_DIR set to $ENV{RAFT_PATH}")
  set(RAFT_DIR $ENV{RAFT_PATH})
  # Use the headers from the local RAFT checkout
  set(RAFT_INCLUDE_DIR ${RAFT_DIR}/cpp/include CACHE STRING "RAFT include variable")

else()
  message(STATUS "RAFT_PATH environment variable NOT detected, cloning RAFT")
  set(RAFT_GIT_DIR ${CMAKE_CURRENT_BINARY_DIR}/raft CACHE STRING "Path to RAFT repo")

  # Header-only: clone the sources but skip the configure/build/install steps
  ExternalProject_Add(raft
    GIT_REPOSITORY git@github.com:rapidsai/raft.git
    GIT_TAG pinned_commit/git_tag/branch
    PREFIX ${RAFT_GIT_DIR}
    CONFIGURE_COMMAND ""
    BUILD_COMMAND ""
    INSTALL_COMMAND "")

  set(RAFT_INCLUDE_DIR ${RAFT_GIT_DIR}/src/raft/cpp/include CACHE STRING "RAFT include variable")
endif()

```

This creates the `RAFT_INCLUDE_DIR` variable, which can be used in `include_directories` so that the related header files can be included where needed.

### Python/Cython Integration

RAFT's Python and Cython code have been designed to be included in projects that use RAFT, as opposed to being distributed by itself as a Python package. To use them:

- The file `setuputils.py` is included in RAFT's `python` folder. Copy the file to your repo, in a location where it can be imported by `setup.py`.
- In your setup.py, use the function `use_raft_package`, for example for cuML:


```python
# Both helpers are assumed to come from the setuputils.py copied from RAFT
from setuputils import get_environment_option, use_raft_package

# Optional location of C++ build folder that can be configured by the user
libcuml_path = get_environment_option('CUML_BUILD_PATH')
# Optional location of RAFT that can be configured by the user
raft_path = get_environment_option('RAFT_PATH')

use_raft_package(raft_path, libcuml_path)
```

The usage of RAFT by the consuming repo's Python code follows these rules:
1. If the environment variable `RAFT_PATH` points to the RAFT repo, then that will be used.
2. If there is a C++ build folder that has cloned RAFT already, setup.py will use that RAFT.
3. If none of the above happened, then setup.py will clone RAFT and use it directly.

- After `setup.py` calls the `use_raft_package` function, the RAFT Python code will be included (via a symlink) in the consuming repo's package, under a `raft` subfolder. For example, the `cuml` Python package includes RAFT in `cuml.raft`.
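The resolution order and the symlink step described above can be sketched in Python as follows. This is not RAFT's actual `setuputils.py` code: `resolve_raft_path` and `link_raft_python` are hypothetical helpers, the C++-build location `<build>/raft/src/raft` is assumed from the `ExternalProject_Add` layout in the CMake snippet earlier (`PREFIX ${CMAKE_CURRENT_BINARY_DIR}/raft`), and the actual cloning step is left out.

```python
import os
from pathlib import Path
from typing import Optional, Tuple


def resolve_raft_path(cpp_build_dir: Optional[str], clone_dir: str) -> Tuple[str, str]:
    """Pick which RAFT checkout to use, returning (reason, path)."""
    # 1. An explicit RAFT_PATH environment variable takes priority.
    env_path = os.environ.get("RAFT_PATH")
    if env_path:
        return "env", env_path
    # 2. Reuse a RAFT clone made by the C++ build (assumed ExternalProject layout).
    if cpp_build_dir:
        candidate = Path(cpp_build_dir) / "raft" / "src" / "raft"
        if candidate.is_dir():
            return "cpp_build", str(candidate)
    # 3. Otherwise RAFT must be cloned into clone_dir (cloning is not done here).
    return "clone", clone_dir


def link_raft_python(raft_repo: str, package_dir: str) -> Path:
    """Expose <raft_repo>/python/raft inside package_dir as a 'raft' symlink."""
    target = Path(raft_repo) / "python" / "raft"
    link = Path(package_dir) / "raft"
    # lexists is also true for a dangling symlink, so an existing link is left alone.
    if not os.path.lexists(link):
        link.symlink_to(target, target_is_directory=True)
    return link
```

Returning the reason alongside the path makes it easy to log which of the three rules was applied during `setup.py`.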


## Building and running tests

Since RAFT is not meant to produce any artifacts by itself but to be included in other projects, the build infrastructure is focused only on testing.

The base folder in the repository contains a `build.sh` script that builds both the C++ and Python code, which is the recommended way of building the tests.
C++ RAFT is a header-only library but provides the option of building shared libraries with template instantiations for common types to speed up compile times for larger projects. The recommended way to build and install RAFT is to use the `build.sh` script in the root of the repository. This script can build both the C++ and Python code and provides options for building and installing the shared libraries.

To run C++ tests:

@@ -87,64 +26,77 @@ cd python
python -m pytest raft
```

To build manually, you can also use `CMake` and setup.py directly. For C++:
To build manually, you can also use `CMake` and setup.py directly.

For C++, the `RAFT_COMPILE_LIBRARIES` option can be used to compile the shared libraries. Shared libraries are provided for the `nn` and `distance` packages currently. The `nn` package requires FAISS, which will be built from source if it is not already installed. FAISS can optionally be statically compiled into the `nn` shared library with the `RAFT_USE_FAISS_STATIC` option.

To install RAFT into a specific location, use `CMAKE_INSTALL_PREFIX`. The snippet below will install it into the current conda environment.
```bash
cd cpp
mkdir build
cd build
cmake ..
cmake -DRAFT_COMPILE_LIBRARIES=ON -DRAFT_USE_FAISS_STATIC=OFF -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX ../
make install
```

There is no `install` target currently.

For python:

```bash
cd python
python setup.py build_ext --inplace
python setup.py install
```

## Using RAFT in downstream projects

## CI Process

PRs submitted to RAFT will always run the RAFT tests (once GPUCI is enabled). Additionally, RAFT has convenience functionality to run tests of the following projects that use RAFT: cuML and cuGraph.

To run these other tests, turn `ON` the variables in `ci/prtest.config` in your PR:

```bash
RUN_CUGRAPH_LIBCUGRAPH_TESTS=OFF
RUN_CUGRAPH_PYTHON_TESTS=OFF

RUN_CUML_LIBCUML_TESTS=OFF
RUN_CUML_PRIMS_TESTS=OFF
RUN_CUML_PYTHON_TESTS=OFF
```

This makes CI in the PR clone and build the respective repository, but the repository **will be built using the fork/branch of RAFT in the PR**. This allows testing changes in RAFT without opening PRs in the other repositories.

Before merging the PR, those variables need to be returned to `OFF`.
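Because these flags must be back to `OFF` before merging, a pre-merge check is easy to script. The sketch below assumes `ci/prtest.config` contains only simple `KEY=VALUE` lines like those shown above (plus optional blank lines and `#` comments); `parse_prtest_config` and `flags_left_on` are hypothetical helpers, not part of RAFT's CI.

```python
from typing import Dict, List


def parse_prtest_config(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and '#' comments."""
    flags: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # skip lines without an '=' sign
            flags[key.strip()] = value.strip()
    return flags


def flags_left_on(text: str) -> List[str]:
    """Return the downstream test flags still set to ON (should be empty before merge)."""
    return sorted(k for k, v in parse_prtest_config(text).items() if v.upper() == "ON")
```

A non-empty result from `flags_left_on` would indicate the config still needs to be reset before the PR is merged.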


## Developer Guide

### Local Development
### C++ Integration

To make working with RAFT and consuming projects as seamless as possible, this section describes what a typical workflow looks like and gives some guidelines for developers working in projects that affect code in both RAFT and at least one downstream repository.
Use RAFT in CMake projects with `find_package(raft)` for header-only operation. The `raft::raft` target will then be available for configuring linking, and `RAFT_INCLUDE_DIR` will be available for includes. Note that if any packages are used which require additional dependencies, such as the `nn` package requiring FAISS, these dependencies will have to be installed and configured in CMake independently.

Using a developer working on both cuML and RAFT as an example, we recommend the following:
Use `find_package(raft COMPONENTS nn distance)` to enable the shared libraries and pass dependencies through separate targets for each component. In this example, the `raft::distance` and `raft::nn` targets will be available for configuring linking paths. These targets will also pass through any transitive dependencies (such as FAISS in the case of the `nn` package).

- Create two working folders: one containing the cloned cuML repository and the other the cloned RAFT one.
- Create an environment variable `RAFT_PATH` pointing to the location of the cloned RAFT repository.
- Work on same named branches in both repos/folders.
### Building RAFT C++ from source

This will facilitate development, and the `RAFT_PATH` variable will make the downstream repository, in this case cuML, build using the locally cloned RAFT (as described in the first step).
RAFT uses the [RAPIDS cmake](https://github.com/rapidsai/rapids-cmake) library, so it can be easily included into downstream projects. RAPIDS cmake provides a convenience layer around the [CMake Package Manager (CPM)](https://github.com/cpm-cmake/CPM.cmake). The following example is similar to building RAFT itself from source but allows it to be done in CMake, providing the `raft::raft` target for includes by default. The `COMPILE_LIBRARIES` option enables building the shared libraries.

### Submitting PRs Guidelines
```cmake
# Assumes rapids-cmake's CPM support is already loaded, e.g. via
# include(rapids-cpm) followed by rapids_cpm_init()
function(find_and_configure_raft)

  set(oneValueArgs VERSION FORK PINNED_TAG USE_RAFT_NN USE_FAISS_STATIC COMPILE_LIBRARIES)
  cmake_parse_arguments(PKG "${options}" "${oneValueArgs}"
                        "${multiValueArgs}" ${ARGN} )

  # Request the nn component (and its FAISS dependency) only when asked for
  set(RAFT_COMPONENTS "")
  if(PKG_USE_RAFT_NN)
    string(APPEND RAFT_COMPONENTS " nn")
  endif()

  rapids_cpm_find(raft ${PKG_VERSION}
    GLOBAL_TARGETS      raft::raft
    BUILD_EXPORT_SET    proj-exports
    INSTALL_EXPORT_SET  proj-exports
    CPM_ARGS
      GIT_REPOSITORY https://github.com/${PKG_FORK}/raft.git
      GIT_TAG        ${PKG_PINNED_TAG}
      SOURCE_SUBDIR  cpp
      FIND_PACKAGE_ARGUMENTS "COMPONENTS ${RAFT_COMPONENTS}"
      OPTIONS
        "BUILD_TESTS OFF"
        "RAFT_USE_FAISS_STATIC ${PKG_USE_FAISS_STATIC}"
        # NVTX is read from the calling scope
        "NVTX ${NVTX}"
        "RAFT_COMPILE_LIBRARIES ${PKG_COMPILE_LIBRARIES}"
  )

endfunction()

# Change pinned tag here to test a commit in CI
# To use a different RAFT locally, set the CMake variable
# CPM_raft_SOURCE=/path/to/local/raft
find_and_configure_raft(VERSION 22.02.00
FORK rapidsai
PINNED_TAG branch-22.02
USE_RAFT_NN NO
USE_FAISS_STATIC NO
COMPILE_LIBRARIES NO
)
```

If you have changes to both RAFT and at least one downstream repo, then:
### Python/Cython Integration

- It is recommended to open a PR to both repositories (for visibility and CI tests).
- Change the pinned branch/commit in the downstream repo PR to point to the fork and branch used for the RAFT PR, so that CI runs tests against it.
- If your changes might affect usage of RAFT by other downstream repos, alert reviewers and open a GitHub issue or PR in that downstream repo as appropriate.
- The PR to RAFT will be merged first, so that the downstream repo PR pinned branch/commit can be returned to the main RAFT branch and run CI with it.
Once installed, RAFT's Python library can be imported and used directly.
24 changes: 24 additions & 0 deletions DEVELOPER_GUIDE.md
@@ -0,0 +1,24 @@
# Developer Guide

## Local Development

Developing features and fixing bugs for the RAFT library itself is straightforward and only requires building and installing the relevant RAFT artifacts.

The process for working on a CUDA/C++ feature which spans RAFT and one or more consumers can vary slightly depending on whether the consuming project relies on a source build (as outlined in the [BUILD](BUILD.md#building-raft-c-from-source) docs). In such a case, the option `CPM_raft_SOURCE=/path/to/raft/source` can be passed to the CMake configuration of the consuming project in order to build the local RAFT from source. The PR with relevant changes to the consuming project can also pin the RAFT version temporarily by explicitly changing the `FORK` and `PINNED_TAG` arguments to the RAFT branch containing their changes when invoking `find_and_configure_raft`. The pin should be reverted after the change is merged to the RAFT project and before it is merged to the dependent project(s) downstream.
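For the source-build workflow, a minimal Python sketch of how the consuming project's configure command could be assembled is shown below. `consumer_configure_cmd` is a hypothetical helper: only the `CPM_raft_SOURCE` variable comes from the text above, while the `-S`/`-B` layout (a source directory plus a `build` subfolder) is an assumption.

```python
from pathlib import Path
from typing import List, Optional


def consumer_configure_cmd(source_dir: str, raft_source: Optional[str] = None) -> List[str]:
    """Build the cmake configure command for a project that pulls RAFT via CPM."""
    build_dir = str(Path(source_dir) / "build")
    cmd = ["cmake", "-S", source_dir, "-B", build_dir]
    if raft_source:
        # Point CPM at the local RAFT checkout instead of the pinned FORK/PINNED_TAG.
        cmd.append(f"-DCPM_raft_SOURCE={raft_source}")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; omitting `raft_source` falls back to whatever `find_and_configure_raft` pins.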

If building a feature which spans projects without using the source build in CMake, the RAFT changes (both C++ and Python) will need to be installed into the environment of the consuming project before they can be used. Ideally, consuming projects should integrate RAFT so that the source build is enabled only for this case, and otherwise rely on more stable packaging (such as conda packages).

## API stability

Since RAFT is a core library with multiple consumers, it's important that the public APIs maintain stability across versions and any changes to them are done with caution, adding new functions and deprecating the old functions over a couple of releases as necessary.

The public APIs should be lightweight wrappers around calls to private APIs inside the `detail` namespace.
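These guidelines are stated for C++ (with the private code in the `detail` namespace). As a language-neutral illustration only, here is a minimal Python sketch of the same pattern: a thin public wrapper over a private implementation, plus an old name kept working behind a `DeprecationWarning` during the transition. All function names (`vector_sum`, `sum_vector`, `_sum_impl`, `deprecated`) are hypothetical.

```python
import functools
import warnings


def _sum_impl(values):
    """Private 'detail' implementation; free to change between releases."""
    return sum(values)


def vector_sum(values):
    """Public API: a lightweight wrapper that forwards to the private implementation."""
    return _sum_impl(values)


def deprecated(replacement):
    """Mark an old public function as deprecated while keeping it working."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated; use {replacement} instead",
                DeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated("vector_sum")
def sum_vector(values):
    """Old name kept for a couple of releases so consumers can migrate."""
    return _sum_impl(values)
```

Because both public names forward to the same private function, the implementation can change without touching either signature.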

## Testing

It's important for RAFT to maintain a high test coverage in order to minimize the potential for downstream projects to encounter unexpected build or runtime behavior as a result of changes. A well-defined public API can help maintain compile-time stability but means more focus should be placed on testing the functional requirements and verifying execution on the various edge cases within RAFT itself. Ideally, bug fixes and new features should be able to be made to RAFT independently of the consuming projects.


## Documentation

Public APIs always require documentation, since those will be exposed directly to users. In addition to summarizing the purpose of each class / function in the public API, the arguments (and relevant templates) should be documented along with brief usage examples.
49 changes: 14 additions & 35 deletions README.md
@@ -1,6 +1,7 @@
# <div align="left"><img src="https://rapids.ai/assets/images/rapids_logo.png" width="90px"/>&nbsp;RAFT: RAPIDS Analytics Framework Toolkit</div>

RAFT is a library containing building-blocks for rapid composition of RAPIDS Analytics. These building-blocks include shared representations, mathematical computational primitives, and utilities that accelerate building analytics and data science algorithms in the RAPIDS ecosystem. Both the C++ and Python components can be included in consuming libraries, providing building-blocks for both dense and sparse matrix formats in the following general categories:
RAFT is a library containing building-blocks for rapid composition of RAPIDS Analytics. These building-blocks include shared representations, mathematical computational primitives, and utilities that accelerate building analytics and data science algorithms in the RAPIDS ecosystem. Both the C++ and Python components can be included in consuming libraries, providing operations for both dense and sparse matrix formats in the following general categories:

#####
| Category | Description / Examples |
| --- | --- |
@@ -17,11 +18,18 @@ the maintenance burden by maximizing reuse across projects. RAFT relies on the [
like other projects in the RAPIDS ecosystem, eases the burden of configuring different allocation strategies globally
across the libraries that use it. RMM also provides RAII wrappers around device arrays that handle the allocation and cleanup.

## RAFT's primary goals are to be...
1. Fast- First and foremost, they provide a significant performance boost out of the box
2. Simple- Easy to use and easy to integrate into downstream projects
3. Reusable- Standardized core components minimize the need for "reinventing the wheel"
4. Composable- APIs that work well together and with other APIs
5. Comprehensive- Enable building a wide spectrum of different analytics

## Getting started

Refer to the [Build and Development Guide](BUILD.md) for details on RAFT's design, building, testing and development guidelines.
Refer to the [Build](BUILD.md) instructions for details on building and including the RAFT library in downstream projects. The [Developer Guide](DEVELOPER_GUIDE.md) contains details on the developer guidelines, workflows, and principles. If you are interested in contributing to the RAFT project, please read our [Contributing guidelines](CONTRIBUTING.md).

Most of the primitives in RAFT accept a `raft::handle_t` object for the management of resources which are expensive to create, such as CUDA streams, stream pools, and handles to other CUDA libraries like `cublas` and `cusolver`.


### C++ Example
@@ -58,37 +66,8 @@ raft::distance::pairwise_distance(handle, input.data(), input.data(),

The folder structure mirrors other RAPIDS repos (cuDF, cuML, cuGraph...), with the following folders:

- `ci`: Scripts for running CI in PRs
- `conda`: conda recipes and development conda environments
- `cpp`: Source code for all C++ code. The code is currently header-only, therefore it is in the `include` folder (with no `src`).
- `docs`: Source code and scripts for building library documentation
- `python`: Source code for all Python code.

[comment]: <> (TODO: This needs to be updated after the public API is established)
[comment]: <> (The library layout contains the following structure:)

[comment]: <> (```bash)

[comment]: <> (cpp/include/raft)

[comment]: <> ( |------------ comms [communication abstraction layer])

[comment]: <> ( |------------ distance [dense pairwise distances])

[comment]: <> ( |------------ linalg [dense linear algebra])

[comment]: <> ( |------------ matrix [dense matrix format])

[comment]: <> ( |------------ random [random matrix generation])

[comment]: <> ( |------------ sparse [sparse matrix and graph algorithms])

[comment]: <> ( |------------ spatial [spatial algorithms])

[comment]: <> ( |------------ spectral [spectral clustering])

[comment]: <> ( |------------ stats [statistics primitives])

[comment]: <> ( |------------ handle.hpp [raft handle])

[comment]: <> (```)


26 changes: 15 additions & 11 deletions build.sh
@@ -18,14 +18,14 @@ ARGS=$*
# script, and that this script resides in the repo dir!
REPODIR=$(cd $(dirname $0); pwd)

VALIDARGS="clean cppraft pyraft cppdocs -v -g --allgpuarch --nvtx --show_depr_warn -h --buildgtest --buildfaiss"
VALIDARGS="clean cppraft pyraft docs -v -g --allgpuarch --nvtx --show_depr_warn -h --buildgtest --buildfaiss"
HELP="$0 [<target> ...] [<flag> ...]
where <target> is:
clean - remove all existing build artifacts and configuration (start over)
cppraft - build the RAFT C++ code only. Also builds the C-wrapper library
around the C++ code.
pyraft - build the RAFT Python package
cppdocs - build the C++ doxygen documentation
docs - build the documentation
and <flag> is:
-v - verbose build mode
-g - build for debug
@@ -38,6 +38,7 @@ HELP="$0 [<target> ...] [<flag> ...]
default action (no args) is to build both cppraft and pyraft targets
"
CPP_RAFT_BUILD_DIR=${REPODIR}/cpp/build
SPHINX_BUILD_DIR=${REPODIR}/docs
PY_RAFT_BUILD_DIR=${REPODIR}/python/build
PYTHON_DEPS_CLONE=${REPODIR}/python/external_repositories
BUILD_DIRS="${CPP_RAFT_BUILD_DIR} ${PY_RAFT_BUILD_DIR} ${PYTHON_DEPS_CLONE}"
@@ -131,14 +132,10 @@ if (( ${CLEAN} == 1 )); then
cd ${REPODIR}
fi

if hasArg cppdocs; then
cd ${CPP_RAFT_BUILD_DIR}
cmake --build ${CPP_RAFT_BUILD_DIR} --target docs_raft
fi

################################################################################
# Configure for building all C++ targets
if (( ${NUMARGS} == 0 )) || hasArg cppraft; then
if (( ${NUMARGS} == 0 )) || hasArg cppraft || hasArg docs; then
if (( ${BUILD_ALL_GPU_ARCH} == 0 )); then
RAFT_CMAKE_CUDA_ARCHITECTURES="NATIVE"
echo "Building for the architecture of the GPU in the system..."
@@ -155,14 +152,15 @@ if (( ${NUMARGS} == 0 )) || hasArg cppraft; then
-DBUILD_GTEST=${BUILD_GTEST} \
-DBUILD_STATIC_FAISS=${BUILD_STATIC_FAISS}


# Run all c++ targets at once
cmake --build ${CPP_RAFT_BUILD_DIR} -j${PARALLEL_LEVEL} ${MAKE_TARGETS} ${VERBOSE_FLAG}
if (( ${NUMARGS} == 0 )) || hasArg cppraft; then
# Run all c++ targets at once
cmake --build ${CPP_RAFT_BUILD_DIR} -j${PARALLEL_LEVEL} ${MAKE_TARGETS} ${VERBOSE_FLAG}
fi
fi


# Build and (optionally) install the RAFT Python package
if (( ${NUMARGS} == 0 )) || hasArg pyraft; then
if (( ${NUMARGS} == 0 )) || hasArg pyraft || hasArg docs; then

cd ${REPODIR}/python
if [[ ${INSTALL_TARGET} != "" ]]; then
@@ -171,3 +169,9 @@ if (( ${NUMARGS} == 0 )) || hasArg pyraft; then
python setup.py build_ext -j${PARALLEL_LEVEL:-1} --inplace --library-dir=${LIBCUML_BUILD_DIR} ${SINGLEGPU}
fi
fi

if hasArg docs; then
cmake --build ${CPP_RAFT_BUILD_DIR} --target docs_raft
cd ${SPHINX_BUILD_DIR}
make html
fi
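The stage selection in the updated `build.sh` can be modeled in Python to check which steps a given command line triggers. This is a sketch, not part of the repository: `selected_stages` is a hypothetical function that covers only the three stage conditions shown in this diff (C++ configure, Python build, docs build), and rejecting arguments not listed in `VALIDARGS` is an assumption about how the script validates its input.

```python
from typing import Dict, List

# Targets and flags accepted by the updated build.sh (copied from VALIDARGS).
VALID_ARGS = set(
    "clean cppraft pyraft docs -v -g --allgpuarch --nvtx --show_depr_warn "
    "-h --buildgtest --buildfaiss".split()
)


def selected_stages(args: List[str]) -> Dict[str, bool]:
    """Mirror the stage conditions of the updated build.sh for a list of arguments."""
    unknown = [a for a in args if a not in VALID_ARGS]
    if unknown:
        raise ValueError(f"invalid option(s): {unknown}")
    no_args = len(args) == 0  # equivalent of (( ${NUMARGS} == 0 ))
    argset = set(args)        # equivalent of hasArg <name>
    return {
        # `docs` needs a configured C++ tree for the docs_raft target...
        "configure_cpp": no_args or "cppraft" in argset or "docs" in argset,
        # ...and the Python package built in place (presumably for Sphinx).
        "build_python": no_args or "pyraft" in argset or "docs" in argset,
        "build_docs": "docs" in argset,
    }
```

For example, `selected_stages(["docs"])` enables all three stages, while the removed `cppdocs` target is now rejected.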