Migrate to use cuVS for vector search (#6085)
This PR updates cuML to use cuVS instead of RAFT for vector search, pairwise distances, and clustering. This is required so that the vector search functionality in RAFT can be deprecated in favour of the code in cuVS.

Because some code hasn't been migrated to cuVS yet, we will continue to use the RAFT versions of it, but with RAFT in header-only mode. In particular, the following functionality will be used from RAFT in header-only mode:

* Random Ball Cover (see rapidsai/cuvs#218)
* Sparse KNN
* nn-descent (see rapidsai/cuvs#364)
* [MetricProcessor](c7d1b0e)
* knn_merge_parts
* build_dendrogram_host
* build_sorted_mst
* RAFT DistanceType

Because sparse KNN in RAFT uses RAFT's DistanceType, this PR can't fully move over to the DistanceType in cuVS. (The RAFT DistanceType also has a `Precomputed` option that isn't available in cuVS, but cuML needs it for DBSCAN.) As a result, both the RAFT and cuVS DistanceType enums are in use after this change, with conversions between them.
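The conversion between the two enums can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not cuML's actual code: `raft_like` and `cuvs_like` are hypothetical stand-ins for `raft::distance::DistanceType` and `cuvs::distance::DistanceType`, and only a handful of members are mirrored. The one property it takes from the description above is that `Precomputed` exists only on the RAFT side, so a RAFT-to-cuVS conversion has to reject it (or the caller has to handle it separately).

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins for raft::distance::DistanceType and
// cuvs::distance::DistanceType. Only a few members are mirrored here.
namespace raft_like {
enum class DistanceType { L2Expanded, L2SqrtExpanded, CosineExpanded, L1, Precomputed };
}  // namespace raft_like

namespace cuvs_like {
enum class DistanceType { L2Expanded, L2SqrtExpanded, CosineExpanded, L1 };
}  // namespace cuvs_like

// RAFT-style -> cuVS-style. Precomputed has no cuVS equivalent, so it throws.
inline cuvs_like::DistanceType to_cuvs(raft_like::DistanceType metric)
{
  switch (metric) {
    case raft_like::DistanceType::L2Expanded: return cuvs_like::DistanceType::L2Expanded;
    case raft_like::DistanceType::L2SqrtExpanded: return cuvs_like::DistanceType::L2SqrtExpanded;
    case raft_like::DistanceType::CosineExpanded: return cuvs_like::DistanceType::CosineExpanded;
    case raft_like::DistanceType::L1: return cuvs_like::DistanceType::L1;
    case raft_like::DistanceType::Precomputed: break;
  }
  throw std::invalid_argument("distance type has no cuVS equivalent");
}

// cuVS-style -> RAFT-style. Every cuVS-style value in this sketch has a RAFT match.
inline raft_like::DistanceType to_raft(cuvs_like::DistanceType metric)
{
  switch (metric) {
    case cuvs_like::DistanceType::L2Expanded: return raft_like::DistanceType::L2Expanded;
    case cuvs_like::DistanceType::L2SqrtExpanded: return raft_like::DistanceType::L2SqrtExpanded;
    case cuvs_like::DistanceType::CosineExpanded: return raft_like::DistanceType::CosineExpanded;
    case cuvs_like::DistanceType::L1: return raft_like::DistanceType::L1;
  }
  throw std::invalid_argument("unknown distance type");
}
```

An explicit `switch` is used instead of a `static_cast` so that the sketch does not depend on the two enums sharing numeric values, which is not something this example assumes.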

Authors:
  - Ben Frederickson (https://github.com/benfred)
  - Bradley Dice (https://github.com/bdice)
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Bradley Dice (https://github.com/bdice)

URL: #6085
benfred authored Oct 4, 2024
1 parent 65a02f6 commit 70fe526
Showing 71 changed files with 694 additions and 2,570 deletions.
7 changes: 5 additions & 2 deletions ci/build_wheel.sh
@@ -21,6 +21,7 @@ cd ${package_dir}
case "${RAPIDS_CUDA_VERSION}" in
12.*)
EXCLUDE_ARGS=(
--exclude "libcuvs.so"
--exclude "libcublas.so.12"
--exclude "libcublasLt.so.12"
--exclude "libcufft.so.11"
@@ -32,12 +33,14 @@ case "${RAPIDS_CUDA_VERSION}" in
EXTRA_CMAKE_ARGS=";-DUSE_CUDA_MATH_WHEELS=ON"
;;
11.*)
EXCLUDE_ARGS=()
EXCLUDE_ARGS=(
--exclude "libcuvs.so"
)
EXTRA_CMAKE_ARGS=";-DUSE_CUDA_MATH_WHEELS=OFF"
;;
esac

SKBUILD_CMAKE_ARGS="-DDETECT_CONDA_ENV=OFF;-DDISABLE_DEPRECATION_WARNINGS=ON;-DCPM_cumlprims_mg_SOURCE=${GITHUB_WORKSPACE}/cumlprims_mg/${EXTRA_CMAKE_ARGS}" \
SKBUILD_CMAKE_ARGS="-DDETECT_CONDA_ENV=OFF;-DDISABLE_DEPRECATION_WARNINGS=ON;-DCPM_cumlprims_mg_SOURCE=${GITHUB_WORKSPACE}/cumlprims_mg/;-DUSE_CUVS_WHEEL=ON${EXTRA_CMAKE_ARGS}" \
python -m pip wheel . \
-w dist \
-vvv \
2 changes: 2 additions & 0 deletions ci/release/update-version.sh
@@ -40,11 +40,13 @@ echo "${NEXT_FULL_TAG}" > VERSION
DEPENDENCIES=(
cudf
cuml
cuvs
dask-cuda
dask-cudf
libcuml
libcuml-tests
libcumlprims
libcuvs
libraft-headers
libraft
librmm
3 changes: 2 additions & 1 deletion conda/environments/all_cuda-118_arch-x86_64.yaml
@@ -14,6 +14,7 @@ dependencies:
- cudatoolkit
- cudf==24.10.*,>=0.0.0a0
- cupy>=12.0.0
- cuvs==24.10.*,>=0.0.0a0
- cxx-compiler
- cython>=3.0.0
- dask-cuda==24.10.*,>=0.0.0a0
@@ -39,8 +40,8 @@ dependencies:
- libcusolver=11.4.1.48
- libcusparse-dev=11.7.5.86
- libcusparse=11.7.5.86
- libcuvs==24.10.*,>=0.0.0a0
- libraft-headers==24.10.*,>=0.0.0a0
- libraft==24.10.*,>=0.0.0a0
- librmm==24.10.*,>=0.0.0a0
- nbsphinx
- ninja
3 changes: 2 additions & 1 deletion conda/environments/all_cuda-125_arch-x86_64.yaml
@@ -16,6 +16,7 @@ dependencies:
- cuda-version=12.5
- cudf==24.10.*,>=0.0.0a0
- cupy>=12.0.0
- cuvs==24.10.*,>=0.0.0a0
- cxx-compiler
- cython>=3.0.0
- dask-cuda==24.10.*,>=0.0.0a0
@@ -36,8 +37,8 @@ dependencies:
- libcurand-dev
- libcusolver-dev
- libcusparse-dev
- libcuvs==24.10.*,>=0.0.0a0
- libraft-headers==24.10.*,>=0.0.0a0
- libraft==24.10.*,>=0.0.0a0
- librmm==24.10.*,>=0.0.0a0
- nbsphinx
- ninja
2 changes: 1 addition & 1 deletion conda/environments/clang_tidy_cuda-118_arch-x86_64.yaml
@@ -27,8 +27,8 @@ dependencies:
- libcusolver=11.4.1.48
- libcusparse-dev=11.7.5.86
- libcusparse=11.7.5.86
- libcuvs==24.10.*,>=0.0.0a0
- libraft-headers==24.10.*,>=0.0.0a0
- libraft==24.10.*,>=0.0.0a0
- librmm==24.10.*,>=0.0.0a0
- ninja
- nvcc_linux-64=11.8
2 changes: 1 addition & 1 deletion conda/environments/cpp_all_cuda-118_arch-x86_64.yaml
@@ -25,8 +25,8 @@ dependencies:
- libcusolver=11.4.1.48
- libcusparse-dev=11.7.5.86
- libcusparse=11.7.5.86
- libcuvs==24.10.*,>=0.0.0a0
- libraft-headers==24.10.*,>=0.0.0a0
- libraft==24.10.*,>=0.0.0a0
- librmm==24.10.*,>=0.0.0a0
- ninja
- nvcc_linux-64=11.8
2 changes: 1 addition & 1 deletion conda/environments/cpp_all_cuda-125_arch-x86_64.yaml
@@ -22,8 +22,8 @@ dependencies:
- libcurand-dev
- libcusolver-dev
- libcusparse-dev
- libcuvs==24.10.*,>=0.0.0a0
- libraft-headers==24.10.*,>=0.0.0a0
- libraft==24.10.*,>=0.0.0a0
- librmm==24.10.*,>=0.0.0a0
- ninja
- spdlog>=1.14.1,<1.15
4 changes: 2 additions & 2 deletions conda/recipes/libcuml/meta.yaml
@@ -70,7 +70,7 @@ requirements:
{% endif %}
- fmt {{ fmt_version }}
- libcumlprims ={{ minor_version }}
- libraft ={{ minor_version }}
- libcuvs ={{ minor_version }}
- libraft-headers ={{ minor_version }}
- librmm ={{ minor_version }}
- spdlog {{ spdlog_version }}
@@ -116,7 +116,7 @@ outputs:
- libcusparse
{% endif %}
- libcumlprims ={{ minor_version }}
- libraft ={{ minor_version }}
- libcuvs ={{ minor_version }}
- librmm ={{ minor_version }}
- treelite {{ treelite_version }}
about:
21 changes: 5 additions & 16 deletions cpp/CMakeLists.txt
@@ -64,8 +64,7 @@ option(SINGLEGPU "Disable all mnmg components and comms libraries" OFF)
option(USE_CCACHE "Cache build artifacts with ccache" OFF)
option(CUDA_STATIC_RUNTIME "Statically link the CUDA runtime" OFF)
option(CUDA_STATIC_MATH_LIBRARIES "Statically link the CUDA math libraries" OFF)
option(CUML_USE_RAFT_STATIC "Build and statically link the RAFT libraries" OFF)
option(CUML_RAFT_COMPILED "Use libraft shared library" ON)
option(CUML_USE_CUVS_STATIC "Build and statically link the CUVS library" OFF)
option(CUML_USE_TREELITE_STATIC "Build and statically link the treelite library" OFF)
option(CUML_EXPORT_TREELITE_LINKAGE "Whether to publicly or privately link treelite to libcuml++" OFF)
option(CUML_USE_CUMLPRIMS_MG_STATIC "Build and statically link the cumlprims_mg library" OFF)
@@ -78,6 +77,7 @@ option(CUML_EXCLUDE_RAFT_FROM_ALL "Exclude RAFT targets from cuML's 'all' target
option(CUML_EXCLUDE_TREELITE_FROM_ALL "Exclude Treelite targets from cuML's 'all' target" OFF)
option(CUML_EXCLUDE_CUMLPRIMS_MG_FROM_ALL "Exclude cumlprims_mg targets from cuML's 'all' target" OFF)
option(CUML_RAFT_CLONE_ON_PIN "Explicitly clone RAFT branch when pinned to non-feature branch" ON)
option(CUML_CUVS_CLONE_ON_PIN "Explicitly clone CUVS branch when pinned to non-feature branch" ON)

message(VERBOSE "CUML_CPP: Building libcuml_c shared library. Contains the cuML C API: ${BUILD_CUML_C_LIBRARY}")
message(VERBOSE "CUML_CPP: Building libcuml shared library: ${BUILD_CUML_CPP_LIBRARY}")
@@ -98,7 +98,7 @@ message(VERBOSE "CUML_CPP: Disabling all mnmg components and comms libraries: ${
message(VERBOSE "CUML_CPP: Cache build artifacts with ccache: ${USE_CCACHE}")
message(VERBOSE "CUML_CPP: Statically link the CUDA runtime: ${CUDA_STATIC_RUNTIME}")
message(VERBOSE "CUML_CPP: Statically link the CUDA math libraries: ${CUDA_STATIC_MATH_LIBRARIES}")
message(VERBOSE "CUML_CPP: Build and statically link RAFT libraries: ${CUML_USE_RAFT_STATIC}")
message(VERBOSE "CUML_CPP: Build and statically link CUVS libraries: ${CUML_USE_CUVS_STATIC}")
message(VERBOSE "CUML_CPP: Build and statically link Treelite library: ${CUML_USE_TREELITE_STATIC}")

set(CUML_ALGORITHMS "ALL" CACHE STRING "Experimental: Choose which algorithms are built into libcuml++.so. Can specify individual algorithms or groups in a semicolon-separated list.")
@@ -228,6 +228,7 @@ endif()
include(cmake/thirdparty/get_cccl.cmake)
include(cmake/thirdparty/get_rmm.cmake)
include(cmake/thirdparty/get_raft.cmake)
include(cmake/thirdparty/get_cuvs.cmake)

if(LINK_TREELITE)
include(cmake/thirdparty/get_treelite.cmake)
@@ -442,18 +443,6 @@ if(BUILD_CUML_CPP_LIBRARY)
src/metrics/kl_divergence.cu
src/metrics/mutual_info_score.cu
src/metrics/pairwise_distance.cu
src/metrics/pairwise_distance_canberra.cu
src/metrics/pairwise_distance_chebyshev.cu
src/metrics/pairwise_distance_correlation.cu
src/metrics/pairwise_distance_cosine.cu
src/metrics/pairwise_distance_euclidean.cu
src/metrics/pairwise_distance_hamming.cu
src/metrics/pairwise_distance_hellinger.cu
src/metrics/pairwise_distance_jensen_shannon.cu
src/metrics/pairwise_distance_kl_divergence.cu
src/metrics/pairwise_distance_l1.cu
src/metrics/pairwise_distance_minkowski.cu
src/metrics/pairwise_distance_russell_rao.cu
src/metrics/r2_score.cu
src/metrics/rand_index.cu
src/metrics/silhouette_score.cu
@@ -635,7 +624,7 @@ if(BUILD_CUML_CPP_LIBRARY)
)

target_link_libraries(${CUML_CPP_TARGET}
PUBLIC rmm::rmm
PUBLIC rmm::rmm ${CUVS_LIB}
${_cuml_cpp_public_libs}
PRIVATE ${_cuml_cpp_private_libs}
)
1 change: 0 additions & 1 deletion cpp/bench/CMakeLists.txt
@@ -50,7 +50,6 @@ if(BUILD_CUML_BENCH)
benchmark::benchmark
${TREELITE_LIBS}
raft::raft
raft::compiled
)

target_include_directories(${CUML_CPP_BENCH_TARGET}
2 changes: 1 addition & 1 deletion cpp/bench/sg/kmeans.cu
@@ -92,7 +92,7 @@ std::vector<Params> getInputs()
p.kmeans.max_iter = 300;
p.kmeans.tol = 1e-4;
p.kmeans.verbosity = RAFT_LEVEL_INFO;
p.kmeans.metric = raft::distance::DistanceType::L2Expanded;
p.kmeans.metric = cuvs::distance::DistanceType::L2Expanded;
p.kmeans.rng_state = raft::random::RngState(p.blobs.seed);
p.kmeans.inertia_check = true;
std::vector<std::pair<int, int>> rowcols = {
77 changes: 77 additions & 0 deletions cpp/cmake/thirdparty/get_cuvs.cmake
@@ -0,0 +1,77 @@
#=============================================================================
# Copyright (c) 2024, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#=============================================================================

set(CUML_MIN_VERSION_cuvs "${CUML_VERSION_MAJOR}.${CUML_VERSION_MINOR}.00")
set(CUML_BRANCH_VERSION_cuvs "${CUML_VERSION_MAJOR}.${CUML_VERSION_MINOR}")

function(find_and_configure_cuvs)
set(oneValueArgs VERSION FORK PINNED_TAG EXCLUDE_FROM_ALL USE_CUVS_STATIC COMPILE_LIBRARY CLONE_ON_PIN)
cmake_parse_arguments(PKG "${options}" "${oneValueArgs}"
"${multiValueArgs}" ${ARGN} )

if(PKG_CLONE_ON_PIN AND NOT PKG_PINNED_TAG STREQUAL "branch-${CUML_BRANCH_VERSION_cuvs}")
message(STATUS "CUML: CUVS pinned tag found: ${PKG_PINNED_TAG}. Cloning cuvs locally.")
set(CPM_DOWNLOAD_cuvs ON)
elseif(PKG_USE_CUVS_STATIC AND (NOT CPM_cuvs_SOURCE))
message(STATUS "CUML: Cloning cuvs locally to build static libraries.")
set(CPM_DOWNLOAD_cuvs ON)
else()
message(STATUS "Not cloning cuvs locally")
endif()

if(PKG_USE_CUVS_STATIC)
set(CUVS_LIB cuvs::cuvs_static PARENT_SCOPE)
else()
set(CUVS_LIB cuvs::cuvs PARENT_SCOPE)
endif()

rapids_cpm_find(cuvs ${PKG_VERSION}
GLOBAL_TARGETS cuvs::cuvs
BUILD_EXPORT_SET cuml-exports
INSTALL_EXPORT_SET cuml-exports
CPM_ARGS
GIT_REPOSITORY https://github.com/${PKG_FORK}/cuvs.git
GIT_TAG ${PKG_PINNED_TAG}
SOURCE_SUBDIR cpp
EXCLUDE_FROM_ALL ${PKG_EXCLUDE_FROM_ALL}
OPTIONS
"BUILD_TESTS OFF"
"BUILD_BENCH OFF"
)

if(cuvs_ADDED)
message(VERBOSE "CUML: Using CUVS located in ${cuvs_SOURCE_DIR}")
else()
message(VERBOSE "CUML: Using CUVS located in ${cuvs_DIR}")
endif()


endfunction()

# Change pinned tag here to test a commit in CI
# To use a different CUVS locally, set the CMake variable
# CPM_cuvs_SOURCE=/path/to/local/cuvs
find_and_configure_cuvs(VERSION ${CUML_MIN_VERSION_cuvs}
FORK rapidsai
PINNED_TAG branch-${CUML_BRANCH_VERSION_cuvs}
EXCLUDE_FROM_ALL ${CUML_EXCLUDE_CUVS_FROM_ALL}
# When PINNED_TAG above doesn't match cuml,
# force local cuvs clone in build directory
# even if it's already installed.
CLONE_ON_PIN ${CUML_CUVS_CLONE_ON_PIN}
COMPILE_LIBRARY ${CUML_CUVS_COMPILED}
USE_CUVS_STATIC ${CUML_USE_CUVS_STATIC}
)
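The branching in `find_and_configure_cuvs` above can be modelled in a few lines to make the decision table explicit. This is a hypothetical C++ sketch that only mirrors the CMake logic: `CuvsConfig`, `CuvsPlan`, and `plan_cuvs` are illustrative names, not part of cuML, CMake, or rapids-cmake, and nothing here invokes CPM. It shows how the minimum version string is formed, when `CPM_DOWNLOAD_cuvs` is forced on, and which target ends up in `CUVS_LIB`.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of find_and_configure_cuvs(); field names mirror the
// CMake arguments, but nothing here talks to CMake or CPM.
struct CuvsConfig {
  std::string major;       // CUML_VERSION_MAJOR, e.g. "24"
  std::string minor;       // CUML_VERSION_MINOR, e.g. "10"
  std::string pinned_tag;  // PINNED_TAG
  bool clone_on_pin;       // CLONE_ON_PIN
  bool use_static;         // USE_CUVS_STATIC
  bool has_local_source;   // true when CPM_cuvs_SOURCE is set
};

struct CuvsPlan {
  std::string min_version;  // CUML_MIN_VERSION_cuvs
  bool download;            // whether CPM_DOWNLOAD_cuvs is forced ON
  std::string target;       // value assigned to CUVS_LIB
};

inline CuvsPlan plan_cuvs(const CuvsConfig& c)
{
  // Mirrors CUML_BRANCH_VERSION_cuvs and the "branch-X.Y" comparison.
  const std::string branch = "branch-" + c.major + "." + c.minor;
  CuvsPlan plan{c.major + "." + c.minor + ".00", false,
                c.use_static ? "cuvs::cuvs_static" : "cuvs::cuvs"};
  if (c.clone_on_pin && c.pinned_tag != branch) {
    plan.download = true;  // pinned to something other than the matching branch
  } else if (c.use_static && !c.has_local_source) {
    plan.download = true;  // static linking needs cuVS built from source
  }
  return plan;
}
```

This sketch only covers the explicit overrides. When neither branch fires, `rapids_cpm_find` generally still tries an installed package first and falls back to fetching the sources if none is found.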
12 changes: 1 addition & 11 deletions cpp/cmake/thirdparty/get_raft.cmake
@@ -36,16 +36,6 @@ function(find_and_configure_raft)
string(APPEND RAFT_COMPONENTS " distributed")
endif()

if(PKG_COMPILE_LIBRARY)
if(NOT PKG_USE_RAFT_STATIC)
string(APPEND RAFT_COMPONENTS " compiled")
set(RAFT_COMPILED_LIB raft::compiled PARENT_SCOPE)
else()
string(APPEND RAFT_COMPONENTS " compiled_static")
set(RAFT_COMPILED_LIB raft::compiled_static PARENT_SCOPE)
endif()
endif()

# We need to set this each time so that on subsequent calls to cmake
# the raft-config.cmake re-evaluates the RAFT_NVTX value
set(RAFT_NVTX ${PKG_NVTX})
@@ -66,7 +56,7 @@ function(find_and_configure_raft)
"BUILD_TESTS OFF"
"BUILD_BENCH OFF"
"BUILD_CAGRA_HNSWLIB OFF"
"RAFT_COMPILE_LIBRARY ${PKG_COMPILE_LIBRARY}"
"RAFT_COMPILE_LIBRARY OFF"
)

if(raft_ADDED)
2 changes: 1 addition & 1 deletion cpp/examples/kmeans/kmeans_example.cpp
@@ -112,7 +112,7 @@ int main(int argc, char* argv[])
params.max_iter = 300;
params.tol = 0.05;
}
params.metric = raft::distance::DistanceType::L2SqrtExpanded;
params.metric = cuvs::distance::DistanceType::L2SqrtExpanded;
params.init = ML::kmeans::KMeansParams::InitMethod::Random;

// Inputs copied from kmeans_test.cu
4 changes: 2 additions & 2 deletions cpp/include/cuml/cluster/kmeans.hpp
@@ -18,7 +18,7 @@

#include <cuml/common/log_levels.hpp>

#include <raft/cluster/kmeans_types.hpp>
#include <cuvs/cluster/kmeans.hpp>

namespace raft {
class handle_t;
@@ -28,7 +28,7 @@ namespace ML {

namespace kmeans {

using KMeansParams = raft::cluster::KMeansParams;
using KMeansParams = cuvs::cluster::kmeans::params;

/**
* @brief Compute k-means clustering and predicts cluster index for each sample
10 changes: 5 additions & 5 deletions cpp/include/cuml/cluster/kmeans_mg.hpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2019-2022, NVIDIA CORPORATION.
* Copyright (c) 2019-2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -48,7 +48,7 @@ namespace opg {
* @param[out] n_iter Number of iterations run.
*/

void fit(const raft::handle_t& handle,
void fit(const raft::resources& handle,
const KMeansParams& params,
const float* X,
int n_samples,
@@ -58,7 +58,7 @@ void fit(const raft::handle_t& handle,
float& inertia,
int& n_iter);

void fit(const raft::handle_t& handle,
void fit(const raft::resources& handle,
const KMeansParams& params,
const double* X,
int n_samples,
@@ -68,7 +68,7 @@ void fit(const raft::handle_t& handle,
double& inertia,
int& n_iter);

void fit(const raft::handle_t& handle,
void fit(const raft::resources& handle,
const KMeansParams& params,
const float* X,
int64_t n_samples,
@@ -78,7 +78,7 @@ void fit(const raft::handle_t& handle,
float& inertia,
int64_t& n_iter);

void fit(const raft::handle_t& handle,
void fit(const raft::resources& handle,
const KMeansParams& params,
const double* X,
int64_t n_samples,