Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate KNN implementation: ivf-pq #789

Merged
merged 148 commits into from
Sep 30, 2022
Merged
Show file tree
Hide file tree
Changes from 86 commits
Commits
Show all changes
148 commits
Select commit Hold shift + click to select a range
4d7b699
Initial commit: copied everything into a single file
achirkin Jun 24, 2022
306d4bb
Merge branch 'branch-22.08' into fea-knn-ivf-pq
achirkin Aug 4, 2022
ebc20be
Add missing functions and fix small compile errors
achirkin Aug 4, 2022
4f71b09
Added tests (still failing)
achirkin Aug 5, 2022
1d6402f
WIP replacing chunks of code with raft's helpers
achirkin Aug 9, 2022
f12e5da
Reset the expected cuda error to cudaSuccess to not crumble following…
achirkin Aug 10, 2022
12fad92
Replace cuann return codes with raft exceptions
achirkin Aug 10, 2022
0bb702f
replace cublas calls with raft wrappers
achirkin Aug 10, 2022
e1cae88
Replace _cuann_memset with utils::memzero
achirkin Aug 10, 2022
5b2c799
Use raft logging in most of the places and wrap all cuda calls into R…
achirkin Aug 10, 2022
4f64cd3
Use helper accumulate_into_selected
achirkin Aug 10, 2022
0a7324b
Use helper copy_selected
achirkin Aug 10, 2022
4d11178
Wrap remaining cuda calls into RAFT_CUDA_TRY and replace asserts with…
achirkin Aug 10, 2022
267b35e
WIP: templatizing the api
achirkin Aug 11, 2022
2541250
Add more templates and use adjust_centers
achirkin Aug 11, 2022
67bffa4
Put index as a member of descriptor
achirkin Aug 11, 2022
f951c52
Add the index extending operation
achirkin Aug 12, 2022
5822844
Replace cuann similarity with raft distance metric
achirkin Aug 15, 2022
62d32a7
Use raft logging levels instead of CUANN_DEBUG macro
achirkin Aug 15, 2022
48c8702
Made the API scaffold.
achirkin Aug 15, 2022
a74ed5d
Cleanup unused functions
achirkin Aug 16, 2022
f75a5b5
Removed couple helpers and added a few WIP comments
achirkin Aug 17, 2022
f8ea503
Fixing small warnings to make Release config compilable
achirkin Aug 17, 2022
140f186
Don't store all k scores in the temporary buffer
achirkin Aug 18, 2022
cb70c79
Use raft topk code
achirkin Aug 18, 2022
52dbe0d
Replace _cuann_kmeans_predict_MP with raft's kmeans primitives
achirkin Aug 18, 2022
613d551
Remove _cuann_kmeans_predict and use the higher-level kmeans::build_c…
achirkin Aug 18, 2022
b939b8b
Cleanup unused functions and variables
achirkin Aug 18, 2022
9a0d371
Cleanup unused array
achirkin Aug 18, 2022
7bea6df
WIP swithing to RMM. Cleaned up _cuann_compute_PQ_code
achirkin Aug 19, 2022
d7e7d49
WIP swithing to RMM
achirkin Aug 19, 2022
faaa495
Replaced all function-local allocations with rmm::uvector/std::vector
achirkin Aug 22, 2022
e83eae0
Replacing search workspace with rmm (wip almost done)
achirkin Aug 22, 2022
f8bec3e
Switched to device pointers in the api
achirkin Aug 23, 2022
5dc2c4b
get rid of cuannIvfPqIndexHeader
achirkin Aug 23, 2022
c42e57c
Replaced more runtime branches with template instantiations
achirkin Aug 23, 2022
bd22b86
Reset cuda error at the end of the tests for better reporting
achirkin Aug 23, 2022
9907bb7
Reuse larger parts of kmeans clustering
achirkin Aug 23, 2022
3238591
Fix a legacy workaround to correctly calculate worst-case max number …
achirkin Aug 23, 2022
1bae646
Use the new functions from #788
achirkin Aug 24, 2022
066f8c6
Revert kmeans changes
achirkin Aug 24, 2022
e98a214
Reuse the last bits of kmeans
achirkin Aug 24, 2022
e4da7f8
Rename a few variables
achirkin Aug 25, 2022
d478645
Initialize topk output indices with zeros do workaround out-of-bounds…
achirkin Aug 25, 2022
d4e645a
Remove element type template T from the index definition
achirkin Aug 25, 2022
adf1dca
WIP: merging cuannIvfPqDescriptor into ivf_pq::index<IdxT>
achirkin Aug 25, 2022
fdb5ad0
Merge branch 'branch-22.10' into fea-knn-ivf-pq
achirkin Aug 26, 2022
e7cb541
WIP: merging cuannIvfPqDescriptor into ivf_pq::index<IdxT> - removing…
achirkin Aug 26, 2022
e3b61bd
WIP: merging cuannIvfPqDescriptor into ivf_pq::index<IdxT> - more ren…
achirkin Aug 26, 2022
cc00fb7
WIP: merging cuannIvfPqDescriptor into ivf_pq::index<IdxT> - moved al…
achirkin Aug 26, 2022
55a29c1
WIP: merging cuannIvfPqDescriptor into ivf_pq::index<IdxT> - continue…
achirkin Aug 26, 2022
961e647
Removed cuannIvfPqDescriptor usage from ivfpq_search
achirkin Aug 26, 2022
b889394
Completely removed cuannIvfPqDescriptor
achirkin Aug 26, 2022
81abc5d
Make ivf_pq::index properly movable
achirkin Aug 29, 2022
adbffd8
Remove last traces of omp for now
achirkin Aug 29, 2022
ef4e28b
Remove ivf_pq_legacy.cuh
achirkin Aug 29, 2022
adc7cbf
fix style
achirkin Aug 29, 2022
1a2f79c
Small fixes to docs]
achirkin Aug 29, 2022
a957ec9
use cudaMemcpy2DAsync in-place of transpose-3d-helper
achirkin Aug 29, 2022
f41ef2c
Add more nvtx annotations and renamings
achirkin Aug 29, 2022
9d292ae
Annotate a performance bottleneck TODO
achirkin Aug 29, 2022
8034aae
Collapsing wrapper code in ivf_pq::search
achirkin Aug 29, 2022
413842c
Use ceildiv everywhere
achirkin Aug 29, 2022
f1b2b39
Use new_indices in extend
achirkin Aug 30, 2022
c22d02c
Propagated IdxT into the implementation (largely untested)
achirkin Aug 30, 2022
d5d425b
Add support for InnerProduct distance type
achirkin Sep 1, 2022
20e74f2
Lift the inner-product-float-only restriction
achirkin Sep 1, 2022
d043c75
Fix output index type
achirkin Sep 2, 2022
f4a2852
Fix dependency on .cuh utils in a .hpp file
achirkin Sep 2, 2022
bf90837
A few small renamings in preparation for merging ivfpq_compute_simila…
achirkin Sep 2, 2022
369d49d
Merge branch 'branch-22.10' into fea-knn-ivf-pq
achirkin Sep 2, 2022
e55d043
Merge ivfpq_compute_similarity variants into a single kernel to reduc…
achirkin Sep 5, 2022
2a20b25
Merge remote-tracking branch 'rapidsai/branch-22.10' into fea-knn-ivf-pq
achirkin Sep 6, 2022
0e5cea5
fix: don't call copy on empty data
achirkin Sep 6, 2022
9b6668f
Use raft's device-wide topk functions
achirkin Sep 6, 2022
998c9a5
clang-tidy refactoring
achirkin Sep 6, 2022
66453a1
Fix the rest of clang-tidy warning and document bits
achirkin Sep 7, 2022
68a3664
Expand comments a little bit
achirkin Sep 7, 2022
bcd4100
rename lut_dtype parameter
achirkin Sep 7, 2022
8c88c79
Rework tests
achirkin Sep 8, 2022
60d2712
Allow reference implementation to have k > 256
achirkin Sep 8, 2022
5a3c3ae
Allow ivf-pq implementation to have k > 256
achirkin Sep 8, 2022
fe29a15
Fix managed memory being destroyed too soon on move assignment
achirkin Sep 9, 2022
969d31f
Separate the training logic (build) from adding new data to index (ex…
achirkin Sep 9, 2022
e9bcd52
.clang-tidy fixes and other small refactorings
achirkin Sep 12, 2022
0a93be9
Ported the ivfpq_encode_core bugfix from upstream
achirkin Sep 12, 2022
add19ef
Remove unnecessary type cast
achirkin Sep 13, 2022
b2c3799
Fix forgot '#pragma once'
achirkin Sep 13, 2022
97444d8
Fix incorrect constexpr bounds for fp16
achirkin Sep 13, 2022
e01ccdf
Fixed the last failing tests (negative scores with unsigned float type)
achirkin Sep 13, 2022
8d15e75
Simplified ivfpq_encode
achirkin Sep 14, 2022
38c244c
Merge remote-tracking branch 'rapidsai/branch-22.10' into fea-knn-ivf-pq
achirkin Sep 19, 2022
db23eb8
Rename reorganized includes
achirkin Sep 19, 2022
c05953f
Move a preprocessing code to GPU (up to 10x speedup in PER_SUBSPACE m…
achirkin Sep 19, 2022
c8e1431
Fix a bug causing recall drops and add support for signed fp8 interna…
achirkin Sep 20, 2022
528076a
Restore the old kernel setup code for now, as the last commit introdu…
achirkin Sep 21, 2022
22b3631
Remove a template parameter PqT that can be computed using other para…
achirkin Sep 21, 2022
af5da32
Small fixes to headers
achirkin Sep 22, 2022
68dab8d
Refactor kernel selecting pragmas into templates
achirkin Sep 22, 2022
5ef0067
Add specializations to ivfpq_compute_similarity kernel
achirkin Sep 22, 2022
90bac7c
Add a missing inline specifier
achirkin Sep 22, 2022
f2fbd65
Add more specializations to ivf_pq::search and split test instantiations
achirkin Sep 22, 2022
a77d453
Add benchmarks and split them by types to enable faster parallel comp…
achirkin Sep 22, 2022
1f6d353
Merge remote-tracking branch 'rapidsai/branch-22.10' into fea-knn-ivf-pq
achirkin Sep 22, 2022
213d66c
remove instantiations from the header
achirkin Sep 22, 2022
10bb02a
Adapt to changes in mdarray headers
achirkin Sep 22, 2022
5529872
Move the make_rotation_matrix to GPU
achirkin Sep 23, 2022
2f8f77c
Move the specialization includes between headers
achirkin Sep 23, 2022
443916f
Merge two template parameters to reduce the raft compile times by ~4-…
achirkin Sep 23, 2022
0acfdd7
Moving more bits to GPU and removing managed_memory
achirkin Sep 23, 2022
a5a0577
Merge remote-tracking branch 'rapidsai/branch-22.10' into fea-knn-ivf-pq
achirkin Sep 26, 2022
ee4aeee
Factor out calculate_offsets_and_indices
achirkin Sep 26, 2022
ad84f01
Sort the cluster by their size in the index; this allows to quickly e…
achirkin Sep 26, 2022
f5049a8
Proceeding further with removing the managed arrays
achirkin Sep 26, 2022
6a4d444
Remove managed memory from the index
achirkin Sep 27, 2022
a02767b
Slightly improve performance by reducing the number of arguments in d…
achirkin Sep 27, 2022
ad10141
Allow to disable random rotation with force_random_rotation index par…
achirkin Sep 27, 2022
90c6b9f
Use a lambda in place of the copy_fill helper.
achirkin Sep 28, 2022
54bbaac
Update cpp/include/raft/spatial/knn/detail/ivf_pq_build.cuh
achirkin Sep 28, 2022
4ecc6a8
Update cpp/include/raft/spatial/knn/detail/ivf_pq_build.cuh
achirkin Sep 28, 2022
9c5c463
Update cpp/include/raft/util/cudart_utils.hpp
achirkin Sep 28, 2022
4cdf0ce
Update cpp/include/raft/spatial/knn/ivf_pq_types.hpp
achirkin Sep 28, 2022
653331d
Update cpp/include/raft/spatial/knn/detail/ivf_pq_search.cuh
achirkin Sep 28, 2022
41a8730
clang-format
achirkin Sep 28, 2022
02fde4b
Add a missing knn.cuh header for the brute-force knn
achirkin Sep 28, 2022
4704684
Remove unused cpu code and an extra level of indirection
achirkin Sep 28, 2022
887cd7d
Update cpp/include/raft/spatial/knn/ivf_pq.cuh
achirkin Sep 28, 2022
5fe21da
Various small fixes
achirkin Sep 28, 2022
2df832f
More documentation
achirkin Sep 28, 2022
2323431
Various small fixes
achirkin Sep 28, 2022
e661e5c
More documentation
achirkin Sep 28, 2022
de6db27
Remove unrelated changes
achirkin Sep 28, 2022
1ae1471
Rename pq_width->pq_book_size and refactor PER_CLUSTER training
achirkin Sep 28, 2022
854c804
Factor out select_residuals
achirkin Sep 28, 2022
d1b006b
Factor out select_clusters function
achirkin Sep 29, 2022
1784f8b
Add a helper to select a worker function instance
achirkin Sep 29, 2022
8f514b6
Update docs
achirkin Sep 29, 2022
119619b
Use cub in preprocessing kernels and refactor the variable names
achirkin Sep 29, 2022
e8aeb71
refactor ivfpq_make_outputs into postprocess_distances and postproces…
achirkin Sep 29, 2022
96b48ad
Added some documentation to the tests and more debug-logging to ensur…
achirkin Sep 30, 2022
89e8327
Format the logging strings a bit
achirkin Sep 30, 2022
729b0e4
Add __launch_bounds__ everywhere to avoid occasional 'too many resour…
achirkin Sep 30, 2022
ea94db3
Factor out train_per_subset and train_per_cluster for readability
achirkin Sep 30, 2022
0f839fe
Factor out get_max_batch_size
achirkin Sep 30, 2022
9b64e55
Factor out kernel selection code
achirkin Sep 30, 2022
bf73cf5
Merge remote-tracking branch 'rapidsai/branch-22.10' into fea-knn-ivf-pq
achirkin Sep 30, 2022
6389eb7
Expand test coverage a little bit
achirkin Sep 30, 2022
cf1a196
Fix a cudaErrorIllegalAddress that can happen if the probed clusters …
achirkin Sep 30, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
60 changes: 60 additions & 0 deletions cpp/include/raft/spatial/knn/detail/ann_utils.cuh
Original file line number Diff line number Diff line change
Expand Up @@ -151,6 +151,7 @@ inline void memzero(T* ptr, size_t n_elems, rmm::cuda_stream_view stream)
}
}

namespace {
__global__ void argmin_along_rows_kernel(uint32_t n_rows,
uint32_t n_cols,
const float* a,
Expand Down Expand Up @@ -185,6 +186,7 @@ __global__ void argmin_along_rows_kernel(uint32_t n_rows,
}
if (threadIdx.x == 0) { out[i] = shm_ids[0]; }
}
} // namespace

/**
* @brief Find index of the smallest element in each row.
Expand All @@ -209,6 +211,7 @@ inline void argmin_along_rows(
argmin_along_rows_kernel<<<n_rows, block_dim, 0, stream>>>(n_rows, n_cols, a, out);
}

namespace {
__global__ void dots_along_rows_kernel(uint32_t n_rows, uint32_t n_cols, const float* a, float* out)
{
uint64_t i = threadIdx.y + (blockDim.y * blockIdx.x);
Expand All @@ -226,6 +229,7 @@ __global__ void dots_along_rows_kernel(uint32_t n_rows, uint32_t n_cols, const f
sqsum += __shfl_xor_sync(0xffffffff, sqsum, 16);
if (threadIdx.x == 0) { out[i] = sqsum; }
}
} // namespace

/**
* @brief Square sum of values in each row (row-major matrix).
Expand Down Expand Up @@ -317,6 +321,7 @@ void accumulate_into_selected(size_t n_rows,
}
}

namespace {
__global__ void normalize_rows_kernel(uint32_t n_rows, uint32_t n_cols, float* a)
{
uint64_t i = threadIdx.y + (blockDim.y * blockIdx.x);
Expand All @@ -338,6 +343,7 @@ __global__ void normalize_rows_kernel(uint32_t n_rows, uint32_t n_cols, float* a
a[j + n_cols * i] *= sqsum;
}
}
} // namespace

/**
* @brief Divide rows by their L2 norm (square root of sum of squares).
Expand Down Expand Up @@ -557,4 +563,58 @@ void copy_selected(uint64_t n_rows,
default: RAFT_FAIL("All pointers must reside on the same side, host or device.");
}
}

template <typename T, typename S>
__global__ void copy_fill_kernel(uint32_t n_rows,
achirkin marked this conversation as resolved.
Show resolved Hide resolved
uint32_t n_cols,
const S* src,
uint32_t ld_src,
T* dst,
uint32_t ld_dst,
T fill_value)
{
uint64_t gid = threadIdx.x + blockDim.x * blockIdx.x;
uint64_t j = gid % ld_dst;
uint64_t i = gid / ld_dst;
if (i >= n_rows) return;
dst[j + ld_dst * i] = j < n_cols ? utils::mapping<T>{}(src[j + ld_src * i]) : fill_value;
}

/**
* @brief Copy a number of consecutive rows filling the longer output row endings with the given
* constant.
*
* @tparam T target type
* @tparam S source type
*
* @param n_rows
* @param n_cols
* @param[in] src input matrix [n_rows, ld_src]
* @param ld_src number of cols in the input (ld_src >= n_cols)
* @param[out] dst output matrix [n_rows, ld_dst]
* @param ld_dst number of cols in the output (ld_dst >= n_cols)
* @param fill_value the value to fill at column indices [n_cols, ... ld_dst)
* @param stream
*/
template <typename T, typename S>
void copy_fill(uint32_t n_rows,
achirkin marked this conversation as resolved.
Show resolved Hide resolved
uint32_t n_cols,
const S* src,
uint32_t ld_src,
T* dst,
uint32_t ld_dst,
T fill_value,
rmm::cuda_stream_view stream)
{
RAFT_EXPECTS(ld_src >= n_cols, "src leading dimension must be larger than n_cols");
RAFT_EXPECTS(ld_dst >= n_cols, "dist leading dimension must be larger than n_cols");
dim3 threads(128, 1, 1);
dim3 blocks(
ceildiv<uint64_t>(static_cast<uint64_t>(n_rows) * static_cast<uint64_t>(ld_dst), threads.x),
1,
1);
copy_fill_kernel<T, S>
<<<blocks, threads, 0, stream>>>(n_rows, n_cols, src, ld_src, dst, ld_dst, fill_value);
}

} // namespace raft::spatial::knn::detail::utils
70 changes: 21 additions & 49 deletions cpp/include/raft/spatial/knn/detail/ivf_flat_search.cuh
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@

#include "../ivf_flat_types.hpp"
#include "ann_utils.cuh"
#include "topk/radix_topk.cuh"
#include "topk.cuh"
#include "topk/warpsort_topk.cuh"

#include <raft/common/device_loads_stores.cuh>
Expand Down Expand Up @@ -1133,29 +1133,16 @@ void search_impl(const handle_t& handle,
stream);

RAFT_LOG_TRACE_VEC(distance_buffer_dev.data(), std::min<uint32_t>(20, index.n_lists()));
if (n_probes <= raft::spatial::knn::detail::topk::kMaxCapacity) {
topk::warp_sort_topk<AccT, uint32_t>(distance_buffer_dev.data(),
nullptr,
n_queries,
index.n_lists(),
n_probes,
coarse_distances_dev.data(),
coarse_indices_dev.data(),
select_min,
stream,
search_mr);
} else {
topk::radix_topk<AccT, uint32_t, 11, 512>(distance_buffer_dev.data(),
nullptr,
n_queries,
index.n_lists(),
n_probes,
coarse_distances_dev.data(),
coarse_indices_dev.data(),
select_min,
stream,
search_mr);
}
select_topk<AccT, uint32_t>(distance_buffer_dev.data(),
nullptr,
n_queries,
index.n_lists(),
n_probes,
coarse_distances_dev.data(),
coarse_indices_dev.data(),
select_min,
stream,
search_mr);
RAFT_LOG_TRACE_VEC(coarse_indices_dev.data(), n_probes);
RAFT_LOG_TRACE_VEC(coarse_distances_dev.data(), n_probes);

Expand Down Expand Up @@ -1204,31 +1191,16 @@ void search_impl(const handle_t& handle,

// Merge topk values from different blocks
if (grid_dim_x > 1) {
if (k <= raft::spatial::knn::detail::topk::kMaxCapacity) {
topk::warp_sort_topk<AccT, IdxT>(refined_distances_dev.data(),
refined_indices_dev.data(),
n_queries,
k * grid_dim_x,
k,
distances,
neighbors,
select_min,
stream,
search_mr);
} else {
// NB: this branch can only be triggered once `ivfflat_interleaved_scan` above supports larger
// `k` values (kMaxCapacity limit as a dependency of topk::block_sort)
topk::radix_topk<AccT, IdxT, 11, 512>(refined_distances_dev.data(),
refined_indices_dev.data(),
n_queries,
k * grid_dim_x,
k,
distances,
neighbors,
select_min,
stream,
search_mr);
}
select_topk<AccT, IdxT>(refined_distances_dev.data(),
refined_indices_dev.data(),
n_queries,
k * grid_dim_x,
k,
distances,
neighbors,
select_min,
stream,
search_mr);
}
}

Expand Down
Loading