
Integrate accumulate_into_selected from ANN utils into linalg::reduce_rows_by_keys #909

Merged · 8 commits merged into rapidsai:branch-22.12 on Oct 19, 2022

Conversation

@Nyrio (Contributor) commented Oct 10, 2022:

accumulate_into_selected achieves much better performance than the previous implementation of reduce_rows_by_keys for large nkeys (sum_rows_by_key_large_nkeys_kernel_rowmajor). According to the benchmark that I added for this primitive, the difference is a factor of 240x for sizes relevant to IVF-Flat (and a factor of ~10x for smaller nkeys, e.g. 64).

This is mostly because the legacy implementation, probably in an attempt to reduce atomic conflicts, assigned a key and a tile of the matrix to each block, and each block only reduced the rows corresponding to its assigned key. With a very large number of keys, e.g. 1k, this results in blocks iterating over a large number of rows (possibly tens of thousands) while reading and accumulating only 1 row in 1k.
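For intuition, here is a minimal sketch of the atomic-accumulation idea (illustrative only, not the actual accumulate_into_selected code; all names are made up): each thread handles one input element and atomically adds it into the output row selected by that row's key, so every row a block reads is actually used.

```cpp
// Illustrative sketch only -- not the real RAFT kernel.
// One thread per input element; the key selects the output row.
__global__ void sum_rows_by_key_atomic(
  const float* in, const int* keys, float* out, int nrows, int ncols)
{
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx >= nrows * ncols) { return; }
  int row = idx / ncols;
  int col = idx % ncols;
  // Atomic accumulation: conflicts are possible, but every element read
  // contributes, unlike the per-key tiling of the legacy kernel.
  atomicAdd(&out[keys[row] * ncols + col], in[idx]);
}
```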

This PR:

  • Replaces sum_rows_by_key_large_nkeys_rowmajor with accumulate_into_selected (I didn't find any cases in which the old kernel performed better).
  • Removes accumulate_into_selected from ann_utils.cuh.
  • Fixes support for custom iterators in reduce_rows_by_keys.
  • Uses the raft prims in calc_centers_and_sizes.

Perf notes:

  • The original kmeans gets a 15-20% speedup for large numbers of clusters.
  • The performance of ivf_flat::build stays the same as before.
  • There are a few extra steps since I separated the cluster-size count from the reduction by key, but they are negligible in comparison.

Question: this change breaks support for host-side-only arrays in calc_centers_and_sizes. Is that an actual use case? Should I add a branch that avoids the raft prims when all arrays are host-side?

cc @achirkin @tfeher @cjnolet

@Nyrio Nyrio requested review from a team as code owners October 10, 2022 12:26
@Nyrio Nyrio added the 3 - Ready for Review, improvement, non-breaking, and CMake labels and removed the CMake label on Oct 10, 2022
@cjnolet (Member) left a comment:

Thanks again @Nyrio for this consolidation and optimization! Minor things again.

Review thread on cpp/bench/CMakeLists.txt (resolved).
On this hunk of the diff:

```
@@ -383,6 +317,8 @@ __global__ void map_along_rows_kernel(
 * @brief Map a binary function over a matrix and a vector element-wise, broadcasting the vector
 * values along rows: `m[i, j] = op(m[i,j], v[i])`
 *
 * @todo(lsugy): replace with matrix_vector_op
```
Member:

Can we add a GitHub issue for this and reference it here just to make sure we are tracking it?

Contributor Author:

Oops, I didn't mean to leave that comment here, but I already have a WIP PR for this: #911

On this excerpt of the reduce_rows_by_key signature diff:

```
  int nkeys,
  DataIteratorT* d_sums,
  cudaStream_t stream)
  IdxT nrows,
```
Member:

We are going to be deprecating the raw-pointer APIs soon in favor of the new (more self-documenting) mdspan APIs. Do you see any reason why we should prefer to keep the iterator-based APIs over the mdspan APIs?

Contributor Author:

Iterator-based APIs combined with fancy iterators such as cub::TransformInputIterator avoid unnecessary steps when the input needs to be converted (e.g. int-to-float mapping, key-value to key-only or value-only, etc.). This comes at the expense of more template instantiations, and it forgoes some optimizations that are only possible with raw pointers (though we can use if constexpr to account for that).
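For example, a minimal sketch of that pattern (the names here are illustrative, not RAFT code): the int-to-float conversion is applied lazily as the consuming kernel reads each element, so no temporary float array is ever materialized.

```cpp
#include <cub/iterator/transform_input_iterator.cuh>

// Functor applied on the fly to each element as it is read.
struct int_to_float {
  __host__ __device__ float operator()(int x) const { return static_cast<float>(x); }
};

void example(const int* d_in)
{
  // Behaves like a `const float*`-style random-access input iterator,
  // but dereferencing converts the underlying int on demand.
  cub::TransformInputIterator<float, int_to_float, const int*> itr(d_in, int_to_float{});
  // itr can now be passed to any primitive that accepts input iterators.
}
```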

Member:

Yeah, I get the point of the "streamability" of the iterators vs raw pointers. Given that the mdspan is really just a very lightweight wrapper around any type (it can be but doesn't have to be a raw pointer) and we're just forwarding the underlying data_handle() to the function in the detail namespace, can't we also wrap the mdspan around an iterator?

Contributor Author:

> mdspan is really just a very lightweight wrapper around any type (it can be but doesn't have to be a raw pointer)

Not exactly. As far as I understand, span or mdspan can only wrap data that exists in memory (and contiguously). std::span even has a data() member which returns a pointer to the memory location of the first element. So it can wrap simple iterators like vector::begin, but not cub::TransformInputIterator.
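To illustrate the contiguity point, a small sketch (assuming C++20 std::span):

```cpp
#include <span>
#include <vector>

void span_example()
{
  std::vector<float> v{1.f, 2.f, 3.f};
  std::span<float> s(v.begin(), v.size());  // ok: vector iterators are contiguous
  float* p = s.data();                      // pointer to the first element
  // A cub::TransformInputIterator could not be used in place of v.begin():
  // std::span's iterator constructor requires std::contiguous_iterator.
  (void)p;
}
```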

Member:

mdspan and span are a little different in both form and function; we have been investing in mdspan more than span for our public APIs. mdspan itself does not require contiguity, though we do enforce that in many of the public APIs for the simple reason that we are using it as a facade for raw pointers. If we are expecting iterators, we could relax that constraint a bit when the underlying data_handle() is in fact an iterator. Also, @mhoemmen can correct me if I'm wrong, but I don't believe the data backing an mdspan needs to exist in memory: I think it could even be a file pointer or a pointer to a remote data buffer, so long as the accessor knows how to materialize it and it allows for random access.

I'm not trying to discourage the use of iterators in our public API, and I definitely see the benefits of using them to apply functors lazily upon materialization to avoid copies / additional allocations. I'm hoping we can incorporate acceptance of iterators more consistently and broadly across our APIs and avoid having only a couple of functions throughout the codebase that accept iterators while most do not.

Member:

I'm also okay accepting this PR and adding a todo to figure out how we can accept iterator types more broadly across the new APIs so we can consolidate the dev experience a bit for our end users.

Contributor Author:

Oh, I thought of mdspan as a multi-dimensional std::span, but it seems it's a bit more versatile.
So, to make sure I understand: would each custom iterator require a different accessor? And would the mdspan type for iterator-accepting arguments still be a template parameter, while arguments expecting raw pointers can use matrix or vector views?

Contributor:

> I don't believe the data backing an mdspan needs to exist in memory: I think it could even be a file pointer or a pointer to a remote data buffer, so long as the accessor knows how to materialize it and it allows for random access.

@cjnolet is correct (as usual!). We say "mdspan doesn't need a backing span."

  1. Elements don't need to exist in memory;
  2. data_handle_type doesn't need to be ElementType*;
  3. even if it is, or even if data_handle_type presents the syntax of an iterator, it doesn't need to act like one;
  4. reference doesn't need to be element_type&; and
  5. access(p, i) doesn't need to return p[i].

One example of (3) is MPI_Win (which could be void*, but it's not a pointer to an array), a handle to remote memory.
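To make points (2) and (5) concrete, here is a minimal sketch (assuming C++23 std::mdspan; not RAFT code) of an accessor policy whose data handle is an arbitrary random-access iterator rather than a raw pointer:

```cpp
#include <cstddef>
#include <iterator>
#include <mdspan>  // C++23 (or the reference kokkos/mdspan implementation)

// Accessor whose data_handle_type is a random-access iterator.
template <class Iterator>
struct iterator_accessor {
  using element_type     = typename std::iterator_traits<Iterator>::value_type;
  using reference        = typename std::iterator_traits<Iterator>::reference;
  using data_handle_type = Iterator;
  using offset_policy    = iterator_accessor;

  constexpr reference access(data_handle_type it, std::size_t i) const
  {
    return it[i];  // element access goes through the iterator, not a pointer
  }
  constexpr data_handle_type offset(data_handle_type it, std::size_t i) const
  {
    return it + i;  // advancing the handle is plain iterator arithmetic
  }
};
```

An mdspan instantiated with such an accessor, e.g. `std::mdspan<float, std::dextents<int, 1>, std::layout_right, iterator_accessor<Itr>>`, would view the iterator's elements without requiring them to live contiguously in memory.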

Contributor:

My personal view is that mdspan is a multipass multidimensional iterator. That is, it's the preferred interface for viewing a multidimensional range.

If you really want an iterator range for a generic rank-1 mdspan `x`, `std::views::iota(0, x.extent(0)) | std::views::transform([x] (auto index) { /* function using x[index] */ })` works fine.
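Spelled out as a self-contained host-side example (assuming a standard library that ships `<mdspan>`, e.g. GCC 14's):

```cpp
#include <iostream>
#include <mdspan>
#include <ranges>
#include <vector>

int main()
{
  std::vector<double> storage{1.0, 2.0, 3.0, 4.0};
  std::mdspan x{storage.data(), 4};  // rank-1 mdspan over the vector

  // Lazy iterator range over the mdspan's elements.
  auto doubled = std::views::iota(0, static_cast<int>(x.extent(0)))
               | std::views::transform([x](auto i) { return 2.0 * x[i]; });

  for (double v : doubled) { std::cout << v << ' '; }  // prints: 2 4 6 8
}
```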

Contributor:

Also, it looks like all of RAFT's rank-1 mdspan would let one use `data_handle(), data_handle() + extent(0)` as an iterator range.

On this part of the diff:

```cpp
cub::TransformInputIterator<float, utils::mapping<float>, const T*> mapping_itr(dataset,
                                                                                mapping_op);

// todo(lsugy): use iterator from KV output of fusedL2NN
```
Member:

Can you create an issue for this and reference it here for tracking? Do you see any reason it would be more beneficial to use iterators here over the mdspan API?

Contributor Author:

The point of using an iterator here would be to avoid an extra step after fusedL2NN to extract and cast the key (fusedL2NN can output key-value or value-only, but not key-only).
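For illustration, a hypothetical sketch of that idea (the pair type and variable names are stand-ins, not the actual fusedL2NN output types):

```cpp
#include <cub/iterator/transform_input_iterator.cuh>
#include <cub/util_type.cuh>  // cub::KeyValuePair

// Read the key component on the fly, instead of materializing a key-only array.
struct extract_key {
  __host__ __device__ int operator()(const cub::KeyValuePair<int, float>& kv) const
  {
    return kv.key;
  }
};

void example(const cub::KeyValuePair<int, float>* d_min_cluster_and_dist)
{
  cub::TransformInputIterator<int, extract_key, const cub::KeyValuePair<int, float>*>
    keys_itr(d_min_cluster_and_dist, extract_key{});
  // keys_itr now behaves like a `const int*` over the keys only.
}
```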

@tfeher (Contributor) left a comment:

Thanks @Nyrio for the PR! I confirm that there is no need to handle host-only pointers in calc_centers_and_sizes: that was only necessary historically, and all the calculation is now done in device or managed memory for the IVF methods. Some nitpicks below, otherwise it looks good.

Review thread on cpp/include/raft/linalg/detail/reduce_rows_by_key.cuh (outdated, resolved).
@Nyrio (Contributor Author) commented Oct 17, 2022:

Thanks @tfeher for confirming that!

@Nyrio Nyrio requested a review from tfeher October 17, 2022 13:42
@tfeher (Contributor) left a comment:

Thanks Louis for the update, it looks good to me!

@Nyrio (Contributor Author) commented Oct 19, 2022:

@cjnolet If that's ok with you, can we merge this and mdspanify later? Mdspanifying custom iterators requires helpers and types that we don't have yet.

@cjnolet (Member) commented Oct 19, 2022:

@Nyrio yeah, I think we can push that change off until later. Can you create a quick GitHub issue for it so that it doesn't get lost?

@cjnolet (Member) commented Oct 19, 2022:

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 0de9ece into rapidsai:branch-22.12 on Oct 19, 2022.
@tfeher tfeher mentioned this pull request on Oct 27, 2022.