Speedup make_blobs by up to 2x by fixing inefficient kernel launch configuration (#1100)

The kernel generates two elements per iteration and attempts to write the second element with an offset equal to the grid stride. However, the grid stride is currently computed to be greater than the length of the generated array, so this second value is never used. By using a grid stride of half the array size, we speed up the kernel by nearly 2x in some cases (see perf charts in the PR comments).

_Note: this will effectively modify many test inputs, so be aware of that when comparing results prior to and following the change._

Authors:
  - Louis Sugy (https://github.com/Nyrio)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1100
Nyrio authored Dec 14, 2022
1 parent 0039f33 commit 51c45b0
Showing 2 changed files with 14 additions and 2 deletions.
10 changes: 10 additions & 0 deletions cpp/bench/random/make_blobs.cu
@@ -25,6 +25,12 @@ struct make_blobs_inputs {
   bool row_major;
 };  // struct make_blobs_inputs

+inline auto operator<<(std::ostream& os, const make_blobs_inputs& p) -> std::ostream&
+{
+  os << p.rows << "#" << p.cols << "#" << p.clusters << "#" << p.row_major;
+  return os;
+}
+
 template <typename T>
 struct make_blobs : public fixture {
   make_blobs(const make_blobs_inputs& p)
@@ -34,6 +40,10 @@ struct make_blobs : public fixture {

   void run_benchmark(::benchmark::State& state) override
   {
+    std::ostringstream label_stream;
+    label_stream << params;
+    state.SetLabel(label_stream.str());
+
     loop_on_state(state, [this]() {
       raft::random::make_blobs(data.data(),
                                labels.data(),
6 changes: 4 additions & 2 deletions cpp/include/raft/random/detail/make_blobs.cuh
@@ -156,8 +156,10 @@ void generate_data(DataT* out,
                    const DataT cluster_std_scalar,
                    raft::random::RngState& rng_state)
 {
-  IdxT items = n_rows * n_cols;
-  IdxT nBlocks = (items + 127) / 128;
+  constexpr IdxT block_size = 128;
+  IdxT items = n_rows * n_cols;
+  // Choose a grid size so that each thread can write two output values.
+  IdxT nBlocks = ceildiv<IdxT>(items, 2 * block_size);
   // parentheses needed here for kernel, otherwise macro interprets the arguments
   // of triple chevron notation as macro arguments
   RAFT_CALL_RNG_FUNC(rng_state,