Replace k-means++ CPU bottleneck with a random::discrete prim #1039
Conversation
Sorry for the absolute mess in the diff and automatic labels / review requests, but …
Reverting target branch for now to …
Could we do some little tricks here, like setting most of the weights to a very small value and only a few weights to much larger values, and then comparing the unique set of sampled indices against the expected histogram to check that all of the sampled indices are peaks in the expected histogram?
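A minimal sketch of that test idea, using std::discrete_distribution as a host-side stand-in for the new prim (the real test would call the GPU primitive; the weight values and peak positions here are illustrative):

```cpp
#include <cassert>
#include <random>
#include <set>
#include <vector>

int main()
{
  // Mostly tiny weights, with a few large "peak" weights.
  const int n_weights = 10000;
  std::vector<double> weights(n_weights, 1e-9);
  const std::set<int> peaks{42, 1337, 9000};
  for (int p : peaks) weights[p] = 1.0;

  // Draw a small number of samples (stand-in for the GPU prim).
  std::mt19937 gen(1234);
  std::discrete_distribution<int> dist(weights.begin(), weights.end());
  std::set<int> sampled;
  for (int i = 0; i < 100; ++i) sampled.insert(dist(gen));

  // Every index that was actually sampled should be one of the peaks.
  for (int idx : sampled) assert(peaks.count(idx) == 1);
  return 0;
}
```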
The changes look good, though aside from adding the test for the smaller sample sizes w/ larger weights, we should add a quick usage example for the docs.
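For what it's worth, a docs-style usage sketch for the new prim might look like the following; the exact raft::random::discrete signature, headers, and template parameters are an assumption here, not confirmed in this thread:

```cpp
#include <raft/core/device_mdarray.hpp>
#include <raft/core/device_resources.hpp>
#include <raft/random/rng.cuh>

void discrete_example()
{
  raft::device_resources handle;
  raft::random::RngState rng(1234ULL);

  const int n_weights = 5;
  const int n_samples = 100;
  auto weights = raft::make_device_vector<float, int>(handle, n_weights);
  auto out     = raft::make_device_vector<int, int>(handle, n_samples);
  // ... fill `weights` with non-negative values ...

  // Assumed call shape: sample n_samples indices in [0, n_weights),
  // each index i drawn with probability weights[i] / sum(weights).
  raft::random::discrete(handle, rng, out.view(), weights.view());
}
```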
I have consolidated the tests with small …
LGTM. I'm pre-approving, but we will need to change the target branch to 23.02 before this is merged.
@gpucibot merge
Currently, k-means has a CPU bottleneck in its k-means++ initialization, especially when the dataset is tall and thin. At each step of the k-means++ initialization, the distances to the closest cluster centroid are copied to the host, and candidate centroids are selected using std::discrete_distribution. This distribution reduces and scans the weights, which is an expensive operation when the dataset has many rows and should be done on the GPU.

To test the new primitive, I use a small number of weights and a large number of samples and compare the actual vs. expected histogram, using a tolerance of 4*sigma, where sigma is the standard deviation computed from the number of samples and the smallest non-zero weight. I don't know a good way to correctly test the primitive with a very large number of weights and a small number of samples, but I'm open to suggestions.
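As a rough illustration of the GPU-side approach described above, here is a Thrust sketch of the general scan-and-search technique (not the prim's actual implementation): an inclusive scan turns the weights into a CDF, and each uniform draw is mapped to an index with a vectorized binary search.

```cpp
// Build with nvcc --extended-lambda (needed for the device lambda below).
#include <thrust/binary_search.h>
#include <thrust/device_vector.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/random.h>
#include <thrust/scan.h>
#include <thrust/transform.h>
#include <cstddef>

// Sample out.size() indices in [0, weights.size()) with probability
// proportional to the weights, entirely on the device.
void discrete_sample(const thrust::device_vector<float>& weights,
                     thrust::device_vector<int>& out,
                     unsigned long long seed)
{
  // 1. Scan the weights into a cumulative distribution (last entry = total).
  thrust::device_vector<float> cdf(weights.size());
  thrust::inclusive_scan(weights.begin(), weights.end(), cdf.begin());
  const float total = cdf.back();

  // 2. Draw one uniform value in [0, total) per output sample.
  thrust::device_vector<float> u(out.size());
  thrust::transform(thrust::counting_iterator<std::size_t>(0),
                    thrust::counting_iterator<std::size_t>(out.size()),
                    u.begin(),
                    [seed, total] __device__(std::size_t i) {
                      thrust::default_random_engine rng(seed);
                      rng.discard(i);
                      thrust::uniform_real_distribution<float> dist(0.0f, total);
                      return dist(rng);
                    });

  // 3. Binary-search each draw in the CDF: the index of the first entry
  //    greater than the draw is the sampled index.
  thrust::upper_bound(cdf.begin(), cdf.end(), u.begin(), u.end(), out.begin());
}
```

For the histogram check, if bin i has probability p_i and there are n samples, the expected count is n*p_i with binomial standard deviation sqrt(n*p_i*(1-p_i)), which is presumably where the 4*sigma tolerance above comes from.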