[PERF] Implement warp-collaborative binary search in sample_with_replacement_kernel
#1047
Labels
feature request
#1039 added a kernel that samples with replacement by performing a binary search on the cumulative sum of the weights. However, this is not optimal when the number of elements to sample is very small and the range to sample from (i.e., the number of weights) is very large, which is the case for its current use in k-means++ initialization. That said, this kernel accounts for only a fraction of the total execution time and is likely limited by launch latency, which makes this a low-priority item.
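To make the idea concrete, here is a minimal host-side sketch in C++, not the actual kernel from #1039. It assumes the sample index is the first position whose cumulative weight exceeds a random draw `u`. The names `search_binary` and `search_warp_style` are hypothetical, and the "lanes" are emulated with a plain loop; on a GPU each lane would evaluate one probe and the warp would combine the results with a vote such as `__ballot_sync`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: first index i with cdf[i] > u, or cdf.size() if none.
// One logical thread performs the whole search (about log2(n) dependent steps).
std::size_t search_binary(const std::vector<double>& cdf, double u) {
  std::size_t lo = 0, hi = cdf.size();
  while (lo < hi) {
    std::size_t mid = lo + (hi - lo) / 2;
    if (cdf[mid] > u) hi = mid; else lo = mid + 1;
  }
  return lo;
}

// Hypothetical CPU emulation of a warp-collaborative search: each round splits
// [lo, hi) into `lanes` chunks and probes the last element of every chunk, so
// the range shrinks by roughly a factor of `lanes` per round instead of 2.
std::size_t search_warp_style(const std::vector<double>& cdf, double u,
                              std::size_t lanes = 32) {
  assert(lanes > 0);
  std::size_t lo = 0, hi = cdf.size();
  while (lo < hi) {
    std::size_t step = (hi - lo + lanes - 1) / lanes;  // chunk length, >= 1
    bool found = false;
    for (std::size_t lane = 0; lane < lanes; ++lane) {  // parallel on a GPU
      std::size_t begin = lo + lane * step;
      if (begin >= hi) break;                           // idle lane
      std::size_t end = begin + step < hi ? begin + step : hi;
      if (cdf[end - 1] > u) {    // first chunk whose last element exceeds u
        lo = begin;              // the answer lies in [begin, end - 1]
        hi = end - 1;
        found = true;
        break;
      }
    }
    if (!found) lo = hi;  // every probe was <= u, so the answer is hi
  }
  return lo;
}
```

With 32 lanes the number of dependent rounds drops from about log2(n) to about log32(n), which is the potential gain when there are few samples and many weights. Whether that matters in practice depends on the launch-latency point made above.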