fusedL2NN: Preventatively reduce shfl_sync width
In the current implementation, it looks like values from different rows
are mixed together in what should be a row-wise warp reduce. All tests
pass, however.

Just in case, I have added a width parameter to the shuffle so that it
only shuffles within a row within the warp.
ahendriksen committed Sep 5, 2022
1 parent 89cbb7f commit 87319de
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions cpp/include/raft/distance/detail/fused_l2_nn.cuh
@@ -202,8 +202,8 @@ __global__ __launch_bounds__(P::Nthreads, 2) void fusedL2NNkernel(OutT* min,
   for (int i = 0; i < P::AccRowsPerTh; ++i) {
 #pragma unroll
     for (int j = P::AccThCols / 2; j > 0; j >>= 1) {
-      auto tmpkey = raft::shfl(val[i].key, lid + j);
-      auto tmpvalue = raft::shfl(val[i].value, lid + j);
+      auto tmpkey = raft::shfl(val[i].key, lid + j, P::AccThCols);
+      auto tmpvalue = raft::shfl(val[i].value, lid + j, P::AccThCols);
       KVPair tmp = {tmpkey, tmpvalue};
       val[i] = pairRed_op(accrowid + i * P::AccThRows + gridStrideY, tmp, val[i]);
     }
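
To illustrate the effect of the width argument, here is a minimal host-side sketch in plain C++. It is not RAFT code and does not run on a GPU. It models only the source-lane rule documented for CUDA's `__shfl_sync` (the source lane is taken modulo `width` inside the caller's own segment of the warp), and it assumes that `raft::shfl` forwards its third argument as that width. The names `kWarpSize`, `shfl_src_lane`, and `row_min_reduce` are hypothetical, and a plain `std::min` over integers stands in for `pairRed_op` over key-value pairs.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

constexpr int kWarpSize = 32;  // hypothetical constant: lanes per warp

// Model of the documented __shfl_sync source-lane rule: the warp is split
// into segments of `width` lanes, and the requested source lane is reduced
// modulo `width` and looked up inside the calling lane's own segment.
// `width` is assumed to be a power of two no larger than kWarpSize.
inline int shfl_src_lane(int lane, int src_lane, int width = kWarpSize) {
  int segment_base = (lane / width) * width;
  return segment_base + (src_lane % width);
}

// Tree min-reduction with the same loop shape as the diff: each row of the
// accumulator tile occupies `cols` consecutive lanes, and every lane combines
// its value with the one held by lane + j.
std::array<int, kWarpSize> row_min_reduce(std::array<int, kWarpSize> vals,
                                          int cols, int width) {
  for (int j = cols / 2; j > 0; j >>= 1) {
    std::array<int, kWarpSize> next = vals;  // all lanes shuffle simultaneously
    for (int lane = 0; lane < kWarpSize; ++lane) {
      int src = shfl_src_lane(lane, lane + j, width);
      next[lane] = std::min(vals[lane], vals[src]);
    }
    vals = next;
  }
  return vals;
}
```

Calling `row_min_reduce(v, 8, kWarpSize)` models the old code for rows of 8 lanes, and `row_min_reduce(v, 8, 8)` models the new code. In this model the first lane of each row ends up with the same row minimum either way, because it only ever combines values from lower-numbered positions in its own row. The other lanes of a row pick up values from the next row when the width is the full warp. That is one plausible reading of why the tests passed before the change, but it is a property of this simplified model, not a statement about how the kernel consumes the result.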