Commit
fusedL2NN: Preventatively reduce shfl_sync width
In the current implementation, values from different rows appear to be mixed together in what should be a row-wise warp reduce. All tests pass nonetheless. As a precaution, I have added a width parameter to the shuffle so that it only exchanges values within a single row of the warp.
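To illustrate the effect of the width parameter, here is a minimal standalone sketch. It is not the fusedL2NN code: the kernel name, the layout of two 16-lane rows per warp, and the use of `__shfl_down_sync` are all assumptions made for the example. It runs the same sum reduction twice, once with the default width and once with the width set to the row size.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical layout for illustration: one warp holds two rows of 16 lanes.
constexpr int kWarpSize    = 32;
constexpr int kLanesPerRow = 16;

__global__ void rowSumKernel(const float* in, float* fullWidth, float* rowWidth)
{
  const int tid = threadIdx.x;
  float a = in[tid];  // reduced with the default width (warpSize)
  float b = in[tid];  // reduced with width = kLanesPerRow

  for (int offset = kLanesPerRow / 2; offset > 0; offset >>= 1) {
    // Default width: lanes near the end of row 0 read values from row 1.
    a += __shfl_down_sync(0xffffffffu, a, offset);
    // Explicit width: a lane whose source would fall outside its own
    // 16-lane segment gets its own value back, never another row's.
    b += __shfl_down_sync(0xffffffffu, b, offset, kLanesPerRow);
  }
  fullWidth[tid] = a;
  rowWidth[tid]  = b;
}

// Runs the kernel on a single warp: row 0 is all 1s, row 1 is all 100s.
bool runDemo(std::vector<float>& fullWidth, std::vector<float>& rowWidth)
{
  std::vector<float> in(kWarpSize);
  for (int i = 0; i < kWarpSize; ++i) in[i] = (i < kLanesPerRow) ? 1.0f : 100.0f;
  fullWidth.assign(kWarpSize, 0.0f);
  rowWidth.assign(kWarpSize, 0.0f);

  const size_t bytes = kWarpSize * sizeof(float);
  float *dIn = nullptr, *dFull = nullptr, *dRow = nullptr;
  if (cudaMalloc(&dIn, bytes) != cudaSuccess) return false;
  if (cudaMalloc(&dFull, bytes) != cudaSuccess) return false;
  if (cudaMalloc(&dRow, bytes) != cudaSuccess) return false;

  cudaMemcpy(dIn, in.data(), bytes, cudaMemcpyHostToDevice);
  rowSumKernel<<<1, kWarpSize>>>(dIn, dFull, dRow);
  cudaMemcpy(fullWidth.data(), dFull, bytes, cudaMemcpyDeviceToHost);
  cudaMemcpy(rowWidth.data(), dRow, bytes, cudaMemcpyDeviceToHost);

  cudaFree(dIn);
  cudaFree(dFull);
  cudaFree(dRow);
  return cudaGetLastError() == cudaSuccess;
}

int main()
{
  std::vector<float> fullWidth, rowWidth;
  if (!runDemo(fullWidth, rowWidth)) {
    std::printf("CUDA error\n");
    return 1;
  }
  const int lanes[] = {0, 1, 16};
  for (int lane : lanes) {
    std::printf("lane %2d: default width = %g, row width = %g\n",
                lane, fullWidth[lane], rowWidth[lane]);
  }
  return 0;
}
```

In this sketch the first lane of each row ends up with the correct row sum in both variants, because its reduction tree never reaches past the end of the row. Only the other lanes pick up values from the neighbouring row when the default width is used. If a kernel reads the result only from the first lane of each row, that would be one way for tests to pass despite the mixing, though whether this is what happens in fusedL2NN is not established here.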