Commit
fix: batch decode kernel redundant store output to gmem (#505)
Hi, this is a minor fix. When bdz is greater than 1, some warps perform redundant stores of the output to gmem. We could also check 'if (tx == 0)' when storing the lse value, but since bdx is 32 most of the time, I think skipping that check is fine.

Co-authored-by: tsu-bin <[email protected]>
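To make the effect of the guard concrete, here is a minimal host-side sketch in plain C++ rather than the kernel itself. It assumes that after the cross-warp merge every (tx, ty, tz) thread holds the same final state for its head, so threads that differ only in tz would write identical values to identical addresses. The names `BlockDim`, `StoreCounts`, and `count_stores` are hypothetical and exist only for this illustration; bdx, bdy, bdz, tx, and tz follow the commit message.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical block shape, named after the launch dimensions in the commit message.
struct BlockDim {
  int bdx;  // threads along x (32 most of the time, i.e. one warp)
  int bdy;  // threads along y
  int bdz;  // threads along z
};

// Number of store operations one thread block would issue.
struct StoreCounts {
  std::size_t output_stores;  // stores of the output vector
  std::size_t lse_stores;     // stores of the scalar lse value
};

// Walks every thread index in the block and counts the stores it would issue.
// guard_tz models wrapping the store in `if (tz == 0)`.
// guard_tx_for_lse models the optional `if (tx == 0)` check for the lse store.
StoreCounts count_stores(BlockDim d, bool guard_tz, bool guard_tx_for_lse) {
  assert(d.bdx > 0 && d.bdy > 0 && d.bdz > 0);
  StoreCounts c{0, 0};
  for (int tz = 0; tz < d.bdz; ++tz) {
    for (int ty = 0; ty < d.bdy; ++ty) {
      for (int tx = 0; tx < d.bdx; ++tx) {
        if (guard_tz && tz != 0) {
          continue;  // this thread's result duplicates the tz == 0 thread's
        }
        ++c.output_stores;
        if (!guard_tx_for_lse || tx == 0) {
          ++c.lse_stores;
        }
      }
    }
  }
  return c;
}
```

Under these assumptions, the unguarded block issues bdz times as many output stores as the guarded one, while the extra 'if (tx == 0)' check would only reduce the lse stores from bdx per row of threads to one.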