[QST] Can CUTLASS split warp tiles in the K dimension of GemmUniversal? #1152
This is not the only tile configuration that causes illegal memory access errors. I observed that in every configuration that reported errors, the warp tile K was different from the block tile K. Can you explain how CUTLASS behaves when the block tile K differs from the warp tile K?
If the warp tile K is smaller than the block tile K, CUTLASS uses slice-K. We have unit tests for this, and some sizes are enabled in the profiler. However, we don't observe much benefit from slice-K on Ampere.
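For intuition, the host-side arithmetic below divides the block tile by the warp tile in each dimension, similar in spirit to how CUTLASS derives a warp count from the threadblock and warp shapes. It is a sketch, not CUTLASS code: `TileShape` and `warp_count` are hypothetical names. With the shapes from the question ([128, 128, 64] block, [64, 64, 32] warp), K splits into two slices, so that configuration is a slice-K configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileShape:
    """Hypothetical (M, N, K) tile shape, in elements."""
    m: int
    n: int
    k: int


def warp_count(block: TileShape, warp: TileShape) -> TileShape:
    """Number of warps along M, N and K when a block tile is split into warp tiles.

    A K factor greater than 1 means slice-K: several warps accumulate
    different K slices of the same output sub-tile, and their partial
    accumulators must be reduced (in shared memory) before the epilogue.
    """
    for dim in ("m", "n", "k"):
        if getattr(block, dim) % getattr(warp, dim) != 0:
            raise ValueError(f"block tile {dim} is not a multiple of warp tile {dim}")
    return TileShape(block.m // warp.m, block.n // warp.n, block.k // warp.k)


block = TileShape(128, 128, 64)  # block tile from the question
warp = TileShape(64, 64, 32)     # warp tile from the question
wc = warp_count(block, warp)
num_warps = wc.m * wc.n * wc.k
print(f"warps {wc.m}x{wc.n}x{wc.k} = {num_warps} ({num_warps * 32} threads), slice-K factor {wc.k}")
# → warps 2x2x2 = 8 (256 threads), slice-K factor 2
```

Setting the warp tile K equal to the block tile K (for example [64, 64, 64]) gives a K factor of 1, which avoids slice-K entirely.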
Thanks! So this error is not expected. I will keep investigating the cause of the illegal memory access.
I understand that with split-K inside a single threadblock (slice-K), the partial results can be reduced in shared memory. I have also seen another type of split-K that reduces partial results in global memory across different threadblocks. Have I understood these two split-K methods correctly? Would the former perform better than the latter?
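A toy CPU model may help make the two reductions concrete. In the sketch below, each "block" owns one output tile and one K range (cross-block split-K), and inside a block each "warp" owns a sub-slice of that K range (slice-K). Where real kernels keep partial sums (shared memory versus a global workspace) is only modeled by comments, and `tiled_gemm`, `split_k`, and `slice_k` are hypothetical names, not CUTLASS APIs. Both schemes only reorder the sum over K, so with integer inputs the result matches a plain GEMM exactly; with floating point, the reordering can change rounding slightly.

```python
import random


def reference_gemm(a, b):
    """C = A @ B with plain loops (A is M x K, B is K x N)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def ranges(begin, end, parts):
    """Split [begin, end) into `parts` contiguous ranges; the last absorbs any remainder."""
    step = (end - begin) // parts
    return [(begin + p * step, end if p == parts - 1 else begin + (p + 1) * step)
            for p in range(parts)]


def tiled_gemm(a, b, tile_m, tile_n, split_k, slice_k):
    """Toy model of the two K reductions (not CUTLASS code).

    split_k: number of 'blocks' sharing one output tile along K; each writes a
             partial tile to a global workspace, reduced in a separate pass.
    slice_k: number of 'warps' inside a block splitting that block's K range;
             their partials are summed through a per-block buffer (shared memory).
    """
    m, n, k = len(a), len(b[0]), len(b)
    workspace = [[[0] * n for _ in range(m)] for _ in range(split_k)]
    for s, (k_lo, k_hi) in enumerate(ranges(0, k, split_k)):
        for i0 in range(0, m, tile_m):
            for j0 in range(0, n, tile_n):
                # One 'block': output tile (i0, j0), K range [k_lo, k_hi).
                rows = range(i0, min(i0 + tile_m, m))
                cols = range(j0, min(j0 + tile_n, n))
                # Each 'warp' accumulates its own K slice of this block's K range.
                smem = [{(i, j): sum(a[i][kk] * b[kk][j] for kk in range(w_lo, w_hi))
                         for i in rows for j in cols}
                        for w_lo, w_hi in ranges(k_lo, k_hi, slice_k)]
                # Intra-block reduction (shared memory), then store to the workspace.
                for (i, j) in smem[0]:
                    workspace[s][i][j] = sum(part[(i, j)] for part in smem)
    # Cross-block reduction over the global workspace (a separate pass).
    return [[sum(workspace[s][i][j] for s in range(split_k)) for j in range(n)]
            for i in range(m)]


random.seed(0)
M, N, K = 5, 6, 64
A = [[random.randint(-3, 3) for _ in range(K)] for _ in range(M)]
B = [[random.randint(-3, 3) for _ in range(N)] for _ in range(K)]
print(tiled_gemm(A, B, tile_m=4, tile_n=4, split_k=4, slice_k=2) == reference_gemm(A, B))  # → True
```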
The latter is useful when M and N are small and K is very large. The former is only useful in a few corner cases.
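Back-of-the-envelope arithmetic shows why cross-block split-K pays off when M and N are small and K is large. With only a few output tiles there are only a few threadblocks, so most SMs sit idle (an A100 has 108) unless K is also divided among blocks. The problem size and function names below are hypothetical, and the model ignores the extra workspace traffic and the cost of the reduction pass.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return (a + b - 1) // b


def grid_blocks(m, n, tile_m, tile_n, split_k=1):
    """Threadblocks launched for an M x N output with one block per (tile, K split)."""
    return ceil_div(m, tile_m) * ceil_div(n, tile_n) * split_k


def k_per_split(k, split_k):
    """K extent each split iterates over (the last split may get less)."""
    return ceil_div(k, split_k)


# Hypothetical 128 x 128 x 65536 problem with 128 x 128 block tiles.
for splits in (1, 4, 16):
    print(splits, grid_blocks(128, 128, 128, 128, splits), k_per_split(65536, splits))
# → 1 1 65536
# → 4 4 16384
# → 16 16 4096
```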
Closing due to inactivity. Feel free to reopen if needed.
I am trying to tune the CUTLASS gemm_with_broadcast kernel. I set the block tile to [128, 128, 64] and the warp tile to [64, 64, 32], and set the epilogue beta to 1 so that the source tensor is read. With this configuration I get an illegal memory access error. Is this tile configuration invalid, so that the error is expected? Would similar problems occur with the same tile settings in a GEMM without broadcast?