[QST] Can cutlass split warp tiles in the k dimension of gemmuniversal? #1152

Closed
Xseventh opened this issue Oct 19, 2023 · 7 comments

Comments

@Xseventh
Contributor

I am trying to tune the CUTLASS gemm_with_broadcast kernel. With a block tile of [128, 128, 64], a warp tile of [64, 64, 32], and the epilogue beta set to 1 so that the source tensor is read, I get an illegal memory access error. Is this tile configuration invalid, so that the error is expected? Would the same tile settings cause similar problems in a GEMM without broadcast?

@Xseventh
Contributor Author

This is not the only tile configuration that causes illegal memory access errors. In every failing configuration I observed, the warp tile K is not equal to the block tile K. Can you explain how CUTLASS behaves when the block tile K differs from the warp tile K?

@hwu36
Collaborator

hwu36 commented Oct 19, 2023

If the warp tile K is smaller than the block tile K, the kernel uses slice-K. We have unit tests for this, and some sizes are enabled in the profiler. However, we don't observe much benefit on Ampere when using slice-K.
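To make the relationship between the two tile shapes concrete, here is a minimal compile-time sketch of the arithmetic for the configuration from the question. `TileShape` and `WarpLayout` are hypothetical stand-ins written for this illustration, not CUTLASS types; the assumption is that the number of warps along each dimension is the block tile extent divided by the warp tile extent, and that a quotient greater than 1 along K is what triggers slice-K.

```cpp
// Hypothetical stand-in for a GEMM tile shape (M x N x K); not a CUTLASS type.
template <int M, int N, int K>
struct TileShape {
  static constexpr int kM = M;
  static constexpr int kN = N;
  static constexpr int kK = K;
  static constexpr int kCount = M * N * K;
};

// Assumed rule: warps per threadblock along each dimension = block tile / warp tile.
template <typename BlockTile, typename WarpTile>
struct WarpLayout {
  static_assert(BlockTile::kM % WarpTile::kM == 0, "block M must be a multiple of warp M");
  static_assert(BlockTile::kN % WarpTile::kN == 0, "block N must be a multiple of warp N");
  static_assert(BlockTile::kK % WarpTile::kK == 0, "block K must be a multiple of warp K");

  using Count = TileShape<BlockTile::kM / WarpTile::kM,
                          BlockTile::kN / WarpTile::kN,
                          BlockTile::kK / WarpTile::kK>;

  // More than one warp along K means the block's K extent is sliced across warps.
  static constexpr int kPartitionsK = Count::kK;
  static constexpr int kWarps = Count::kCount;
  static constexpr int kThreads = kWarps * 32;  // 32 threads per warp
};

// The configuration from the question: block [128, 128, 64], warp [64, 64, 32].
using QuestionBlockTile = TileShape<128, 128, 64>;
using QuestionWarpTile = TileShape<64, 64, 32>;
using QuestionLayout = WarpLayout<QuestionBlockTile, QuestionWarpTile>;

static_assert(QuestionLayout::kPartitionsK == 2, "two warps share each output tile along K");
static_assert(QuestionLayout::kWarps == 8, "2 x 2 x 2 warps");
static_assert(QuestionLayout::kThreads == 256, "8 warps of 32 threads");
```

Under these assumptions the configuration runs 8 warps (256 threads) per block, and each 64x64 output sub-tile is computed by two warps whose partial accumulators have to be combined before the final result is written.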

@Xseventh
Contributor Author

Thanks! It seems this error is unexpected, then. I will keep looking for the cause of the illegal memory access.

@Xseventh
Contributor Author

> If the warp tile K is smaller than the block tile K, the kernel uses slice-K. We have unit tests for this, and some sizes are enabled in the profiler. However, we don't observe much benefit on Ampere when using slice-K.

As I understand it, split-K within a single block (slice-K) reduces the partial results in shared memory, and there is another kind of split-K that reduces in global memory across different blocks. Is my understanding of these two split-K methods correct? Does the former perform better than the latter?

@hwu36
Collaborator

hwu36 commented Oct 19, 2023

The latter is useful when M and N are small and K is very large.

The former is only useful in a few corner cases.
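A rough way to see why split-K across blocks helps in the small-M/N, large-K case is to count how many threadblocks a launch produces. The sketch below is illustrative arithmetic only: `Problem`, `threadblocks`, and `k_per_block` are hypothetical helpers rather than CUTLASS APIs, the problem size is made up for the example, and the rounding that a real kernel applies to the per-slice K range is ignored.

```cpp
// Integer ceiling division.
constexpr int ceil_div(int a, int b) { return (a + b - 1) / b; }

// Hypothetical GEMM problem size: C (m x n) = A (m x k) * B (k x n).
struct Problem {
  int m;
  int n;
  int k;
};

// Threadblocks launched: one per output tile, times the number of K slices.
constexpr int threadblocks(Problem p, int block_m, int block_n, int split_k_slices) {
  return ceil_div(p.m, block_m) * ceil_div(p.n, block_n) * split_k_slices;
}

// K extent each threadblock iterates over (alignment to the block tile K is ignored here).
constexpr int k_per_block(Problem p, int split_k_slices) {
  return ceil_div(p.k, split_k_slices);
}

// Example: small M and N, very large K, with a 128 x 128 block tile.
static_assert(threadblocks(Problem{128, 128, 65536}, 128, 128, 1) == 1,
              "without split-K a single block walks the whole K dimension");
static_assert(threadblocks(Problem{128, 128, 65536}, 128, 128, 16) == 16,
              "16 slices give 16 blocks that can run on different SMs");
static_assert(k_per_block(Problem{128, 128, 65536}, 16) == 4096,
              "each block covers 1/16 of K");
```

With one output tile and no split, a single threadblock does all the work while the rest of the GPU sits idle; splitting K across blocks spreads that work out, at the cost of a reduction through global memory. Slice-K inside a block does not change the number of threadblocks, which is consistent with it helping only in corner cases.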


This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@mnicely
Collaborator

mnicely commented Jan 2, 2024

Closing due to inactivity. Feel free to reopen if needed.

@mnicely closed this as completed Jan 2, 2024