[QST] why kElementsPerAccess > 1 is not permanent in default_mma_sm80_core.h #2011

danielhua23 · 2024-12-23T12:33:07Z

What is your question?
Dear cutlass team,

I recently noticed that the kElementsPerAccess passed to ThreadMap is not always greater than 1: for some layouts, kElementsPerAccess = 1. I am wondering why it was designed this way. Is it because we can't guarantee that the runtime problem size (for example, an input of shape [57, 35]) is divisible by 128 / sizeof_bits<Element>, so kElementsPerAccess has to be set to 1?
The code is at https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/gemm/threadblock/default_mma_core_sm80.h#L1864, where kElementsPerAccess > 1, and at https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/gemm/threadblock/default_mma_core_sm80.h#L2003, where kElementsPerAccess = 1.

Thanks a ton for your time!


danielhua23 · 2024-12-23T12:33:50Z

cc @jackkosaian and @hwu36

hwu36 · 2024-12-23T16:58:46Z

Yes, it is related to alignment. can_implement at the kernel level checks this. Usually, we want the alignment of SIMT kernels to be 1 to match cuBLAS behavior. For tensor core kernels, we want the alignment to be as large as the problem size allows.
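To make the divisibility argument concrete, here is a minimal sketch in plain C++ (C++14 or later). It does not use CUTLASS, and the helper names `max_elements_per_access` and `alignment_ok` are hypothetical, introduced only for illustration. It assumes the widest access is 128 bits and that only the contiguous extent matters; the real can_implement checks may cover more than this (for example, leading dimensions and pointer alignment).

```cpp
#include <cstdint>

// Hypothetical helper, not part of CUTLASS.
// Returns the largest power-of-two element count per access, capped at one
// 128-bit access, that evenly divides the contiguous extent of a tensor.
// Assumes element_bits is a power of two no larger than 128.
constexpr int max_elements_per_access(std::int64_t contiguous_extent,
                                      int element_bits) {
  int elements = 128 / element_bits;  // widest candidate: one 128-bit access
  while (elements > 1 && contiguous_extent % elements != 0) {
    elements /= 2;  // fall back to the next narrower access width
  }
  return elements;
}

// Hypothetical analogue of an alignment check: a kernel compiled for a given
// alignment can only handle extents that are a multiple of that alignment.
constexpr bool alignment_ok(std::int64_t contiguous_extent, int alignment) {
  return contiguous_extent % alignment == 0;
}

// The [57, 35] shape from the question, with 16-bit elements:
// neither extent is even, so only one element per access is safe.
static_assert(max_elements_per_access(35, 16) == 1, "35 is odd");
static_assert(max_elements_per_access(57, 16) == 1, "57 is odd");

// An extent of 64 with 16-bit elements allows full 128-bit accesses.
static_assert(max_elements_per_access(64, 16) == 8, "8 x 16 bits = 128 bits");

// A kernel compiled for alignment 8 would have to reject an extent of 35,
// while alignment 1 works for any extent.
static_assert(!alignment_ok(35, 8), "35 is not a multiple of 8");
static_assert(alignment_ok(35, 1), "alignment 1 always divides");
```

Under these assumptions, an alignment of 1 accepts every problem size, which fits the SIMT case above, while a larger alignment trades generality for wider memory accesses.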

danielhua23 · 2024-12-24T02:32:36Z

Thanks for your answer, I got your point!

danielhua23 added the "? - Needs Triage" and "question" labels Dec 23, 2024

danielhua23 changed the title ~~[QST] why kElementsPerAccess > 1only when A and B are interleaved layout in default_mma_sm80_core.h~~ [QST] why kElementsPerAccess > 1 is not permanent in default_mma_sm80_core.h Dec 23, 2024

danielhua23 closed this as completed Dec 24, 2024
