
[QST] How to choose a proper ValLayout? #1242

Closed · mammoth831 opened this issue Dec 6, 2023 · 2 comments
Labels: ? - Needs Triage, CuTe Functionality, question

Comments

mammoth831 commented Dec 6, 2023

Hi, @ccecka

using TileShape = Shape<_128, _128, _32>;
static constexpr int ThreadCount = 128;
using DispatchPolicy = MainloopSm80CpAsync<3>;
using TiledMma = TiledMMA<
    MMA_Atom<SM80_16x8x16_F32F16F16F32_TN>,
    Layout<Shape<_2,_2,_1>>,   // 2x2x1 thread group
    Layout<Shape<_1,_2,_1>>>;  // 1x2x1 value group for 16x16x16 MMA and LDSM

For a kernel with TileShape 128x128x32, the TiledMMA above uses a 2x2x1 AtomLayout and a 1x2x1 ValLayout, which gives a 32x32x16 TiledShape_MNK. As I understand it, CuTe will then iterate this TiledMMA over the 128x128x32 tile.

Can we instead set the ValLayout to 4x8x1 so that TiledShape_MNK becomes 128x128x16? Then the M and N extents would each be covered in a single pass.

And could you please give some suggestions on how to choose a proper ValLayout?
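Concretely, the variant I am asking about would be something like this (only the ValLayout changes from the snippet above; the alias name is just for illustration):

using TiledMma_128x128x16 = TiledMMA<
    MMA_Atom<SM80_16x8x16_F32F16F16F32_TN>,
    Layout<Shape<_2,_2,_1>>,   // same 2x2x1 thread group
    Layout<Shape<_4,_8,_1>>>;  // 4x8x1 value group: 16x8x16 atom x 2x2x1 threads x 4x8x1 values = 128x128x16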

mnicely added the CuTe Functionality label Dec 6, 2023
ccecka commented Dec 6, 2023

You can absolutely set the ValLayout to 4x8x1 to get a 128x128x16 MMA. print_latex(tiled_mma) can be very helpful for visualizing this.
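For example, a minimal host-side sketch (assuming CUTLASS 3.x CuTe headers; compile the emitted LaTeX with pdflatex to render the picture):

#include <cute/tensor.hpp>
#include <cute/atom/mma_atom.hpp>
#include <cute/atom/mma_traits_sm80.hpp>

int main() {
  using namespace cute;
  using TiledMma = TiledMMA<MMA_Atom<SM80_16x8x16_F32F16F16F32_TN>,
                            Layout<Shape<_2,_2,_1>>,    // 2x2x1 thread group
                            Layout<Shape<_4,_8,_1>>>;   // the 4x8x1 value group
  print_latex(TiledMma{});  // prints a LaTeX picture of which thread/value owns each element
}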

However, the MMA partitioner will still partition any MxN tensor into a (MMA, MMA_M, MMA_N) tensor, where the MMA mode holds the values for a single instruction and the MMA_M and MMA_N modes are the number of instructions in M and N, respectively. Thus, the ValLayout won't actually affect the partitioning. It is being removed as a parameter in the next revision because it only interacts with the Permutation parameter in an unintuitive way and is difficult to explain. Feel free to ignore it.
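Continuing the sketch above, the partition shape itself can also be inspected on the host (the 128x128 C tile and thread index 0 are illustrative assumptions; add #include <vector>):

  std::vector<float> buf(128 * 128);  // dummy 128x128 C tile in host memory
  Tensor gC = make_tensor(buf.data(),
                          make_layout(make_shape(Int<128>{}, Int<128>{})));
  auto thr_mma = TiledMma{}.get_thread_slice(0);  // the slice seen by thread 0
  Tensor tCgC = thr_mma.partition_C(gC);          // (MMA, MMA_M, MMA_N)
  print(shape(tCgC)); print("\n");  // first mode: values of one instruction;
                                    // MMA_M, MMA_N: instruction counts in M, N.
                                    // Per the point above, the 1x2x1 ValLayout
                                    // yields the same partition shape.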

mammoth831 (Author) commented

Thank you. Looking forward to the new TiledMMA interface and docs.
