
[QST] How to choose a proper ValLayout? #1242

Closed · mammoth831 opened this issue Dec 6, 2023 · 2 comments
Labels: ? - Needs Triage, CuTe Functionality, question

Comments

mammoth831 commented Dec 6, 2023

Hi, @ccecka

using TileShape = Shape<_128, _128, _32>;
static constexpr int ThreadCount = 128;
using DispatchPolicy = MainloopSm80CpAsync<3>;
using TiledMma = TiledMMA<
    MMA_Atom<SM80_16x8x16_F32F16F16F32_TN>,
    Layout<Shape<_2,_2,_1>>,   // 2x2x1 thread group
    Layout<Shape<_1,_2,_1>>>;  // 1x2x1 value group for 16x16x16 MMA and LDSM

For a kernel with TileShape 128x128x32, the TiledMMA above uses a 2x2x1 AtomLayout and a 1x2x1 ValLayout, which gives a 32x32x16 TiledShape_MNK. As I understand it, CuTe will then iterate this TiledMMA over the 128x128x32 tile.

Can we instead set the ValLayout to 4x8x1 so that TiledShape_MNK becomes 128x128x16? Then the M and N extents would each be covered in a single pass.

And could you please give some suggestions on how to choose a proper ValLayout?
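Concretely, the variant I am asking about would be something like this (only the ValLayout changes from the snippet above; the alias name is just for illustration):

using TiledMma_128x128x16 = TiledMMA<
    MMA_Atom<SM80_16x8x16_F32F16F16F32_TN>,
    Layout<Shape<_2,_2,_1>>,   // same 2x2x1 thread group
    Layout<Shape<_4,_8,_1>>>;  // 4x8x1 value group: 16x8x16 atom x 2x2x1 threads x 4x8x1 values = 128x128x16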

mnicely added the CuTe Functionality label Dec 6, 2023
ccecka commented Dec 6, 2023

You can absolutely set the ValLayout to 4x8x1 to get a 128x128x16 MMA. print_latex(tiled_mma) can be very helpful for visualizing this.
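For example, a minimal host-side sketch (assuming CUTLASS 3.x CuTe headers; compile the emitted LaTeX with pdflatex to render the picture):

#include <cute/tensor.hpp>
#include <cute/atom/mma_atom.hpp>
#include <cute/atom/mma_traits_sm80.hpp>

int main() {
  using namespace cute;
  using TiledMma = TiledMMA<MMA_Atom<SM80_16x8x16_F32F16F16F32_TN>,
                            Layout<Shape<_2,_2,_1>>,    // 2x2x1 thread group
                            Layout<Shape<_4,_8,_1>>>;   // the 4x8x1 value group
  print_latex(TiledMma{});  // prints a LaTeX picture of which thread/value owns each element
}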

However, the MMA partitioner will still partition any MxN tensor into a (MMA, MMA_M, MMA_N) tensor, where the MMA mode holds the values for a single instruction and the MMA_M and MMA_N modes are the number of instructions in M and N, respectively. Thus, the ValLayout won't actually affect the partitioning. It is being removed as a parameter in the next revision because it only interacts with the Permutation parameter in an unintuitive way and is difficult to explain. Feel free to ignore it.
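Continuing the sketch above, the partition shape itself can also be inspected on the host (the 128x128 C tile and thread index 0 are illustrative assumptions; add #include <vector>):

  std::vector<float> buf(128 * 128);  // dummy 128x128 C tile in host memory
  Tensor gC = make_tensor(buf.data(),
                          make_layout(make_shape(Int<128>{}, Int<128>{})));
  auto thr_mma = TiledMma{}.get_thread_slice(0);  // the slice seen by thread 0
  Tensor tCgC = thr_mma.partition_C(gC);          // (MMA, MMA_M, MMA_N)
  print(shape(tCgC)); print("\n");  // first mode: values of one instruction;
                                    // MMA_M, MMA_N: instruction counts in M, N.
                                    // Per the point above, the 1x2x1 ValLayout
                                    // yields the same partition shape.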

mammoth831 (Author) commented

Thank you. Looking forward to the new TiledMMA interface and docs.
