[QST] Is this the complete set of valid parameters for performing fp16 matrix multiplication using tensor cores? #1304
No, that is not the full set of parameters the API supports. Generally speaking, the set of all valid template parameters supported by any kernel is so large, due to combinatorial explosion, that no amount of testing, nor the CUTLASS library itself, can enumerate every valid combination.
I have noticed that when performing matrix multiplication on an A100 machine, the computation for MNK = 1024, 150, 256 and MNK = 1024, 1, 256 is significantly slower than cuBLAS. I have tried all the parameters listed on the following website, but I still can't match or exceed the performance of cuBLAS. What should I do now?
I should add that MNK here means the multiplication of an (M, K) matrix by an (N, K) matrix.
Can you help me, please? @hwu36
@zwshan There is no expectation that CUTLASS should match or exceed cuBLAS performance. The intent of CUTLASS is to provide developers with an additional tool alongside cuBLAS, to explore functionality and requirements not currently supported by our libraries.
Also, you should not be using a GEMM kernel for a GEMV problem. We have GEMV and batched GEMV implementations that are better suited to your problem shapes.
You could use Nsight or nvprof to get the kernel name used by cuBLAS. The kernel name encodes the tile sizes used; then we can fine-tune CUTLASS starting from the same tile sizes cuBLAS uses.
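As a sketch of that workflow (note that nvprof is not supported on Ampere, so use Nsight Systems; `./your_app` is a placeholder for your binary, and the report name may be `gpukernsum` in older nsys versions):

```shell
# Record a timeline of the run, including all GPU kernel launches.
nsys profile -o cublas_run ./your_app

# Summarize the kernels; cuBLAS kernel names encode the tile sizes,
# e.g. a substring like "128x64" in the mangled name.
nsys stats --report cuda_gpu_kern_sum cublas_run.nsys-rep
```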
Thank you all! I will try it now!
Could you please tell me how to use the GEMV kernel on an SM80 (A100) device?
I want to use the GEMV kernel like this:

```cpp
using ElementOutput = float;
using ElementAccumulator = float;
using ElementComputeEpilogue = ElementAccumulator;
using RowMajor = cutlass::layout::RowMajor;
using ColumnMajor = cutlass::layout::ColumnMajor;

using EpilogueOp = cutlass::epilogue::thread::LinearCombination<
    ElementOutput,                                     // data type of output matrix
    128 / cutlass::sizeof_bits<ElementOutput>::value,  // elements per vectorized memory access;
                                                       // also the vector width of epilogue math instructions
    ElementAccumulator,                                // data type of accumulator
    ElementComputeEpilogue>;                           // data type of alpha/beta in the linear combination

using CutlassGemm1 = cutlass::gemm::device::Gemm<
    cutlass::tfloat32_t,                   // data type of A matrix
    RowMajor,                              // layout of A matrix
    cutlass::tfloat32_t,                   // data type of B matrix
    ColumnMajor,                           // layout of B matrix
    ElementOutput,                         // data type of C matrix
    ColumnMajor,                           // layout of C matrix
    ElementAccumulator,                    // data type of accumulator
    cutlass::arch::OpClassTensorOp,        // tag indicating Tensor Cores
    cutlass::arch::Sm80,                   // tag indicating target GPU architecture
    cutlass::gemm::GemmShape<64, 64, 32>,  // threadblock tile shape
    cutlass::gemm::GemmShape<32, 32, 32>,  // warp tile shape
    cutlass::gemm::GemmShape<16, 8, 8>,    // instruction shape
    EpilogueOp,
    cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<>,
    6>;                                    // number of pipeline stages

CutlassGemm1 gemm_operator;
```
Here is the CUTLASS GEMV example: https://github.com/NVIDIA/cutlass/blob/main/test/unit/gemm/device/gemv.cu. The code entry point is https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/gemm/device/gemv.h.
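As a rough sketch of how those two files fit together (the linked `gemv.cu` unit test is authoritative for the exact template arguments; the type names below are placeholders mirroring it only approximately):

```
// Hypothetical sketch; consult test/unit/gemm/device/gemv.cu for the
// exact, supported template arguments of cutlass::gemm::kernel::Gemv.
using GemvKernel = cutlass::gemm::kernel::Gemv<
    ElementA, LayoutA,   // matrix A, e.g. float / cutlass::layout::RowMajor
    ElementB,            // input vector x
    ElementC,            // output vector y
    ElementAccumulator,
    EpilogueOp>;         // e.g. cutlass::epilogue::thread::LinearCombination<...>

// The device-level wrapper in include/cutlass/gemm/device/gemv.h
// takes the kernel type as its single template parameter.
using DeviceGemv = cutlass::gemm::device::Gemv<GemvKernel>;

DeviceGemv gemv_op;
// Invoked with an Arguments struct, it computes y = alpha * A * x + beta * y.
```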
@zwshan has your issue been resolved? |
Solved, thank you!
What is your question?
On this website there are many parameters, but may I ask whether the parameters listed on this page are all of the valid ones?