
[BUG] Sparse GEMM seems to produce wrong results for F32 datatype and specific inputs #1270

Closed
alexsamardzic opened this issue Dec 13, 2023 · 3 comments
Labels: ? - Needs Triage, bug (Something isn't working)
Comments

@alexsamardzic (Contributor) commented Dec 13, 2023

I tried to change the 15_ampere_sparse_tensorop_gemm example in order to test sparse GEMM with F32 inputs and specific m, n, k values. Here is the diff of my changes; it is very simple and everything works fine: throughout several runs, the example code reports that the results of the sparse GEMM and the reference dense GEMM match.

However, when tensor_a, tensor_b and tensor_e are filled with particular values instead of the ones generated by the example, the comparison with the reference result fails. Here is the full changed example source file (please rename it to .cu). The file is rather big because the specific values for the mentioned tensors are placed inline, but besides that the changes are again minimal on top of the above-mentioned diff: the specific values are applied to the corresponding tensors through std::copy, and I'm also double-checking that the 16-bit values provided for the meta tensor contain only 0x4 and 0xE quad-bits, as this should be the only restriction on the values provided for these tensors (a sketch of such a check is shown after this paragraph). The example fails in this case, and I've also added a printout showing that the difference between a specific element of the sparse GEMM result and the reference result is quite large: -1.26617 vs. -0.67898.

@hwu36 (Collaborator) commented Dec 14, 2023

Could you try to use small integers like [-3, 3] as the input? We don't have true fp32 tensor cores; we use tf32 to compute. tf32 has only 10 explicit mantissa bits, 13 fewer than the 23 of true fp32. fp32 values are converted to tf32 before the tensor cores are called.
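For illustration, a rough host-side emulation of the fp32-to-tf32 conversion described above; it simply truncates the low 13 mantissa bits, whereas the hardware rounds, so this is only an approximation of the precision loss:

```cpp
#include <cstdint>
#include <cstring>
#include <cstdio>

// Rough host-side emulation of the fp32 -> tf32 conversion: tf32 keeps the
// fp32 exponent but only 10 explicit mantissa bits, so here the 13 least
// significant mantissa bits are zeroed (the hardware rounds rather than
// truncates, so this is only an approximation).
float to_tf32_approx(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  bits &= 0xFFFFE000u;  // clear the 13 least significant mantissa bits
  float y;
  std::memcpy(&y, &bits, sizeof(y));
  return y;
}

int main() {
  float a = -1.26617f;
  std::printf("%.6f -> %.6f\n", a, to_tf32_approx(a));
  return 0;
}
```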

@hwu36 (Collaborator) commented Dec 14, 2023

Moreover, try to use a 64x32x32 warp tile size if you use a 128x64x32 threadblock tile size.
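A minimal sketch of how this suggestion might look in the example, reusing the tile-shape typedef names from 15_ampere_sparse_tensorop_gemm; the rest of the kernel configuration (instruction shape, data types, layouts) is assumed to stay as in the example:

```cpp
#include "cutlass/gemm/gemm.h"

// Tile shapes per hwu36's suggestion: threadblock tile 128x64x32 paired with
// warp tile 64x32x32 (M, N, K).
using ShapeMMAThreadBlock = cutlass::gemm::GemmShape<128, 64, 32>;  // threadblock tile
using ShapeMMAWarp        = cutlass::gemm::GemmShape<64, 32, 32>;   // warp tile
```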

@alexsamardzic (Contributor, Author) commented

Thanks, if the dense reference GEMM is forced to tf32 then the results indeed match.
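For completeness, one hypothetical way to "force" the dense reference path to tf32 on the host is to round the fp32 input buffers to tf32 precision before computing the reference; the helper and buffer names here are placeholders, not the example's actual identifiers:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: round a host-side fp32 buffer to tf32 precision (10
// explicit mantissa bits) before running the dense reference GEMM, so that
// the reference matches what the tensor cores actually compute.
void round_buffer_to_tf32(float* data, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    std::uint32_t bits;
    std::memcpy(&bits, &data[i], sizeof(bits));
    bits &= 0xFFFFE000u;  // drop the 13 least significant mantissa bits
    std::memcpy(&data[i], &bits, sizeof(bits));
  }
}
```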
