[QST] Is there any INT8 GEMM with INT8 alpha and beta? #1157
Comments
Just using 8-bit alpha/beta is not going to make a performance difference.
@jhss is your question resolved?
I want to know why 8-bit int alpha/beta doesn't affect performance.
Because shaving a single per-tile load from 4 bytes to 1 byte does not change the perf at all. Changing the fp32 multiplication to int8 will also not move the needle much in the grand scheme of things. What problem sizes are you most interested in?
Thank you for answering. I'm looking at the smoothquant repository; they use matrix multiplications whose sizes are about
Suppose the accumulator shape is
Although I doubt it, you can certainly try int8 alpha/beta to see if it would help in this case. What you would have to do is modify the epilogue thread functor's
@jhss is your question resolved?
This issue has been labeled
What is your question?
In the code above, if I set alpha and beta as INT8, I get a warning about a "narrowing conversion from int to float". Do alpha and beta have to be float? I want to set them as INT8 to increase inference speed.