Very slow 4x4 convolutions on gfx803 #134
Which version of MIOpen do you use?
@huanzhang12 AFAICS you are using 2.2.0. Version 2.3.0 has just been released. It includes c58488b, which should restore gfx8 performance. Please close this if the issue is resolved.
@atamazov I tried the just-released version 2.3.0 and it is amazing! It is great news that the ASM kernels are re-enabled on gfx803. The same 4x4 convolution now runs at 1684 GFLOPs.
My workload involving some 4x4 convolutions runs 10 times faster on v2.3.0. Thank you so much for the hard work; I am closing this issue.
Since the ASM kernels were disabled on gfx803 in commit ce51a4c, 4x4 convolutions on gfx803 fall back to the very slow GEMM algorithm.
Before the ASM kernels were disabled, it was much faster.
Performance dropped from 694 GFLOPs to 15 GFLOPs, roughly a 46x slowdown.
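To put the reported throughputs in perspective, here is a minimal sketch using the common FLOP count for a direct forward convolution, 2·N·K·C·Ho·Wo·R·S (one multiply and one add per multiply-accumulate). That this matches exactly how MIOpen's driver computes its GFLOPs figure is an assumption, and the layer shape in the example is hypothetical because the issue does not state the actual problem dimensions.

```python
def conv_flops(n, c, h, w, k, r, s, stride=1, pad=0):
    """FLOP count of a direct forward convolution: 2 FLOPs per multiply-accumulate."""
    ho = (h + 2 * pad - r) // stride + 1  # output height
    wo = (w + 2 * pad - s) // stride + 1  # output width
    return 2 * n * k * c * ho * wo * r * s


def runtime_ms(flops, gflops):
    """Time in milliseconds to execute `flops` at a sustained `gflops` throughput."""
    return flops / (gflops * 1e9) * 1e3


# Throughputs reported in this issue (GFLOPs).
REPORTED = {"ASM before ce51a4c": 694.0, "GEMM fallback": 15.0, "ASM in 2.3.0": 1684.0}

if __name__ == "__main__":
    # Hypothetical 4x4 layer: batch 64, 256 -> 256 channels, 32x32 input, no padding.
    flops = conv_flops(n=64, c=256, h=32, w=32, k=256, r=4, s=4)
    for label, rate in REPORTED.items():
        print(f"{label:>20}: {runtime_ms(flops, rate):8.2f} ms")
    slowdown = REPORTED["ASM before ce51a4c"] / REPORTED["GEMM fallback"]
    print(f"GEMM fallback slowdown: {slowdown:.1f}x")  # 694 / 15 is about 46.3
```

Whatever the real shape is, the ratio of runtimes equals the ratio of throughputs, so the same layer takes about 46 times longer on the GEMM path than on the old ASM path.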
I am wondering why all ASM kernels were disabled for gfx803 rather than only the problematic ones.
Also, even without an ASM implementation, could we use a general OpenCL implementation in this case rather than rely on the extremely slow GEMM? (It seems conv_ocl_dir2Dfwd.cpp is not enabled for most 4x4 convolutions.)
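One way to picture the question: assuming solvers are tried in priority order and the first applicable one wins, disabling every ASM solver for an architecture sends any problem that the OpenCL direct solver also rejects straight to GEMM. This is a hypothetical model only; the solver names, their ordering, and the applicability rules (including which kernel sizes the OpenCL direct path accepts) are illustrative assumptions, not MIOpen's actual code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Problem:
    arch: str      # GPU architecture, e.g. "gfx803"
    kernel_h: int  # filter height
    kernel_w: int  # filter width


@dataclass(frozen=True)
class Solver:
    name: str
    is_applicable: Callable[[Problem], bool]


# Hypothetical rules for illustration only; not MIOpen's real applicability logic.
ASM_DISABLED_ARCHS = {"gfx803"}                # models the effect described for ce51a4c
OCL_DIRECT_KERNELS = {(1, 1), (3, 3), (5, 5)}  # hypothetical: 4x4 is not accepted

# Priority order: the first solver whose predicate returns True is chosen.
SOLVERS: List[Solver] = [
    Solver("asm_direct", lambda p: p.arch not in ASM_DISABLED_ARCHS),
    Solver("ocl_direct", lambda p: (p.kernel_h, p.kernel_w) in OCL_DIRECT_KERNELS),
    Solver("gemm", lambda p: True),  # always applicable, but slow in this case
]


def pick_solver(problem: Problem, solvers: List[Solver] = SOLVERS) -> str:
    """Return the name of the first solver, in priority order, that accepts the problem."""
    for solver in solvers:
        if solver.is_applicable(problem):
            return solver.name
    raise RuntimeError("no applicable solver")


if __name__ == "__main__":
    print(pick_solver(Problem("gfx803", 4, 4)))  # gemm
    print(pick_solver(Problem("gfx900", 4, 4)))  # asm_direct
```

In this model, adding (4, 4) to the hypothetical OCL_DIRECT_KERNELS set would make a gfx803 4x4 problem pick ocl_direct instead of gemm, which is the kind of change the question asks about.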