Workaround for the failure of ConvHipImplicitGemmV4R4GenWrWXdlops #409
Tested with the configs from the issue. No performance drop after the workaround, and ConvHipImplicitGemmV4R4GenWrWXdlops is still the fastest solver.
LGTM
@@ -23,6 +23,8 @@ MIOPEN_DECLARE_ENV_VAR(MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_BLOCK_SYNC_LDS_WITHOUT_SY
// LLVM xdlops instrinsic will do unnecessey VGRP <--> AGPR movement, and result in
// register spill, for bfloat16 datatype, when doing wave-wise GEMM larger than 64x64
#define WORKAROUND_SWDEV_240356 1
// workaround failure of ConvHipImplicitGemmV4R4GenWrWXdlops with vector load
#define WORKAROUND_ISSUE_2532 1
[Recommendation] Is this used only once, in a single .cpp? Then define it right there.
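To illustrate the recommendation, here is a minimal sketch of a workaround macro defined in the one .cpp that consumes it, rather than in a shared header. Only `WORKAROUND_ISSUE_2532` comes from the diff. The helper `SelectLoadVectorWidth` is hypothetical, and the sketch assumes the workaround amounts to falling back to scalar loads, which the code comment hints at but this conversation does not confirm.

```cpp
// Sketch only: everything except WORKAROUND_ISSUE_2532 is a hypothetical name.

// Defined locally, next to its single point of use, instead of in a shared header.
#define WORKAROUND_ISSUE_2532 1

// Hypothetical helper: returns how many elements one load should read.
// ASSUMPTION: with the workaround enabled, vector loads are disabled and the
// solver falls back to scalar loads (width 1).
constexpr int SelectLoadVectorWidth(int preferred_width)
{
#if WORKAROUND_ISSUE_2532
    // Ignore the preferred width while the workaround is active.
    return preferred_width > 0 ? 1 : 1;
#else
    return preferred_width;
#endif
}

// Compile-time check that the workaround path is taken.
static_assert(SelectLoadVectorWidth(4) == 1, "workaround should force scalar loads");
```

Keeping the macro next to its single use makes it easier to find and delete once the underlying issue is resolved.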
Topmost comment & priority fixed.
"No performance drop after workaround" That sounds wrong to me. In my tests, I've seen performance drops as high as 50%.
@ekuznetsov139 Perhaps re-tuning is required. @zjing14 What do you think?
@ekuznetsov139 Could you post the config with regression here? @atamazov No, I did not retune.
MIOpenDriver convfp16 -n 256 -c 512 -H 28 -W 28 -k 128 -y 1 -x 1 -p 0 -q 0 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 4 -t 1 Without the fix: With the fix: The exact algorithm is
@ekuznetsov139 Thanks. Sorry, I did not compare the performance of the failing configs, since I did not think that comparison would be fair. Yes, that is a huge performance degradation.
@zjing14 So, this is a WIP again?
No, the PR is ready.
@zjing14 What is the plan to reduce the impact of the performance regression? How widespread is this regression?
@daniellowell This solver will be deprecated once the new wrw solver is merged in, so the impact is temporary.
You were correct. The notion of performance is not applicable to the kernels that produce wrong outputs. However, currently find-db contains too optimistic information about |
Resolves issue #406