-
Notifications
You must be signed in to change notification settings - Fork 236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[igemm_dynamic] v4r1 bwd dynamic kernel #272
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@carlushuang It looks like the code is approved for merge.
I have a few other questions related to perf
- Do you have perf comparison of resnext101/50 configs between bwd v4r1 dynamic and bwd v4r1
static? - What is current applicability criteria for bwd v4r1 dynamic vs v4r1 static? It looks like you are testing your kernel in CI with help of DYNAMIC_IMPLICITGEMM_COMMON and MIOPEN_DEBUG_FIND_ONLY_SOLVER
When do we plan to either not run static kernel and how do we plan to make that choice?
Hi @TejashShah . Currently for gfx906 fp32 the dynamic kernel is within 10% performance drop compared with static kernel. currently the support range of dynamic kernel is not as good as static kernel, I'm trying to balance my work with 1) extend support range 2) performance, and 3) gfx908. Currently I'm not testing the config from resnext101/50, if you wish I can do it offline. And for the criteria to use only dynamic kernel, at current stage I think it's based on 2 factors 1) perf drop within 10%. 2) coverage, at least cover high priority model. I'm trying to make my best to make both happen for the up-comming gfx908 PR, but if not, at least make sure what I can cover can match the 1) perf drop within 10%. |
@zjing14 Can you review this PR please. |
I concur with you on the approach. For initial iteration reduction project, we focus on vega20 first, then gfx908. That's good this PR is targeted for vega20. If I train any resnext/inception model end-to-end with this PR, would I see the compilation reduction, by default set of environment variables or do I need to set any specific env variables.? |
There is a PR:#326 that enable hybrid mode of igemm. I believe E2E run with dynamic kernel can be achieved in that mode. Because currently, the noticeable 10% perf drop of dynamic kernel may lead to it unpicked if run normally. |
namespace miopen { | ||
|
||
template <typename T> | ||
T gcd(T x, T y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⚓ possibility of stack overflow, copy-pasta
v4r1 backward_data dynamic kernel.
This is to enable dynamic kernel in bwd direction, with v4r1 backward_data stratagy(gridwise_gemm). Although tensor contraction seems a MUST in dynamic kernels(due to its reduce in index calculation), let me re-add tensor-contraction version in future PR.