Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[igemm_dynamic] v4r1 bwd dynamic kernel #272

Merged
merged 54 commits into from
Jul 27, 2020
Merged

Conversation

carlushuang
Copy link
Contributor

@carlushuang carlushuang commented Jun 9, 2020

v4r1 backward_data dynamic kernel.
This is to enable dynamic kernel in bwd direction, with v4r1 backward_data stratagy(gridwise_gemm). Although tensor contraction seems a MUST in dynamic kernels(due to its reduce in index calculation), let me re-add tensor-contraction version in future PR.

carlushuang and others added 30 commits April 19, 2020 15:10
DrizztDoUrden
DrizztDoUrden previously approved these changes Jul 21, 2020
Copy link
Contributor

@DrizztDoUrden DrizztDoUrden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@TejashShah TejashShah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlushuang It looks like the code is approved for merge.

I have a few other questions related to perf

  • Do you have perf comparison of resnext101/50 configs between bwd v4r1 dynamic and bwd v4r1
    static?
  • What is current applicability criteria for bwd v4r1 dynamic vs v4r1 static? It looks like you are testing your kernel in CI with help of DYNAMIC_IMPLICITGEMM_COMMON and MIOPEN_DEBUG_FIND_ONLY_SOLVER
    When do we plan to either not run static kernel and how do we plan to make that choice?

@carlushuang
Copy link
Contributor Author

@carlushuang It looks like the code is approved for merge.

I have a few other questions related to perf

  • Do you have perf comparison of resnext101/50 configs between bwd v4r1 dynamic and bwd v4r1
    static?
  • What is current applicability criteria for bwd v4r1 dynamic vs v4r1 static? It looks like you are testing your kernel in CI with help of DYNAMIC_IMPLICITGEMM_COMMON and MIOPEN_DEBUG_FIND_ONLY_SOLVER
    When do we plan to either not run static kernel and how do we plan to make that choice?

Hi @TejashShah . Currently for gfx906 fp32 the dynamic kernel is within 10% performance drop compared with static kernel. currently the support range of dynamic kernel is not as good as static kernel, I'm trying to balance my work with 1) extend support range 2) performance, and 3) gfx908. Currently I'm not testing the config from resnext101/50, if you wish I can do it offline. And for the criteria to use only dynamic kernel, at current stage I think it's based on 2 factors 1) perf drop within 10%. 2) coverage, at least cover high priority model. I'm trying to make my best to make both happen for the up-comming gfx908 PR, but if not, at least make sure what I can cover can match the 1) perf drop within 10%.

@daniellowell
Copy link
Contributor

@zjing14 Can you review this PR please.

@TejashShah
Copy link
Contributor

TejashShah commented Jul 22, 2020

@carlushuang It looks like the code is approved for merge.
I have a few other questions related to perf

  • Do you have perf comparison of resnext101/50 configs between bwd v4r1 dynamic and bwd v4r1
    static?
  • What is current applicability criteria for bwd v4r1 dynamic vs v4r1 static? It looks like you are testing your kernel in CI with help of DYNAMIC_IMPLICITGEMM_COMMON and MIOPEN_DEBUG_FIND_ONLY_SOLVER
    When do we plan to either not run static kernel and how do we plan to make that choice?

Hi @TejashShah . Currently for gfx906 fp32 the dynamic kernel is within 10% performance drop compared with static kernel. currently the support range of dynamic kernel is not as good as static kernel, I'm trying to balance my work with 1) extend support range 2) performance, and 3) gfx908. Currently I'm not testing the config from resnext101/50, if you wish I can do it offline. And for the criteria to use only dynamic kernel, at current stage I think it's based on 2 factors 1) perf drop within 10%. 2) coverage, at least cover high priority model. I'm trying to make my best to make both happen for the up-comming gfx908 PR, but if not, at least make sure what I can cover can match the 1) perf drop within 10%.

I concur with you on the approach. For initial iteration reduction project, we focus on vega20 first, then gfx908. That's good this PR is targeted for vega20.

If I train any resnext/inception model end-to-end with this PR, would I see the compilation reduction, by default set of environment variables or do I need to set any specific env variables.?

TejashShah
TejashShah previously approved these changes Jul 22, 2020
@carlushuang
Copy link
Contributor Author

If I train any resnext/inception model end-to-end with this PR, would I see the compilation reduction, by default set of environment variables or do I need to set any specific env variables.?

There is a PR:#326 that enable hybrid mode of igemm. I believe E2E run with dynamic kernel can be achieved in that mode. Because currently, the noticeable 10% perf drop of dynamic kernel may lead to it unpicked if run normally.

@daniellowell daniellowell merged commit dce9c70 into develop Jul 27, 2020
@daniellowell daniellowell deleted the igemm_dynamic_v4r1_bwd branch November 16, 2020 02:33
namespace miopen {

template <typename T>
T gcd(T x, T y)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚓ possibility of stack overflow, copy-pasta

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants