[igemm_dynamic] v4r1 bwd dynamic kernel #272

carlushuang · 2020-06-09T01:46:52Z

v4r1 backward_data dynamic kernel.
This PR enables the dynamic kernel in the bwd direction, using the v4r1 backward_data strategy (gridwise_gemm). Although tensor contraction seems to be a MUST in dynamic kernels (because it reduces the index calculation), I will re-add the tensor-contraction version in a future PR.

src/conv/invokers/impl_gemm_dynamic.cpp

DrizztDoUrden

LGTM

TejashShah

@carlushuang It looks like the code is approved for merge.

I have a few other questions related to perf:

Do you have a perf comparison of the resnext101/50 configs between bwd v4r1 dynamic and bwd v4r1 static?
What are the current applicability criteria for bwd v4r1 dynamic vs. v4r1 static? It looks like you are testing your kernel in CI with the help of DYNAMIC_IMPLICITGEMM_COMMON and MIOPEN_DEBUG_FIND_ONLY_SOLVER.
When do we plan to stop running the static kernel, and how do we plan to make that choice?

carlushuang · 2020-07-22T02:48:24Z

Hi @TejashShah. Currently, for gfx906 fp32, the dynamic kernel is within a 10% performance drop compared with the static kernel. The support range of the dynamic kernel is not yet as good as that of the static kernel; I'm trying to balance my work between 1) extending the support range, 2) performance, and 3) gfx908. I'm not currently testing the configs from resnext101/50; if you wish, I can do that offline. As for the criteria for using only the dynamic kernel, at the current stage I think it depends on 2 factors: 1) perf drop within 10%, and 2) coverage, at least of the high-priority models. I'm doing my best to make both happen for the upcoming gfx908 PR, but if not, I will at least make sure that what I can cover meets 1), perf drop within 10%.
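The two criteria above could be written down roughly as follows. This is only an illustrative sketch, not code from this PR: the names `perf_drop` and `dynamic_only_ok` are hypothetical, and it assumes "perf drop" means the relative increase in kernel time of the dynamic kernel over the static one.

```cpp
namespace sketch {

// Hypothetical helper: relative slowdown of the dynamic kernel versus the
// static kernel, computed from kernel times (assumed to be in the same unit).
constexpr double perf_drop(double static_ms, double dynamic_ms)
{
    return (dynamic_ms - static_ms) / static_ms;
}

// Hypothetical check combining both factors from the comment above:
// 1) perf drop within max_drop (10% by default), and
// 2) coverage of the high-priority models.
constexpr bool dynamic_only_ok(double static_ms,
                               double dynamic_ms,
                               bool covers_priority_models,
                               double max_drop = 0.10)
{
    return covers_priority_models && perf_drop(static_ms, dynamic_ms) <= max_drop;
}

} // namespace sketch
```

For example, a dynamic kernel taking 1.05 ms against a 1.00 ms static kernel would pass the perf criterion, while one taking 1.25 ms would not.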

daniellowell · 2020-07-22T15:50:18Z

@zjing14 Can you review this PR please.

TejashShah · 2020-07-22T16:37:50Z

I concur with you on the approach. For the initial iteration reduction project, we focus on vega20 first, then gfx908. It's good that this PR targets vega20.

If I train any resnext/inception model end-to-end with this PR, would I see the compilation reduction with the default set of environment variables, or do I need to set any specific env variables?

carlushuang · 2020-07-23T00:43:27Z

There is a PR, #326, that enables a hybrid mode of igemm. I believe an E2E run with the dynamic kernel can be achieved in that mode. Currently, the noticeable 10% perf drop of the dynamic kernel may lead to it not being picked in a normal run.

atamazov · 2021-04-24T20:30:14Z

src/include/miopen/numeric.hpp

+namespace miopen {
+
+template <typename T>
+T gcd(T x, T y)


⚓ possibility of stack overflow, copy-pasta
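The hunk above does not show the body of `gcd`, so the following is only a sketch of one way to address the stack-overflow concern, assuming the flagged implementation is recursive. The name `gcd_iterative` and the `sketch` namespace are hypothetical and not part of MIOpen; the function uses the Euclidean algorithm in a loop, so its stack usage is constant. It assumes non-negative integral inputs.

```cpp
#include <type_traits>

namespace sketch {

// Iterative Euclidean algorithm: no recursion, so the call depth does not
// grow with the inputs. Inputs are assumed to be non-negative.
template <typename T>
constexpr T gcd_iterative(T x, T y)
{
    static_assert(std::is_integral<T>::value, "gcd_iterative expects an integral type");
    while(y != 0)
    {
        const T r = x % y; // remainder of the current pair
        x = y;
        y = r;
    }
    return x; // gcd(x, 0) == x
}

} // namespace sketch
```

On C++17 toolchains, `std::gcd` from `<numeric>` is another option that avoids maintaining a local copy.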

carlushuang and others added 30 commits April 19, 2020 15:10

[dynamic-kernel] add v4r1 generic dynamic kernel and solver, fwd fp32

0d91a23

fix tidy

dcf5378

update tunable table

2a93e4a

fix tidy for -abseil-string-find-startswith

6a7c236

fix tidy for readability-simplify-boolean-expr

34d52a7

add code of v4r1 dynamic fwd kc1x1 case

2964ab0

runnable code for v4r1 igemm 1x1 asm kernel case

dba8e15

modify igemm dynamic kernel call func: if kc1x1 kernel, remove the xy…

90d78c2

… kernel args

change format

c162f5c

fix clang-tidy warning:redundant boolean literal in implicitgemm_dyna…

ff923fc

…mic.cpp

format solver code file

0137fa9

Merge remote-tracking branch 'origin/develop' into igemm_dynamic

ad24941

Merge remote-tracking branch 'origin/igemm_dynamic' into igemm_dynamic

674b24b

Merge remote-tracking branch 'origin/develop' into igemm_dynamic

c78832a

add test_conv_for_dynamic_implicit_gemm to test dynamic kernel feature

22f0d68

Merge branch 'develop' into igemm_dynamic

802db9f

Merge remote-tracking branch 'origin/develop' into igemm_dynamic

0fb4ce5

refactor due to invoker

99a076e

fix tidy/cppcheck

a74b68a

register invoker for igemm_dynamic solver

5bb3cca

tidy print

9d47170

Merge remote-tracking branch 'origin/develop' into igemm_dynamic

91d7f0e

fix hip-clang bug to run assembly kernel

2b6b73b

Merge remote-tracking branch 'origin/develop' into igemm_dynamic

a5fac3c

fix invoker and misc for review

4201711

Merge remote-tracking branch 'origin/develop' into igemm_dynamic

ba2394e

put asm file in folder kernels/dynamic_igemm

b05ee9a

Merge remote-tracking branch 'origin/develop' into igemm_dynamic

a8857a8

Merge remote-tracking branch 'origin/develop' into igemm_dynamic

521faa1

Merge remote-tracking branch 'origin/develop' into igemm_dynamic

3191d5c

DrizztDoUrden requested changes Jul 21, 2020

src/conv/invokers/impl_gemm_dynamic.cpp (outdated, resolved)

carlushuang requested a review from DrizztDoUrden July 21, 2020 15:44

carlushuang added 2 commits July 21, 2020 15:47

use conv_problem as invoker param, instead of conv ctx

5368a56

remove kernel name check in invoker

5096c06

DrizztDoUrden previously approved these changes Jul 21, 2020

TejashShah reviewed Jul 21, 2020

carlushuang dismissed DrizztDoUrden’s stale review via fa44e70 July 22, 2020 02:27

fix a bug when re-factoring fwd invoker

fa44e70

Merge remote-tracking branch 'origin/develop' into igemm_dynamic_v4r1…

0c02219

…_bwd

shaojiewang mentioned this pull request Jul 22, 2020

dynamic igemm wrw #317

Merged

carlushuang mentioned this pull request Jul 22, 2020

Dynamic Implicit GEMM should not have Performance Configs or RunAndMeasure methods. #332

Closed

TejashShah previously approved these changes Jul 22, 2020

daniellowell requested review from atamazov, TejashShah and DrizztDoUrden July 22, 2020 21:24

daniellowell self-requested a review July 24, 2020 16:43

carlushuang dismissed TejashShah’s stale review via bcc1bfe July 26, 2020 13:13

Merge remote-tracking branch 'origin/develop' into igemm_dynamic_v4r1…

bcc1bfe

…_bwd

daniellowell mentioned this pull request Jul 27, 2020

Fast Hybrid mode of iGEMM #326

Merged

TejashShah approved these changes Jul 27, 2020

daniellowell merged commit dce9c70 into develop Jul 27, 2020

daniellowell deleted the igemm_dynamic_v4r1_bwd branch November 16, 2020 02:33

atamazov reviewed Apr 24, 2021

atamazov mentioned this pull request May 9, 2021

[iGemm][test_conv2d][gfx908][half] Verification failed #917

Closed
