[Enhancement] xdlops NCHW support by transpose #1247

Merged
merged 47 commits into from
Nov 4, 2021
Commits
4136bbb
implement set/get attribute API, and add MIOPEN_CONVOLUTION_ATTRIB_FP…
carlushuang Oct 16, 2021
e1d563e
Merge remote-tracking branch 'origin/develop' into gfx90a_fp16_alt_impl
carlushuang Oct 16, 2021
0210e86
get attribute in asm igemm nhwc solver, and conditionally set symbol…
carlushuang Oct 17, 2021
533f954
Merge branch 'develop' into gfx90a_fp16_alt_impl
atamazov Oct 22, 2021
0ba350c
gfx90a_fp16_alt_impl(01) Add constness to the API. Allow resetting th…
atamazov Oct 22, 2021
1020032
gfx90a_fp16_alt_impl(02) WrW: Pass ALT attribute via InvokeParams. Co…
atamazov Oct 22, 2021
35c81c0
gfx90a_fp16_alt_impl(03) Accelerate access to attribute. Error handli…
atamazov Oct 23, 2021
65ee1f6
gfx90a_fp16_alt_impl(04) [Quality] Avoid duplication of code. Inline …
atamazov Oct 24, 2021
c743d63
gfx90a_fp16_alt_impl(05) [fin] Fix build error
atamazov Oct 24, 2021
1846b1a
gfx90a_fp16_alt_impl(06) Fix typo. [Quality] Hide direct access to at…
atamazov Oct 24, 2021
a50e3b1
gfx90a_fp16_alt_impl(07) [fin] Fix build error
atamazov Oct 24, 2021
5bb733f
gfx90a_fp16_alt_impl(08) [TEMP][CI] Disable all but static checks
atamazov Oct 24, 2021
7649295
gfx90a_fp16_alt_impl(09) Remove useless initializers. Fwd/Bwd: Pass A…
atamazov Oct 24, 2021
4043347
gfx90a_fp16_alt_impl(10) [clang-tidy] Disable altera-unroll-loops (RO…
atamazov Oct 24, 2021
65aa044
gfx90a_fp16_alt_impl(11) [clang-tidy] Fix some warnings for ROCm 4.5.
atamazov Oct 24, 2021
a32035a
gfx90a_fp16_alt_impl(12) Disable tidy checks at couple of lines for t…
atamazov Oct 24, 2021
3f57ccf
gfx90a_fp16_alt_impl(13) Less clarity, but no more cppcheck or tidy i…
atamazov Oct 24, 2021
7abd5aa
gfx90a_fp16_alt_impl(14) ConvAsmImplicitGemmGTCDynamicWrwXdlopsNHWC: …
atamazov Oct 25, 2021
f57f66e
gfx90a_fp16_alt_impl(15) ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC: …
atamazov Oct 25, 2021
ba44525
gfx90a_fp16_alt_impl(16) ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: …
atamazov Oct 25, 2021
7d4bbf4
Revert "gfx90a_fp16_alt_impl(08) [TEMP][CI] Disable all but static ch…
atamazov Oct 25, 2021
a529457
fix ostringstream constructor with string problem, by adding eta to 2…
carlushuang Oct 26, 2021
a376136
[ci-skip][Quality] openmode is a member of ios_base, not ostringstream.
atamazov Oct 26, 2021
494fdc4
add batched transpose gpu kernel aim to serve nchw<->nhwc convert
carlushuang Oct 28, 2021
fd8c492
Merge remote-tracking branch 'origin/develop' into nchw_xdlops_suppor…
carlushuang Oct 28, 2021
85f910c
[ci-skip] dos2unix
carlushuang Oct 28, 2021
dfca312
[ci-slip] fix error in ocl backend
carlushuang Oct 28, 2021
accb684
add NCHW support in asm NHWC solver
carlushuang Oct 29, 2021
1dd5e52
[ci-skip] refactor invoker for transpose kernel launch
carlushuang Oct 31, 2021
8b0d91b
Merge remote-tracking branch 'origin/develop' into nchw_xdlops_suppor…
carlushuang Oct 31, 2021
df58a1b
[ci-skip] dump some message in GetSolution
carlushuang Oct 31, 2021
9b6f615
[ci-skip] fix wrw transpose karg missing
carlushuang Nov 1, 2021
f0004f8
[ci-skip] remove unused desc
carlushuang Nov 1, 2021
50317f1
[ci-skip] fix a bug in bwd workspace size calculation
carlushuang Nov 1, 2021
4bd7f8e
Merge remote-tracking branch 'origin/develop' into nchw_xdlops_suppor…
carlushuang Nov 2, 2021
197d5f7
[ci-skip] 1. fix cassert 2. not using const_cast in CreateSubBuffer 3…
carlushuang Nov 2, 2021
075ae07
[ci-skip] not reference to a member of struct, and then captured in i…
carlushuang Nov 2, 2021
36954ce
[ci-skip] optimize transpose kernel selection, for large h/w(or w/h) …
carlushuang Nov 2, 2021
4c3abc7
[ci-skip] optimize transpose kernel launch parameters
carlushuang Nov 3, 2021
d3d234e
Merge remote-tracking branch 'origin/develop' into nchw_xdlops_suppor…
carlushuang Nov 3, 2021
3fdffa8
fix msg print
carlushuang Nov 3, 2021
0a9e9f4
fix hip-tidy about pointer reinterpret_cast
carlushuang Nov 3, 2021
5c76906
fix several review comments
carlushuang Nov 3, 2021
fa5be89
fix tidy
carlushuang Nov 4, 2021
1edf441
suppress missing invalidPointerCast
carlushuang Nov 4, 2021
6cf5f20
optimize transpose for conv 1st layer C=3 case
carlushuang Nov 4, 2021
2c348f2
suppress clang warning while compiling kernel
carlushuang Nov 4, 2021
6 changes: 6 additions & 0 deletions src/CMakeLists.txt
@@ -186,6 +186,9 @@ set( MIOpen_Source
 include/miopen/reduce_common.hpp
 include/miopen/sequences.hpp
 include/miopen/rocm_features.hpp
+include/miopen/batched_transpose_sol.hpp
+include/miopen/magic_div.hpp
+include/miopen/util_sol.hpp
 md_graph.cpp
 mdg_expr.cpp
 conv/invokers/gcn_asm_1x1u.cpp
@@ -279,6 +282,7 @@ if( MIOPEN_BACKEND MATCHES "OpenCL" OR MIOPEN_BACKEND STREQUAL "HIPOC" OR MIOPEN
 file(GLOB_RECURSE COMPOSABLE_KERNEL_DYNAMIC_CPP_SOURCE "kernels/dynamic_igemm/*.cpp")
 file(GLOB_RECURSE GPU_REFERENCE_KERNEL_HIP "kernels/gpu_reference_kernel/*.cpp")
 file(GLOB_RECURSE GPU_REFERENCE_KERNEL_ASM "kernels/gpu_reference_kernel/*.s")
+file(GLOB_RECURSE GPU_BATCHED_TRANSPOSE_KERNEL_HIP "kernels/gpu_batched_transpose_kernel/*.cpp")

 set(MIOPEN_KERNEL_INCLUDES
 ${STATIC_COMPOSABLE_KERNEL_INCLUDE}
@@ -379,6 +383,7 @@ if( MIOPEN_BACKEND MATCHES "OpenCL" OR MIOPEN_BACKEND STREQUAL "HIPOC" OR MIOPEN
 ${COMPOSABLE_KERNEL_DYNAMIC_CPP_SOURCE}
 ${GPU_REFERENCE_KERNEL_HIP}
 ${GPU_REFERENCE_KERNEL_ASM}
+${GPU_BATCHED_TRANSPOSE_KERNEL_HIP}
 kernels/detect_llvm_amdgcn_buffer_atomic_fadd_f32_float.cpp
 kernels/MIOpenCheckNumerics.cl
 kernels/MIOpenBatchNormActivBwdPerAct.cl
@@ -503,6 +508,7 @@ if( MIOPEN_BACKEND MATCHES "OpenCL" OR MIOPEN_BACKEND STREQUAL "HIPOC" OR MIOPEN
 ocl/gcn_asm_utils.cpp
 ocl/rnn_util_ocl.cpp
 hip/hip_build_utils.cpp
+hip/batched_transpose_sol.cpp
 pooling.cpp
 ocl/fusionopconvocl.cpp
 ocl/fusionopbiasbnactivocl.cpp
367 changes: 300 additions & 67 deletions src/conv/invokers/impl_gemm_dynamic.cpp

Large diffs are not rendered by default.
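
The PR title and commit 494fdc4 ("add batched transpose gpu kernel aim to serve nchw<->nhwc convert") describe the approach: tensors are converted between NCHW and NHWC so that the existing asm NHWC solver can serve NCHW inputs. The following is only a CPU sketch of what such a layout conversion computes, under the assumption that each image in the batch is treated as a C x (H*W) matrix and transposed. It is not the GPU kernel from this PR, and the function names `nchw_to_nhwc` and `nhwc_to_nchw` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, not MIOpen code.
// For every image in the batch, view the data as a C x (H*W) matrix
// and write its transpose, an (H*W) x C matrix.
template <typename T>
std::vector<T> nchw_to_nhwc(const std::vector<T>& src,
                            std::size_t n, std::size_t c,
                            std::size_t h, std::size_t w)
{
    const std::size_t hw = h * w;
    std::vector<T> dst(src.size());
    for(std::size_t in = 0; in < n; ++in)
        for(std::size_t ic = 0; ic < c; ++ic)
            for(std::size_t ihw = 0; ihw < hw; ++ihw)
                dst[(in * hw + ihw) * c + ic] = src[(in * c + ic) * hw + ihw];
    return dst;
}

// Inverse conversion: view each image as an (H*W) x C matrix
// and write its transpose, a C x (H*W) matrix.
template <typename T>
std::vector<T> nhwc_to_nchw(const std::vector<T>& src,
                            std::size_t n, std::size_t c,
                            std::size_t h, std::size_t w)
{
    const std::size_t hw = h * w;
    std::vector<T> dst(src.size());
    for(std::size_t in = 0; in < n; ++in)
        for(std::size_t ihw = 0; ihw < hw; ++ihw)
            for(std::size_t ic = 0; ic < c; ++ic)
                dst[(in * c + ic) * hw + ihw] = src[(in * hw + ihw) * c + ic];
    return dst;
}
```

Because both directions are the same batched matrix transpose with the row and column counts swapped, a single transpose routine parameterized on (batch, rows, cols) can plausibly cover both conversions. A reference like this is mainly useful for checking a GPU implementation on small shapes.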