Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[igemm_dynamic] v4r1 bwd dynamic kernel #272

Merged
merged 54 commits into from
Jul 27, 2020
Merged
Show file tree
Hide file tree
Changes from 49 commits
Commits
Show all changes
54 commits
Select commit Hold shift + click to select a range
0d91a23
[dynamic-kernel] add v4r1 generic dynamic kernel and solver, fwd fp32
carlushuang Apr 19, 2020
dcf5378
fix tidy
carlushuang Apr 20, 2020
2a93e4a
update tunable table
carlushuang Apr 20, 2020
6a7c236
fix tidy for -abseil-string-find-startswith
carlushuang Apr 20, 2020
34d52a7
fix tidy for readability-simplify-boolean-expr
carlushuang Apr 20, 2020
2964ab0
add code of v4r1 dynamic fwd kc1x1 case
Apr 22, 2020
dba8e15
runnable code for v4r1 igemm 1x1 asm kernel case
Apr 22, 2020
90d78c2
modify igemm dynamic kernel call func: if kc1x1 kernel, remove the xy…
shaojiewang Apr 22, 2020
c162f5c
change format
shaojiewang Apr 22, 2020
ff923fc
fix clang-tidy warning:redundant boolean literal in implicitgemm_dyna…
shaojiewang Apr 23, 2020
0137fa9
format solver code file
shaojiewang Apr 23, 2020
ad24941
Merge remote-tracking branch 'origin/develop' into igemm_dynamic
carlushuang Apr 28, 2020
674b24b
Merge remote-tracking branch 'origin/igemm_dynamic' into igemm_dynamic
carlushuang Apr 28, 2020
c78832a
Merge remote-tracking branch 'origin/develop' into igemm_dynamic
carlushuang Apr 29, 2020
22f0d68
add test_conv_for_dynamic_implicit_gemm to test dynamic kernel feature
carlushuang Apr 29, 2020
802db9f
Merge branch 'develop' into igemm_dynamic
May 1, 2020
0fb4ce5
Merge remote-tracking branch 'origin/develop' into igemm_dynamic
carlushuang May 6, 2020
99a076e
refactor due to invoker
carlushuang May 6, 2020
a74b68a
fix tidy/cppcheck
carlushuang May 6, 2020
5bb3cca
register invoker for igemm_dynamic solver
carlushuang May 6, 2020
9d47170
tidy print
carlushuang May 6, 2020
91d7f0e
Merge remote-tracking branch 'origin/develop' into igemm_dynamic
carlushuang May 7, 2020
2b6b73b
fix hip-clang bug to run assembly kernel
carlushuang May 9, 2020
a5fac3c
Merge remote-tracking branch 'origin/develop' into igemm_dynamic
carlushuang May 11, 2020
4201711
fix invoker and misc for review
carlushuang May 11, 2020
ba2394e
Merge remote-tracking branch 'origin/develop' into igemm_dynamic
carlushuang May 15, 2020
b05ee9a
put asm file in folder kernels/dynamic_igemm
carlushuang May 15, 2020
a8857a8
Merge remote-tracking branch 'origin/develop' into igemm_dynamic
carlushuang May 21, 2020
521faa1
Merge remote-tracking branch 'origin/develop' into igemm_dynamic
carlushuang May 24, 2020
3191d5c
Merge remote-tracking branch 'origin/develop' into igemm_dynamic
carlushuang Jun 3, 2020
4c49421
fix build error
carlushuang Jun 3, 2020
3289f0f
remove useless comment
carlushuang Jun 8, 2020
ee5ab4c
Merge remote-tracking branch 'origin/develop' into igemm_dynamic
carlushuang Jun 8, 2020
052a6bf
remove unused variable
carlushuang Jun 8, 2020
c94e521
add v4r1 bwd [skip ci]
carlushuang Jun 9, 2020
397f2ac
Merge remote-tracking branch 'origin/develop' into igemm_dynamic_v4r1…
carlushuang Jul 15, 2020
175f939
update kernel code
carlushuang Jul 15, 2020
9525acc
update kernel
carlushuang Jul 16, 2020
27e5e74
Merge remote-tracking branch 'origin/develop' into igemm_dynamic_v4r1…
carlushuang Jul 16, 2020
62290bb
clang-format
carlushuang Jul 16, 2020
85e31b2
remove useless code
carlushuang Jul 16, 2020
e46a669
fix tidy
carlushuang Jul 17, 2020
b8d3e4f
remove useless code and tidy, add more test
carlushuang Jul 17, 2020
2054f29
Merge remote-tracking branch 'origin/develop' into igemm_dynamic_v4r1…
carlushuang Jul 17, 2020
9b0939b
fix per review
carlushuang Jul 20, 2020
5aa5980
add missing header include
carlushuang Jul 20, 2020
a98ae54
add missing header
carlushuang Jul 20, 2020
92f8588
add kernel name in throw
carlushuang Jul 21, 2020
32b11f7
split invoker into seperate conv direction
carlushuang Jul 21, 2020
5368a56
use conv_problem as invoker param, instead of conv ctx
carlushuang Jul 21, 2020
5096c06
remove kernel name check in invoker
carlushuang Jul 21, 2020
fa44e70
fix a bug when re-factoring fwd invoker
carlushuang Jul 22, 2020
0c02219
Merge remote-tracking branch 'origin/develop' into igemm_dynamic_v4r1…
carlushuang Jul 22, 2020
bcc1bfe
Merge remote-tracking branch 'origin/develop' into igemm_dynamic_v4r1…
carlushuang Jul 26, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions src/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -173,6 +173,7 @@ set( MIOpen_Source
include/miopen/rnn_util.hpp
include/miopen/bz2.hpp
include/miopen/comgr.hpp
include/miopen/numeric.hpp
md_graph.cpp
mdg_expr.cpp
conv/invokers/gcn_asm_1x1u.cpp
Expand Down Expand Up @@ -223,6 +224,7 @@ set( MIOpen_Source
solver/conv_asm_implicit_gemm_v4r1_dynamic.cpp
solver/conv_hip_implicit_gemm_bwd_data_v4r1_xdlops.cpp
solver/conv_hip_implicit_gemm_v4r4_gen_xdlops_fwd_fp32.cpp
solver/conv_asm_implicit_gemm_bwd_v4r1_dynamic.cpp
)

list(APPEND MIOpen_Source tmp_dir.cpp binary_cache.cpp md5.cpp)
Expand Down
233 changes: 194 additions & 39 deletions src/conv/invokers/impl_gemm_dynamic.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3,23 +3,24 @@
#include <miopen/algorithm.hpp>
#include <miopen/handle.hpp>
#include <miopen/tensor_ops.hpp>

#include <miopen/numeric.hpp>
#include <boost/any.hpp>

namespace miopen {
namespace conv {

float CallImplicitGemmDynamic(const miopen::Handle& handle,
const ConvolutionContext& ctx,
ConstData_t src,
Data_t dst,
ConstData_t wei,
const std::vector<KernelInvoke>& kernels)
float CallImplGemmDynamicForward(const miopen::Handle& handle,
const ConvolutionContext& ctx,
ConstData_t src,
Data_t dst,
ConstData_t wei,
const std::vector<KernelInvoke>& kernels)
{
float elapsed = 0.0f;

auto kernel = kernels[0];
MIOPEN_LOG_I(kernel.GetName());

bool kernel_is_1x1 = (kernel.GetName().find("igemm_v4r1_1x1_dynamic") == 0);
// clang-format off
int hi = ctx.in_height;
Expand Down Expand Up @@ -56,6 +57,7 @@ float CallImplicitGemmDynamic(const miopen::Handle& handle,
opArgs.emplace_back(dilation_w);
opArgs.emplace_back(pad_h);
opArgs.emplace_back(pad_w);
// clang-format off
if(kernel_is_1x1)
{
opArgs.emplace_back(__pack0);
Expand All @@ -73,45 +75,198 @@ float CallImplicitGemmDynamic(const miopen::Handle& handle,
return elapsed;
}

InvokerFactory MakeImplGemmDynamicDataInvokerFactory(const ConvolutionContext& ctx)

float CallImplGemmDynamicBackwardData(const miopen::Handle& handle,
const ConvolutionContext& ctx,
ConstData_t src,
Data_t dst,
ConstData_t wei,
const std::vector<KernelInvoke>& kernels)
{
if(ctx.direction.IsForward())
float elapsed = 0.0f;

auto kernel = kernels[0];
MIOPEN_LOG_I(kernel.GetName());

// clang-format off
int hi = ctx.out_height;
int wi = ctx.out_width;
int n = ctx.batch_sz;
int k = ctx.n_inputs;
int c = ctx.n_outputs;
int ho = ctx.in_height;
int wo = ctx.in_width;
int stride_h = ctx.in_height > 1 ? ctx.kernel_stride_h : 1;
int stride_w = ctx.in_width > 1 ? ctx.kernel_stride_w : 1;
int dilation_h = ctx.kernel_size_h > 1? ctx.kernel_dilation_h : 1;
int dilation_w = ctx.kernel_size_w > 1? ctx.kernel_dilation_w : 1;
int pad_h = ctx.pad_h;
int pad_w = ctx.pad_w;
int y = ctx.kernel_size_h;
int x = ctx.kernel_size_w;

int gcd_stride_dilation_h = gcd(stride_h, dilation_h);
int gcd_stride_dilation_w = gcd(stride_w, dilation_w);
int y_tilda = stride_h / gcd_stride_dilation_h;
int x_tilda = stride_w / gcd_stride_dilation_w;

int y_dot = (y + y_tilda - 1) / y_tilda;
int x_dot = (x + x_tilda - 1) / x_tilda;

int h_tilda = ho + (dilation_h * (y - 1) + stride_h - 1) / stride_h;
int w_tilda = wo + (dilation_w * (x - 1) + stride_w - 1) / stride_w;

int h_tilda_left = std::max(0, pad_h - dilation_h * (y_tilda - 1)) / stride_h;
int w_tilda_left = std::max(0, pad_w - dilation_w * (x_tilda - 1)) / stride_w;

int h_tilda_right = std::min(h_tilda, (pad_h + hi - 1 + stride_h - 1) / stride_h + 1);
int w_tilda_right = std::min(w_tilda, (pad_w + wi - 1 + stride_w - 1) / stride_w + 1);

int h_tilda_slice = h_tilda_right - h_tilda_left;
int w_tilda_slice = w_tilda_right - w_tilda_left;

int num_of_gemms = x_tilda * y_tilda;

int dtile_iy = 0;
int dtile_ix = 0;
int dtile_dy = dilation_h / gcd_stride_dilation_h;
int dtile_dx = dilation_w / gcd_stride_dilation_w;
int dtile_y = y_tilda;
int dtile_x = x_tilda;
int dtile_h = h_tilda;
int dtile_w = w_tilda;
int dslice_y = 0;
int dslice_x = 0;
int dslice_h = h_tilda_slice;
int dslice_w = w_tilda_slice;
int dslice_h_left = h_tilda_left;
int dslice_w_left = w_tilda_left;
int __pack0 = 0;
// clang-format on

std::vector<OpKernelArg> opArgs;
opArgs.emplace_back(dst);
opArgs.emplace_back(wei);
opArgs.emplace_back(src);
opArgs.emplace_back(hi);
opArgs.emplace_back(wi);
opArgs.emplace_back(n);
opArgs.emplace_back(k);
opArgs.emplace_back(c);
opArgs.emplace_back(ho);
opArgs.emplace_back(wo);
opArgs.emplace_back(stride_h);
opArgs.emplace_back(stride_w);
opArgs.emplace_back(dilation_h);
opArgs.emplace_back(dilation_w);
opArgs.emplace_back(pad_h);
opArgs.emplace_back(pad_w);
opArgs.emplace_back(y);
opArgs.emplace_back(x);
opArgs.emplace_back(dtile_iy);
opArgs.emplace_back(dtile_ix);
opArgs.emplace_back(dtile_dy);
opArgs.emplace_back(dtile_dx);
opArgs.emplace_back(dtile_y);
opArgs.emplace_back(dtile_x);
opArgs.emplace_back(dtile_h);
opArgs.emplace_back(dtile_w);
opArgs.emplace_back(dslice_y);
opArgs.emplace_back(dslice_x);
opArgs.emplace_back(dslice_h);
opArgs.emplace_back(dslice_w);
opArgs.emplace_back(dslice_h_left);
opArgs.emplace_back(dslice_w_left);
opArgs.emplace_back(__pack0);

for(int gemm_id = 0; gemm_id < num_of_gemms; gemm_id++)
{
return [ctx](const std::vector<Kernel>& kernels) {
return [=](const Handle& handle, const boost::any& primitive_parameters) {
const auto data_ctx = boost::any_cast<conv::DataInvokeParams>(primitive_parameters);
const auto& tensors = data_ctx.tensors;
auto kernel = handle.Run(kernels[0]);
if(kernel.GetName().find("igemm_v4r1_dynamic") == 0 ||
kernel.GetName().find("igemm_v4r1_1x1_dynamic") == 0)
int _dtile_iy = gemm_id / x_tilda;
int _dtile_ix = gemm_id % x_tilda;
int _y_dot_slice = (_dtile_iy + 1) * y_dot <= y ? y_dot : y % y_dot;
int _x_dot_slice = (_dtile_ix + 1) * x_dot <= x ? x_dot : x % x_dot;
int _gemm_k = k * _y_dot_slice * _x_dot_slice;
bool is_gemm_not_empty = _gemm_k > 0;
opArgs[18] = OpKernelArg(_dtile_iy);
opArgs[19] = OpKernelArg(_dtile_ix);
opArgs[26] = OpKernelArg(_y_dot_slice);
opArgs[27] = OpKernelArg(_x_dot_slice);
if(is_gemm_not_empty)
kernel(opArgs);
}

if(handle.IsProfilingEnabled())
elapsed += handle.GetKernelTime();
return elapsed;
}

InvokerFactory MakeImplGemmDynamicForwardInvokerFactory(const ConvolutionContext& ctx)
{
return [ctx](const std::vector<Kernel>& kernels) {
return [=](const Handle& handle, const boost::any& primitive_parameters) {
const auto data_ctx = boost::any_cast<conv::DataInvokeParams>(primitive_parameters);
const auto& tensors = data_ctx.tensors;
auto kernel = handle.Run(kernels[0]);
if(kernel.GetName().find("igemm_v4r1_dynamic") == 0 ||
carlushuang marked this conversation as resolved.
Show resolved Hide resolved
kernel.GetName().find("igemm_v4r1_1x1_dynamic") == 0)
{
std::vector<KernelInvoke> ks;
std::transform(kernels.begin(),
kernels.end(),
std::back_inserter(ks),
[&](const Kernel& k) { return handle.Run(k); });
float elapsed = 0;
elapsed =
CallImplGemmDynamicForward(handle, ctx, tensors.in, tensors.out, tensors.w, ks);
if(handle.IsProfilingEnabled())
{
std::vector<KernelInvoke> ks;
std::transform(kernels.begin(),
kernels.end(),
std::back_inserter(ks),
[&](const Kernel& k) { return handle.Run(k); });
float elapsed = 0;
elapsed = CallImplicitGemmDynamic(
handle, ctx, tensors.in, tensors.out, tensors.w, ks);
if(handle.IsProfilingEnabled())
{
handle.ResetKernelTime();
handle.AccumKernelTime(elapsed);
}
handle.ResetKernelTime();
handle.AccumKernelTime(elapsed);
}
else
}
else
{
MIOPEN_THROW(
"Error running dynamic implicit GEMM convolution (invalid kernel name " +
kernel.GetName() + ")");
}
};
};
}

InvokerFactory MakeImplGemmDynamicBackwardDataInvokerFactory(const ConvolutionContext& ctx)
{
return [ctx](const std::vector<Kernel>& kernels) {
return [=](const Handle& handle, const boost::any& primitive_parameters) {
const auto data_ctx = boost::any_cast<conv::DataInvokeParams>(primitive_parameters);
const auto& tensors = data_ctx.tensors;
auto kernel = handle.Run(kernels[0]);
if(kernel.GetName().find("igemm_bwd_gtc") == 0)
{
std::vector<KernelInvoke> ks;
std::transform(kernels.begin(),
kernels.end(),
std::back_inserter(ks),
[&](const Kernel& k) { return handle.Run(k); });
float elapsed = 0;

elapsed = CallImplGemmDynamicBackwardData(
handle, ctx, tensors.in, tensors.out, tensors.w, ks);

if(handle.IsProfilingEnabled())
{
MIOPEN_THROW(
"Error running dynamic implicit GEMM convolution (invalid kernel name?)");
handle.ResetKernelTime();
handle.AccumKernelTime(elapsed);
}
};
}
else
{
MIOPEN_THROW(
"Error running dynamic implicit GEMM convolution (invalid kernel name " +
kernel.GetName() + ")");
}
};
}
else
{
MIOPEN_THROW(
"Error running dynamic implicit GEMM convolution (currently only support forward)");
}
};
}

} // namespace conv
Expand Down
27 changes: 16 additions & 11 deletions src/include/miopen/conv/invokers/impl_gemm_dynamic.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -35,17 +35,22 @@
namespace miopen {
namespace conv {

// Beside used in invoker, currently this function is only called in RunAndMeasure() of dynamic
// igemm solver
// Remove this in the future when invoker is fully re-factored.
float CallImplicitGemmDynamic(const miopen::Handle& handle,
const ConvolutionContext& ctx,
ConstData_t src,
Data_t dst,
ConstData_t wei,
const std::vector<KernelInvoke>& kernels);

InvokerFactory MakeImplGemmDynamicDataInvokerFactory(const ConvolutionContext& ctx);
float CallImplGemmDynamicForward(const miopen::Handle& handle,
const ConvolutionContext& ctx,
ConstData_t src,
Data_t dst,
ConstData_t wei,
const std::vector<KernelInvoke>& kernels);

float CallImplGemmDynamicBackwardData(const miopen::Handle& handle,
const ConvolutionContext& ctx,
ConstData_t src,
Data_t dst,
ConstData_t wei,
const std::vector<KernelInvoke>& kernels);

InvokerFactory MakeImplGemmDynamicForwardInvokerFactory(const ConvolutionContext& ctx);
InvokerFactory MakeImplGemmDynamicBackwardDataInvokerFactory(const ConvolutionContext& ctx);

} // namespace conv
} // namespace miopen
64 changes: 64 additions & 0 deletions src/include/miopen/numeric.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
/*******************************************************************************
*
* MIT License
*
* Copyright (c) 2020 Advanced Micro Devices, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in all
* copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
*******************************************************************************/
#ifndef GUARD_MLOPEN_NUMERIC_HPP
#define GUARD_MLOPEN_NUMERIC_HPP

#include <numeric>

namespace miopen {

template <typename T>
T gcd(T x, T y)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚓ possibility of stack overflow, copy-pasta

{
assert(!(x == 0 && y == 0));

if(x == y || x == 0)
{
return y;
}
else if(y == 0)
{
return x;
}
else if(x > y)
{
return gcd(x - y, y);
}
else
{
return gcd(x, y - x);
}
}

template <typename T, typename... Ys>
T gcd(T x, Ys... ys)
{
return gcd(x, gcd(ys...));
}

} // namespace miopen

#endif
6 changes: 6 additions & 0 deletions src/include/miopen/solver.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -1361,6 +1361,12 @@ struct ConvAsmImplicitGemmV4R1DynamicFwd_1x1 : SolverBase<ConvolutionContext>
float& elapsed_time) const;
};

struct ConvAsmImplicitGemmV4R1DynamicBwd : SolverBase<ConvolutionContext>
{
bool IsApplicable(const ConvolutionContext&) const;
ConvSolution GetSolution(const ConvolutionContext&) const;
};

/// Holds common member functions for the Solvers which share the same
/// "legacy exhaustive search" machinery.
struct ConvOclDirectFwdLegacyExhaustiveSearch : SolverBase<ConvolutionContext>
Expand Down
Loading