-
Notifications
You must be signed in to change notification settings - Fork 237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Enhancement] xdlops NCHW support by transpose #1247
Conversation
…16_ALT_IMPL to control MIOPEN_DEBUG_FP16_ALT_IMP attribute
… based on attribute MIOPEN_CONVOLUTION_ATTRIB_FP16_ALT_IMPL value
…e attribute. Some error handling. Comments.
…nvAsmImplicitGemmGTCDynamicWrwXdlopsNHWC: Update Solver and Invokers.
…ng. MIOPEN_DEBUG_FP16_ALT_IMP -> MIOPEN_DEBUG_CONVOLUTION_ATTRIB_FP16_ALT_IMPL.
…attribute accessors.
…tribute from all except ConvolutionAttribute::Set/Get().
…LT attribute via InvokeParams.
…Cm 4.5). Sort list of disabled warnings.
…he sake of clarity of the source code.
…Update Solver and Invoker.
…Update Solver and Invokers
…ecks" This reverts commit 5bb733f.
Nice typo)) skip |
|
||
if(isGfx90aFp16altSupport) | ||
{ | ||
result.construction_params.push_back(kernel); | ||
std::ostringstream opts_1(options.str(), std::ios_base::ate); | ||
GenerateClangDefsym(opts_1, "igemm_bwd_fp16_alt_impl", 1); | ||
result.construction_params[1].comp_options = opts_1.str(); | ||
msg << ", fp16_alt:" << ctx.conv_problem.GetConv().attribute.gfx90aFp16alt.GetBwd(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The computations related to msg
should happen only if msg
is going to be printed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
} | ||
|
||
MIOPEN_LOG_I2("ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " + config.ToString()); | ||
MIOPEN_LOG_I2("ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " + config.ToString() + msg.str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use ostringstream syntax in MIOPEN_LOG*
macros.
MIOPEN_LOG_I2("ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " + config.ToString() + msg.str()); | |
MIOPEN_LOG_I2("ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " << config.ToString() << msg.str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logging output begins with [GetSolution]
, which creates an impression that logging happens during GetSolution()
call. This is not so, however. The logging happens during invocation of kernels. Let's clearly indicate this:
MIOPEN_LOG_I2("ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " + config.ToString() + msg.str()); | |
MIOPEN_LOG_I2("[INVOKER] ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " + config.ToString() + msg.str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please always use SolverDbId():
MIOPEN_LOG_I2("ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " + config.ToString() + msg.str()); | |
MIOPEN_LOG_I2(SolverDbId(*this) << ": " ...); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logging output begins with
[GetSolution]
, which creates an impression that logging happens duringGetSolution()
call. This is not so, however. The logging happens during invocation of kernels. Let's clearly indicate this:
This IS in GetSolution right? not actually invoker the kernels
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually invoker.
Invoker is a lambda produced (instantiated) by another lambda (InvokerFactory). Both can be defined directly in GetSolution (like in this case) or somewhere else (only to avoid duplication of code).
InvokerFactory instance resides in the Solution object. The library runs it only when necessary, i.e. sometime later, after GetSolution(). The resulting Invoker object is instantiated and stored in the Invoker Cache.
Sometime more later, when the library wants to start the kernels, it will run Invoker.
} | ||
|
||
MIOPEN_LOG_I2("ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " + config.ToString()); | ||
MIOPEN_LOG_I2("ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " + config.ToString() + msg.str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logging output begins with [GetSolution]
, which creates an impression that logging happens during GetSolution()
call. This is not so, however. The logging happens during invocation of kernels. Let's clearly indicate this:
MIOPEN_LOG_I2("ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " + config.ToString() + msg.str()); | |
MIOPEN_LOG_I2("[INVOKER] ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " + config.ToString() + msg.str()); |
} | ||
|
||
MIOPEN_LOG_I2("ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " + config.ToString()); | ||
MIOPEN_LOG_I2("ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " + config.ToString() + msg.str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please always use SolverDbId():
MIOPEN_LOG_I2("ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " + config.ToString() + msg.str()); | |
MIOPEN_LOG_I2(SolverDbId(*this) << ": " ...); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review
src/hip/batched_transpose_sol.cpp
Outdated
} | ||
|
||
template <typename T> | ||
static inline float get_normalized_radio(T x, T y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Q] what is radio?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is a radio of for example, width/height, or height/width. I use this radio in side kernel selection logic for some special case (although not optimal, which is hard to achieve)
#ifndef MIOPEN_UTIL_SOL_HPP_ | ||
#define MIOPEN_UTIL_SOL_HPP_ | ||
|
||
#include <miopen/batched_transpose_sol.hpp> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the "sol" abbreviation stands for? It would be better if all Solutions in MIOpen comply the Solver/Solution/Invoker arch (#866).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sol is for solution. And I use this name because this is different from that in util.hpp
, where there exist some method that use a single function call to launch kernel. At lease this transpose implementation, is not a single function call, but return a kernel object and let the caller to launch
src/include/miopen/util_sol.hpp
Outdated
|
||
namespace miopen { | ||
|
||
struct TransposeSolution_NCHW2NHWC : public BatchedTransposeSolution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we need these wrappers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CamelCase
struct TransposeSolution_NCHW2NHWC : public BatchedTransposeSolution | |
struct TransposeSolutionNchw2Nhwc : public BatchedTransposeSolution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because NCHW->NHWC, and NHWC->NCHW is exactly the same problem of transpose, only by switching the height/width.
By generalizing this problem in to a batched-transpose problem, we can simplify a lot of (maybe) duplicated code, and this transpose can further be extended.
This wrappers are just to make the transpose easier
@@ -30,6 +30,7 @@ | |||
#include <miopen/gcn_asm_utils.hpp> | |||
#include <miopen/solver/implicitgemm_util.hpp> | |||
#include <miopen/conv/asm_implicit_gemm.hpp> | |||
#include <miopen/util_sol.hpp> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This header name looks strange here. If it provides interface to transpose functionality, then it should be expressed in the header name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I don't have good idea about the naming. I just want to distinguish that in util.hpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"ask sol" 🤣
Changes made as requested, requesting re-review :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI has passed, review suggestions are applied.
@junliume That was partial review. |
Let's use post merge issues :) Sorry for pressing hard on these blocker PRs. |
src/kernels/gpu_batched_transpose_kernel/batched_transpose.cpp
test/gpu_nchw_nhwc_transpose.cpp
@atamazov @junliume this PR, especially the host part may need careful check. Maybe my implementation is not good and need big code refactoring, so please help review this.
tested on gfx908/gfx90a, in above link