Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] xdlops NCHW support by transpose #1247

Merged
merged 47 commits into from
Nov 4, 2021

Conversation

carlushuang
Copy link
Contributor

@carlushuang carlushuang commented Oct 28, 2021

  • transpose hip kernel src/kernels/gpu_batched_transpose_kernel/batched_transpose.cpp
  • internal API to call transpose
  • ctest for batched transpose test/gpu_nchw_nhwc_transpose.cpp
  • add NHWC solver support NCHW by using transpose
  • test ssd

@atamazov @junliume this PR, especially the host part may need careful check. Maybe my implementation is not good and need big code refactoring, so please help review this.

tested on gfx908/gfx90a, in above link

carlushuang and others added 27 commits October 16, 2021 09:25
…16_ALT_IMPL to control MIOPEN_DEBUG_FP16_ALT_IMP attribute
… based on attribute MIOPEN_CONVOLUTION_ATTRIB_FP16_ALT_IMPL value
…nvAsmImplicitGemmGTCDynamicWrwXdlopsNHWC: Update Solver and Invokers.
…ng. MIOPEN_DEBUG_FP16_ALT_IMP -> MIOPEN_DEBUG_CONVOLUTION_ATTRIB_FP16_ALT_IMPL.
…tribute from all except ConvolutionAttribute::Set/Get().
@atamazov
Copy link
Contributor

[ci-slip] fix error in ocl backend

Nice typo)) skip


if(isGfx90aFp16altSupport)
{
result.construction_params.push_back(kernel);
std::ostringstream opts_1(options.str(), std::ios_base::ate);
GenerateClangDefsym(opts_1, "igemm_bwd_fp16_alt_impl", 1);
result.construction_params[1].comp_options = opts_1.str();
msg << ", fp16_alt:" << ctx.conv_problem.GetConv().attribute.gfx90aFp16alt.GetBwd();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The computations related to msg should happen only if msg is going to be printed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

}

MIOPEN_LOG_I2("ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " + config.ToString());
MIOPEN_LOG_I2("ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " + config.ToString() + msg.str());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use ostringstream syntax in MIOPEN_LOG* macros.

Suggested change
MIOPEN_LOG_I2("ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " + config.ToString() + msg.str());
MIOPEN_LOG_I2("ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " << config.ToString() << msg.str());

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging output begins with [GetSolution], which creates an impression that logging happens during GetSolution() call. This is not so, however. The logging happens during invocation of kernels. Let's clearly indicate this:

Suggested change
MIOPEN_LOG_I2("ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " + config.ToString() + msg.str());
MIOPEN_LOG_I2("[INVOKER] ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " + config.ToString() + msg.str());

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please always use SolverDbId():

Suggested change
MIOPEN_LOG_I2("ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " + config.ToString() + msg.str());
MIOPEN_LOG_I2(SolverDbId(*this) << ": " ...);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging output begins with [GetSolution], which creates an impression that logging happens during GetSolution() call. This is not so, however. The logging happens during invocation of kernels. Let's clearly indicate this:

This IS in GetSolution right? not actually invoker the kernels

Copy link
Contributor

@atamazov atamazov Nov 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually invoker.

Invoker is a lambda produced (instantiated) by another lambda (InvokerFactory). Both can be defined directly in GetSolution (like in this case) or somewhere else (only to avoid duplication of code).

InvokerFactory instance resides in the Solution object. The library runs it only when necessary, i.e. sometime later, after GetSolution(). The resulting Invoker object is instantiated and stored in the Invoker Cache.

Sometime more later, when the library wants to start the kernels, it will run Invoker.

}

MIOPEN_LOG_I2("ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " + config.ToString());
MIOPEN_LOG_I2("ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " + config.ToString() + msg.str());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging output begins with [GetSolution], which creates an impression that logging happens during GetSolution() call. This is not so, however. The logging happens during invocation of kernels. Let's clearly indicate this:

Suggested change
MIOPEN_LOG_I2("ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " + config.ToString() + msg.str());
MIOPEN_LOG_I2("[INVOKER] ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " + config.ToString() + msg.str());

}

MIOPEN_LOG_I2("ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " + config.ToString());
MIOPEN_LOG_I2("ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " + config.ToString() + msg.str());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please always use SolverDbId():

Suggested change
MIOPEN_LOG_I2("ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: " + config.ToString() + msg.str());
MIOPEN_LOG_I2(SolverDbId(*this) << ": " ...);

Copy link
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review

src/conv/invokers/impl_gemm_dynamic.cpp Outdated Show resolved Hide resolved
src/include/miopen/datatype.hpp Outdated Show resolved Hide resolved
src/hip/batched_transpose_sol.cpp Outdated Show resolved Hide resolved
}

template <typename T>
static inline float get_normalized_radio(T x, T y)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Q] what is radio?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is a radio of for example, width/height, or height/width. I use this radio in side kernel selection logic for some special case (although not optimal, which is hard to achieve)

src/hip/batched_transpose_sol.cpp Outdated Show resolved Hide resolved
#ifndef MIOPEN_UTIL_SOL_HPP_
#define MIOPEN_UTIL_SOL_HPP_

#include <miopen/batched_transpose_sol.hpp>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the "sol" abbreviation stands for? It would be better if all Solutions in MIOpen comply the Solver/Solution/Invoker arch (#866).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sol is for solution. And I use this name because this is different from that in util.hpp, where there exist some method that use a single function call to launch kernel. At lease this transpose implementation, is not a single function call, but return a kernel object and let the caller to launch


namespace miopen {

struct TransposeSolution_NCHW2NHWC : public BatchedTransposeSolution
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need these wrappers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CamelCase

Suggested change
struct TransposeSolution_NCHW2NHWC : public BatchedTransposeSolution
struct TransposeSolutionNchw2Nhwc : public BatchedTransposeSolution

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because NCHW->NHWC, and NHWC->NCHW is exactly the same problem of transpose, only by switching the height/width.
By generalizing this problem in to a batched-transpose problem, we can simplify a lot of (maybe) duplicated code, and this transpose can further be extended.
This wrappers are just to make the transpose easier

src/include/miopen/util_sol.hpp Outdated Show resolved Hide resolved
@@ -30,6 +30,7 @@
#include <miopen/gcn_asm_utils.hpp>
#include <miopen/solver/implicitgemm_util.hpp>
#include <miopen/conv/asm_implicit_gemm.hpp>
#include <miopen/util_sol.hpp>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This header name looks strange here. If it provides interface to transpose functionality, then it should be expressed in the header name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I don't have good idea about the naming. I just want to distinguish that in util.hpp

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"ask sol" 🤣

src/solver/conv_asm_implicit_gemm_gtc_bwd_nhwc.cpp Outdated Show resolved Hide resolved
@junliume junliume dismissed atamazov’s stale review November 4, 2021 16:40

Changes made as requested, requesting re-review :)

@junliume junliume added this to the ROCm 5.0 milestone Nov 4, 2021
Copy link
Contributor

@junliume junliume left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI has passed, review suggestions are applied.

@junliume junliume changed the title xdlops NCHW support by transpose [Enhancement] xdlops NCHW support by transpose Nov 4, 2021
@junliume junliume merged commit f02d959 into develop Nov 4, 2021
@atamazov
Copy link
Contributor

atamazov commented Nov 4, 2021

@junliume That was partial review.

@junliume
Copy link
Contributor

junliume commented Nov 5, 2021

@junliume That was partial review.

Let's use post merge issues :) Sorry for pressing hard on these blocker PRs.

@carlushuang carlushuang deleted the nchw_xdlops_support_by_transpose branch December 6, 2021 13:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants