Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API] Get/Set "ALT" ConvolutionAttribute. [Core] Pass attribute to Invokers. [FP16][NHWC][gfx90a][asm igemm] Build & run appropriate kernel. #1226

Merged
merged 23 commits into from
Oct 27, 2021

Conversation

carlushuang
Copy link
Contributor

@carlushuang carlushuang commented Oct 16, 2021

This is prerequisite for #1227. No functional changes, except new API calls (these calls do not change the behavior of the library for now).

  • MIOpen API: Get/Set ConvolutionAttribute in convolution descriptor.
  • Env var: MIOPEN_DEBUG_CONVOLUTION_ATTRIB_FP16_ALT_IMPL.
  • The attribute passed to Fwd/Bwd/Wrw/Fused InvokeParams.
  • In ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC, ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC,
    ConvAsmImplicitGemmGTCDynamicWrwXdlopsNHWC:
    • Additional kernels produced when necessary.
    • Invokers select appropriate kernel according to the new attribute
  • By-products:
    • Removed useless initializers in Fwd/Bwd/Wrw InvokeParams
    • [clang-tidy] Fixed some warnings for ROCm 4.5.
    • [clang-tidy] Disabled altera-unroll-loops (ROCm 4.5).
    • [clang-tidy] Sorted list of disabled warnings.

-- @atamazov

@atamazov
Copy link
Contributor

@carlushuang please see mail.

… based on attribute MIOPEN_CONVOLUTION_ATTRIB_FP16_ALT_IMPL value
@carlushuang
Copy link
Contributor Author

@atamazov @JehandadKhan this PR is workable, and with kernel from PR:#1227, can switch between fp16 alt impl and native fp16 impl.

@carlushuang carlushuang changed the title fp16 alt implementation in gfx90a Alt implementation of iGEMM kernel Oct 19, 2021
@carlushuang carlushuang changed the title Alt implementation of iGEMM kernel Implementation of Convolution Attributes Oct 19, 2021
@carlushuang carlushuang marked this pull request as ready for review October 21, 2021 02:18
@carlushuang
Copy link
Contributor Author

@atamazov this PR and #1227 are also needed for release 5.0. Please help review these PR and if anything you think need reorganize, maybe we need hurry up...

@codecov

This comment has been minimized.

@junliume
Copy link
Contributor

@atamazov this PR and #1227 are also needed for release 5.0. Please help review these PR and if anything you think need reorganize, maybe we need hurry up...

working on getting #1226 through CI, #1227 has passed CI, gentle ping @atamazov for review :)

@atamazov
Copy link
Contributor

atamazov commented Oct 22, 2021

I am working on this. Changes will most likely be required in #1227 (and I will take care of those)

@atamazov atamazov self-assigned this Oct 22, 2021
…nvAsmImplicitGemmGTCDynamicWrwXdlopsNHWC: Update Solver and Invokers.
@atamazov atamazov marked this pull request as draft October 22, 2021 23:59
@atamazov
Copy link
Contributor

@carlushuang Please look at changes in the WrW Solver, and ask questions about design, if any. Thanks.

src/convolution.cpp Outdated Show resolved Hide resolved
…ng. MIOPEN_DEBUG_FP16_ALT_IMP -> MIOPEN_DEBUG_CONVOLUTION_ATTRIB_FP16_ALT_IMPL.
@atamazov atamazov marked this pull request as ready for review October 25, 2021 18:16
@atamazov
Copy link
Contributor

@carlushuang Please also review this, thanks.

@atamazov atamazov changed the title Implementation of Convolution Attributes [API] Get/Set ConvolutionAttribute. [Core] Pass attribute to Invokers. [asm igemm][gfx90a][FP16] Build & run appropriate kernel. Oct 25, 2021
Copy link
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'm going to merge this myself, as soon as it passes the CI.

@junliume
Copy link
Contributor

LGTM. I'm going to merge this myself, as soon as it passes the CI.

The most recent commit is not yet picked up by CI, and whole set of test will take a long time.
Can we select gfx90a tests only?

@atamazov atamazov changed the title [API] Get/Set ConvolutionAttribute. [Core] Pass attribute to Invokers. [asm igemm][gfx90a][FP16] Build & run appropriate kernel. [API] Get/Set ConvolutionAttribute. [Core] Pass attribute to Invokers. [FP16][NHWC][gfx90a][asm igemm] Build & run appropriate kernel. Oct 26, 2021
@atamazov atamazov changed the title [API] Get/Set ConvolutionAttribute. [Core] Pass attribute to Invokers. [FP16][NHWC][gfx90a][asm igemm] Build & run appropriate kernel. [API] Get/Set "ALT" ConvolutionAttribute. [Core] Pass attribute to Invokers. [FP16][NHWC][gfx90a][asm igemm] Build & run appropriate kernel. Oct 26, 2021
@atamazov atamazov merged commit 3ddeaaa into develop Oct 27, 2021
Comment on lines 776 to +781
return [=](const Handle& handle, const AnyInvokeParams& primitive_parameters) mutable {
decltype(auto) wrw_invoke_params =
primitive_parameters.CastTo<conv::WrWInvokeParams>();
const auto& tensors = wrw_invoke_params.tensors;
const auto k = handle.Run(kernels[0]);
const auto& tensors = wrw_invoke_params.tensors;
const auto k = handle.Run(
kernels[(isGfx90aFp16altSupport && wrw_invoke_params.gfx90aFp16alt) ? 1 : 0]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

atamazov pushed a commit that referenced this pull request Nov 7, 2021
* Follows the design of #1226
  - FP16 ALT kernels in GEMM obey the existing `MIOPEN_DEBUG_CONVOLUTION_ATTRIB_FP16_ALT_IMPL`.
  - [Informative] Also there is dedicated env control in rocBLAS, `ROCBLAS_INTERNAL_FP16_ALT_IMPL`.
* Short explanation:
  - GEMM Invokers read gfx90aFp16alt attribute from the instance of AnyInvokeParams passed to it.
  - This attribute is then forwarded down to calls of `rocblas_gemm_ex` and `rocblas_gemm_strided_batched_ex`.
  - rocBLAS will make the determination at runtime, and ignore the attribute when not executing on gfx90a
  - ⚠️ We expect that rocBLAS ignore the attribute for non-FP16 kernels.
* This PR have no effect for GEMM backends other than rocBLAS.
@carlushuang carlushuang deleted the gfx90a_fp16_alt_impl branch December 6, 2021 13:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants