Commit
[Enhancement] xdlops NCHW support by transpose (#1247)
* Implement a set/get attribute API, and add MIOPEN_CONVOLUTION_ATTRIB_FP16_ALT_IMPL to control the MIOPEN_DEBUG_FP16_ALT_IMP attribute.
* Get the attribute in the asm igemm NHWC solver, and conditionally set the symbol based on the MIOPEN_CONVOLUTION_ATTRIB_FP16_ALT_IMPL value.
* gfx90a_fp16_alt_impl(01) Add constness to the API. Allow resetting the attribute. Some error handling. Comments.
* gfx90a_fp16_alt_impl(02) WrW: Pass the ALT attribute via InvokeParams. ConvAsmImplicitGemmGTCDynamicWrwXdlopsNHWC: Update Solver and Invokers.
* gfx90a_fp16_alt_impl(03) Accelerate access to the attribute. Error handling. MIOPEN_DEBUG_FP16_ALT_IMP -> MIOPEN_DEBUG_CONVOLUTION_ATTRIB_FP16_ALT_IMPL.
* gfx90a_fp16_alt_impl(10) [clang-tidy] Disable altera-unroll-loops (ROCm 4.5). Sort the list of disabled warnings.
* Fix the ostringstream constructed-from-string problem by passing ate as the 2nd constructor argument.
* Add a batched transpose GPU kernel intended to serve NCHW<->NHWC conversion.

Co-authored-by: Artem Tamazov <[email protected]>
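The ostringstream fix in the list above most likely refers to `std::ios_base::ate` (the message's "eta" reads as a typo for it; this is an interpretation, not confirmed by the commit). A `std::ostringstream` constructed from a string starts writing at position 0, so later `<<` output overwrites the initial text instead of appending to it. The minimal sketch below shows both behaviors; the function names are hypothetical and not taken from MIOpen.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Without ate: the put position starts at 0, so output overwrites the
// beginning of the initial string (e.g. "conv_" + 7 -> "7onv_").
std::string append_without_ate(const std::string& prefix, int value) {
    std::ostringstream ss(prefix);
    ss << value;
    return ss.str();
}

// With ate: the put position starts at the end of the initial string,
// so output is appended (e.g. "conv_" + 7 -> "conv_7").
// The constructor ORs in ios_base::out automatically.
std::string append_with_ate(const std::string& prefix, int value) {
    std::ostringstream ss(prefix, std::ios_base::ate);
    ss << value;
    return ss.str();
}
```

Passing `ate` keeps the one-line construction idiom while giving append semantics; the alternative is to default-construct the stream and write the prefix with `<<` first.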
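The title's "NCHW support by transpose" works because, for densely packed tensors, NCHW -> NHWC is a batched matrix transpose: each of the N images is a C x (H*W) matrix that becomes (H*W) x C, and NHWC -> NCHW is the same operation with rows and columns swapped. The sketch below is a hypothetical CPU reference of that index mapping only, assuming contiguous row-major storage; the names `batched_transpose`, `nchw_to_nhwc`, and `nhwc_to_nchw` are illustrative and are not the GPU kernel or API added in this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Transpose `batch` independent rows x cols matrices stored back to back.
// Element (i, j) of matrix b moves from src[b*rows*cols + i*cols + j]
// to dst[b*rows*cols + j*rows + i].
template <typename T>
std::vector<T> batched_transpose(const std::vector<T>& src, std::size_t batch,
                                 std::size_t rows, std::size_t cols) {
    assert(src.size() == batch * rows * cols);
    std::vector<T> dst(src.size());
    const std::size_t mat = rows * cols;  // elements per matrix
    for (std::size_t b = 0; b < batch; ++b)
        for (std::size_t i = 0; i < rows; ++i)
            for (std::size_t j = 0; j < cols; ++j)
                dst[b * mat + j * rows + i] = src[b * mat + i * cols + j];
    return dst;
}

// NCHW -> NHWC: per image, a C x (H*W) matrix becomes (H*W) x C.
template <typename T>
std::vector<T> nchw_to_nhwc(const std::vector<T>& src, std::size_t n,
                            std::size_t c, std::size_t h, std::size_t w) {
    return batched_transpose(src, n, c, h * w);
}

// NHWC -> NCHW: per image, an (H*W) x C matrix becomes C x (H*W).
template <typename T>
std::vector<T> nhwc_to_nchw(const std::vector<T>& src, std::size_t n,
                            std::size_t c, std::size_t h, std::size_t w) {
    return batched_transpose(src, n, h * w, c);
}
```

For example, a 1x2x1x3 NCHW tensor `{0, 1, 2, 10, 11, 12}` (channel 0 then channel 1) becomes `{0, 10, 1, 11, 2, 12}` in NHWC, with both channels of each pixel adjacent. Framing the conversion this way lets an NHWC-only solver serve NCHW inputs by transposing before and after the convolution, at the cost of extra memory traffic.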