Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fp16][MI100] ConvAsmImplicitGemmGTCDynamicWrwXdlops: "Running kernel failed: Invalid Kernel Args" #990

Open
atamazov opened this issue Jun 19, 2021 · 10 comments

Comments

@atamazov
Copy link
Contributor

atamazov commented Jun 19, 2021

It seems that ConvAsmImplicitGemmGTCDynamicWrwXdlops is not functional with FP16. This problem is camouflaged by non-Normal Find mode, so the only side effect is performance loss. An error message appears only once (unless find-db is removed).

Preconditions:

  • develop f75c7c7
  • Any ROCm version (tested with 3.7 and 4.2)
  • gfx908
  • HIP or OCL backend
  • MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvAsmImplicitGemmGTCDynamicWrwXdlops MIOPEN_FIND_MODE=normal

Symptom:

root@miopen908-1:/dockerx/github/miopenx01/build/37.release.opencl.nodev# ./bin/MIOpenDriver convfp16 -n 1 -c 256 -H 12 -W 12 -k 1 -y 3 -x 3 -p 0 -q 0 -u 2 -v 2 -i 1 -w 1 -t 1 -V 1 -F 4
MIOpenDriver convfp16 -n 1 -c 256 -H 12 -W 12 -k 1 -y 3 -x 3 -p 0 -q 0 -u 2 -v 2 -i 1 -w 1 -t 1 -V 1 -F 4
MIOpen(OpenCL): Info [get_device_name] Raw device name: gfx908
MIOpen(OpenCL): Info [Handle] stream: 0x564c4b264940, device_id: 0x564c4b35e930
MIOpen(OpenCL): Info [GetFindModeValueImpl] MIOPEN_FIND_MODE = DYNAMIC_HYBRID(5)
MIOpen(OpenCL): Info [BackwardWeightsGetWorkSpaceSize]
MIOpen(OpenCL): Info [GetWrwSolutions]
MIOpen(OpenCL): Info [Measure] Db::Prefetch time: 86.7012 ms
MIOpen(OpenCL): Info [AmdRocmMetadataVersionDetect] ROCm MD version AMDHSA_COv3, MIOpen version 2.12.0.8475-c8532c566
MIOpen(OpenCL): Info [HipCompilerVersionImpl] 3.7.20315
...
MIOpen(OpenCL): Error [EvaluateInvokers] /dockerx/github/miopenx01/src/ocl_kernel.cpp:89: Running kernel failed:  Invalid Kernel Args
...

What I see at log level 6:

MIOpen(OpenCL): Info2 [PrepareInvoker] Preparing kernel: igemm_wrw_gtcx_nchw_fp16_bx8_ex1_bt32x32x8_wt16x16x4_ws1x1_wr1x1_ta1x1x1x1_1x8x1x32_tb1x1x1x1_1x8x1x32
MIOpen(OpenCL): Error [EvaluateInvokers] /dockerx/github/miopenx01/src/ocl_kernel.cpp:89: Running kernel failed:  Invalid Kernel Args
Log of sample session with log level 6
root@miopen908-1:/dockerx/github/miopenx01/build/42-16.release.opencl# cmake -DBUILD_DEV=On -DCMAKE_BUILD_TYPE=Release -DMIOPEN_BACKEND=OpenCL -DMIOPEN_TEST_ALL=On -DMIOPEN_TEST_GFX908=On -DMIOPEN_TEST_HALF=On -DMIOPEN_TEST_FLAGS="--verbose --disable-verification-cache" -DCMAKE_PREFIX_PATH=/root/driver/MLOpen/deps_opencl -DCMAKE_INSTALL_PREFIX=../../install/release ../..
...
root@miopen908-1:/dockerx/github/miopenx01/build/42-16.release.opencl# make MIOpenDriver -j 24
...
root@miopen908-1:/dockerx/github/miopenx01/build/42-16.release.opencl# unset MIOPEN_FIND_MODE
root@miopen908-1:/dockerx/github/miopenx01/build/42-16.release.opencl# rm ../../src/kernels/gfx*ufdb*
root@miopen908-1:/dockerx/github/miopenx01/build/42-16.release.opencl# MIOPEN_LOG_LEVEL=6 ./bin/MIOpenDriver convfp16 -n 1 -c 256 -H 12 -W 12 -k 1 -y 3 -x 3 -p 0 -q 0 -u 2 -v 2 -i 1 -w 1 -t 1 -V 1 -F 4
MIOpenDriver convfp16 -n 1 -c 256 -H 12 -W 12 -k 1 -y 3 -x 3 -p 0 -q 0 -u 2 -v 2 -i 1 -w 1 -t 1 -V 1 -F 4
MIOpen(OpenCL): Info [get_device_name] Raw device name: gfx908:sramecc+:xnack-
MIOpen(OpenCL): Info [Handle] stream: 0x556bf5a10940, device_id: 0x556bf5bcf740
MIOpen(OpenCL): Info [GetFindModeValueImpl] MIOPEN_FIND_MODE = DYNAMIC_HYBRID(5)
MIOpen(OpenCL): Info [BackwardWeightsGetWorkSpaceSize]
MIOpen(OpenCL): Info [GetWrwSolutions]
MIOpen(OpenCL): Info [Measure] Db::Prefetch time: 86.3331 ms
MIOpen(OpenCL): Info2 [FindRecordUnsafe] Looking for key 1-5-5-3x3-256-12-12-1-0x0-2x2-1x1-0-NCHW-FP16-W in file /dockerx/github/miopenx01/src/kernels/gfx90878.OpenCL.2_13_0.ufdb.txt
MIOpen(OpenCL): Info2 [FindRecordUnsafe] File is unreadable: /dockerx/github/miopenx01/src/kernels/gfx90878.OpenCL.2_13_0.ufdb.txt
MIOpen(OpenCL): Info2 [FindRecord] Looking for key 1-5-5-3x3-256-12-12-1-0x0-2x2-1x1-0-NCHW-FP16-W in file /dockerx/github/miopenx01/src/kernels/gfx90878.OpenCL.fdb.txt
MIOpen(OpenCL): Info2 [Measure] Db::FindRecord time: 0.105107 ms
MIOpen(OpenCL): Info [AmdRocmMetadataVersionDetect] ROCm MD version AMDHSA_COv3, MIOpen version 2.13.0.f75c7c7ab
MIOpen(OpenCL): Info2 [ValidateGcnAssemblerImpl] Running: '/opt/rocm/llvm/bin/clang --version'
MIOpen(OpenCL): Info2 [ValidateGcnAssemblerImpl] clang version 12.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-4.2.0 21155 fd12620eb54b192daf0231b6950421a85cda1feb)
MIOpen(OpenCL): Info2 [ValidateGcnAssemblerImpl] Target: x86_64-unknown-linux-gnu
MIOpen(OpenCL): Info2 [ValidateGcnAssemblerImpl] Thread model: posix
MIOpen(OpenCL): Info2 [ValidateGcnAssemblerImpl] InstalledDir: /opt/rocm/llvm/bin
MIOpen(OpenCL): Info2 [ValidateGcnAssemblerImpl]
MIOpen(OpenCL): Info2 [GetSolutionsFallback] ConvDirectNaiveConvWrw Estimated WTI = 0.01
MIOpen(OpenCL): Info2 [GetSolutionsFallback] ConvAsmImplicitGemmGTCDynamicWrwXdlops Estimated WTI = -2
MIOpen(OpenCL): Info2 [GetSolutionsFallback] maxSolutionCount = 1, available = 1
MIOpen(OpenCL): Info2 [GetSolutionsFallback] id: 87 algo: 1, time: 1000 ms, ws: 0, name: ConvDirectNaiveConvWrw
MIOpen(OpenCL): Info2 [HipCompilerVersionImpl] Read version information from HIP package...
MIOpen(OpenCL): Info [HipCompilerVersionImpl] 4.2.21155
MIOpen(OpenCL): Info [HeuristicInit] All attempts unsuccessful
MIOpen(OpenCL): Info [HeuristicInit] 256,256,8,128,128,8,0,1
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvHipImplicitGemmWrwV4R4Xdlops: Not applicable
MIOpen(OpenCL): Info [HeuristicInit] 16,64,8,16,16,4,16,64,16,0,1
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvHipImplicitGemmWrwV4R4Xdlops_Padded_Gemm: Skipped (non-dynamic)
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvHipImplicitGemmV4R1WrW: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvHipImplicitGemmV4R4WrW: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvAsmImplicitGemmV4R1DynamicWrw: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvHipImplicitGemmMlirCppWrW: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvMlirIgemmWrWXdlops: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvMlirIgemmWrW: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvAsmImplicitGemmGTCDynamicWrwXdlops: 9216
MIOpen(OpenCL): Info2 [BackwardWeightsGetWorkSpaceSizeImplicitGemm] 0 < 9216
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvBinWinogradRxS: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvBinWinogradRxSf2x3: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvBinWinogradRxSf2x3g1: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvWinograd3x3MultipassWrW<3-2>: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvWinograd3x3MultipassWrW<3-3>: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvWinograd3x3MultipassWrW<3-4>: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvWinograd3x3MultipassWrW<3-5>: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvWinograd3x3MultipassWrW<3-6>: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvWinograd3x3MultipassWrW<7-2>: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvWinograd3x3MultipassWrW<7-3>: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvWinograd3x3MultipassWrW<7-3-1-1>: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvWinograd3x3MultipassWrW<7-2-1-1>: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvWinograd3x3MultipassWrW<1-1-7-2>: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvWinograd3x3MultipassWrW<1-1-7-3>: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvWinograd3x3MultipassWrW<5-3>: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvWinograd3x3MultipassWrW<5-4>: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvAsmBwdWrW1x1: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvAsmBwdWrW3x3: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvOclBwdWrW2<1>: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvOclBwdWrW2<2>: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvOclBwdWrW2<4>: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvOclBwdWrW2<8>: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvOclBwdWrW2<16>: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvOclBwdWrW2NonTunable: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvOclBwdWrW53: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvOclBwdWrW1x1: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvDirectNaiveConvFwd: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvDirectNaiveConvBwd: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] ConvDirectNaiveConvWrw: 0
MIOpen(OpenCL): Info2 [GetWorkspaceSize] GemmFwd1x1_0_1: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] GemmFwd1x1_0_1_int8: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] GemmFwd1x1_0_2: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] GemmFwdRest: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] GemmBwd1x1_stride1: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] GemmBwd1x1_stride2: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] GemmBwdRest: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] GemmWrw1x1_stride1: Not applicable
MIOpen(OpenCL): Info2 [GetWorkspaceSize] GemmWrwUniversal: Not applicable
MIOpen(OpenCL): Info2 [BackwardWeightsGetWorkSpaceSize] 9216
MIOpen(OpenCL): Info [FindConvBwdWeightsAlgorithm] requestAlgoCount = 2, workspace = 9216
MIOpen(OpenCL): Info [GetWrwSolutions]
MIOpen(OpenCL): Info2 [FindRecordUnsafe] Looking for key 1-5-5-3x3-256-12-12-1-0x0-2x2-1x1-0-NCHW-FP16-W in file /dockerx/github/miopenx01/src/kernels/gfx90878.OpenCL.2_13_0.ufdb.txt
MIOpen(OpenCL): Info2 [FindRecordUnsafe] File is unreadable: /dockerx/github/miopenx01/src/kernels/gfx90878.OpenCL.2_13_0.ufdb.txt
MIOpen(OpenCL): Info2 [FindRecord] Looking for key 1-5-5-3x3-256-12-12-1-0x0-2x2-1x1-0-NCHW-FP16-W in file /dockerx/github/miopenx01/src/kernels/gfx90878.OpenCL.fdb.txt
MIOpen(OpenCL): Info2 [Measure] Db::FindRecord time: 0.111461 ms
MIOpen(OpenCL): Info2 [GetSolutionsFallback] ConvDirectNaiveConvWrw Estimated WTI = 0.01
MIOpen(OpenCL): Info2 [GetSolutionsFallback] ConvAsmImplicitGemmGTCDynamicWrwXdlops Estimated WTI = -2
MIOpen(OpenCL): Info2 [GetSolutionsFallback] maxSolutionCount = 1, available = 1
MIOpen(OpenCL): Info2 [GetSolutionsFallback] id: 87 algo: 1, time: 1000 ms, ws: 0, name: ConvDirectNaiveConvWrw
MIOpen(OpenCL): Info2 [FindRecordUnsafe] Looking for key 1-5-5-3x3-256-12-12-1-0x0-2x2-1x1-0-NCHW-FP16-W in file /dockerx/github/miopenx01/src/kernels/gfx90878.OpenCL.2_13_0.ufdb.txt
MIOpen(OpenCL): Info2 [FindRecordUnsafe] File is unreadable: /dockerx/github/miopenx01/src/kernels/gfx90878.OpenCL.2_13_0.ufdb.txt
MIOpen(OpenCL): Info2 [Measure] Db::FindRecord time: 0.058486 ms
MIOpen(OpenCL): Info [TryLoad] Find-db regenerating.
MIOpen(OpenCL): Info2 [SQLiteBase] Initializing system database file /dockerx/github/miopenx01/src/kernels/miopen.db
MIOpen(OpenCL): Info2 [SQLiteBase] Initializing user database file /dockerx/github/miopenx01/src/kernels/miopen_1.0.0.udb
MIOpen(OpenCL): Info2 [SearchForAllSolutions] GemmFwd1x1_0_1: Not applicable
MIOpen(OpenCL): Info2 [SearchForAllSolutions] GemmFwd1x1_0_1_int8: Not applicable
MIOpen(OpenCL): Info2 [SearchForAllSolutions] GemmFwd1x1_0_2: Not applicable
MIOpen(OpenCL): Info2 [SearchForAllSolutions] GemmFwdRest: Not applicable
MIOpen(OpenCL): Info2 [SearchForAllSolutions] GemmBwd1x1_stride1: Not applicable
MIOpen(OpenCL): Info2 [SearchForAllSolutions] GemmBwd1x1_stride2: Not applicable
MIOpen(OpenCL): Info2 [SearchForAllSolutions] GemmBwdRest: Not applicable
MIOpen(OpenCL): Info2 [SearchForAllSolutions] GemmWrw1x1_stride1: Not applicable
MIOpen(OpenCL): Info2 [SearchForAllSolutions] GemmWrwUniversal: Not applicable
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvAsmBwdWrW1x1: Skipped (non-dynamic)
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvAsmBwdWrW3x3: Skipped (non-dynamic)
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvOclBwdWrW2<1>: Skipped (non-dynamic)
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvOclBwdWrW2<2>: Skipped (non-dynamic)
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvOclBwdWrW2<4>: Skipped (non-dynamic)
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvOclBwdWrW2<8>: Skipped (non-dynamic)
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvOclBwdWrW2<16>: Skipped (non-dynamic)
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvOclBwdWrW2NonTunable: Skipped (non-dynamic)
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvOclBwdWrW53: Skipped (non-dynamic)
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvOclBwdWrW1x1: Skipped (non-dynamic)
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvDirectNaiveConvFwd: Not applicable
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvDirectNaiveConvBwd: Not applicable
MIOpen(OpenCL): Info [FindSolutionImpl] ConvDirectNaiveConvWrw (not searchable)
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvDirectNaiveConvWrw: Success.
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvBinWinogradRxS: Not applicable
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvBinWinogradRxSf2x3: Not applicable
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvBinWinogradRxSf2x3g1: Not applicable
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvWinograd3x3MultipassWrW<3-2>: Not applicable
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvWinograd3x3MultipassWrW<3-3>: Not applicable
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvWinograd3x3MultipassWrW<3-4>: Not applicable
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvWinograd3x3MultipassWrW<3-5>: Not applicable
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvWinograd3x3MultipassWrW<3-6>: Not applicable
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvWinograd3x3MultipassWrW<7-2>: Not applicable
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvWinograd3x3MultipassWrW<7-3>: Not applicable
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvWinograd3x3MultipassWrW<7-3-1-1>: Not applicable
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvWinograd3x3MultipassWrW<7-2-1-1>: Not applicable
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvWinograd3x3MultipassWrW<1-1-7-2>: Not applicable
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvWinograd3x3MultipassWrW<1-1-7-3>: Not applicable
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvWinograd3x3MultipassWrW<5-3>: Not applicable
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvWinograd3x3MultipassWrW<5-4>: Not applicable
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvHipImplicitGemmWrwV4R4Xdlops: Skipped (non-dynamic)
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvHipImplicitGemmWrwV4R4Xdlops_Padded_Gemm: Skipped (non-dynamic)
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvHipImplicitGemmV4R1WrW: Skipped (non-dynamic)
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvHipImplicitGemmV4R4WrW: Skipped (non-dynamic)
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvAsmImplicitGemmV4R1DynamicWrw: Not applicable
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvHipImplicitGemmMlirCppWrW: Skipped (non-dynamic)
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvMlirIgemmWrWXdlops: Skipped (non-dynamic)
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvMlirIgemmWrW: Skipped (non-dynamic)
MIOpen(OpenCL): Info [FindSolutionImpl] ConvAsmImplicitGemmGTCDynamicWrwXdlops (not searchable)
MIOpen(OpenCL): Info2 [GetSolution] igemm_wrw_gtcx_nchw_fp16_bx8_ex1_bt32x32x8_wt16x16x4_ws1x1_wr1x1_ta1x1x1x1_1x8x1x32_tb1x1x1x1_1x8x1x32.s:igemm_wrw_gtcx_nchw_fp16_bx8_ex1_bt32x32x8_wt16x16x4_ws1x1_wr1x1_ta1x1x1x1_1x8x1x32_tb1x1x1x1_1x8x1x32
MIOpen(OpenCL): Info2 [SearchForAllSolutions] ConvAsmImplicitGemmGTCDynamicWrwXdlops: Success.
MIOpen(OpenCL): Info2 [AmdgcnAssembleQuiet] /opt/rocm/llvm/bin/clang  -x assembler -target amdgcn--amdhsa -mcpu=gfx900 /tmp/87dd-1eae-df26-db6c -o /dev/null 2>&1
MIOpen(OpenCL): Info2 [AmdgcnAssemble] ' -x assembler -target amdgcn--amdhsa -mno-xnack -mcpu=gfx908:sramecc+:xnack-  -Wa,-defsym,ROCM_METADATA_VERSION=5 - -o /tmp/miopen-tmp-e4c3-d9ff-ef01-9899/amdgcn-asm-out-XXXXXX'
MIOpen(OpenCL): Info2 [AmdgcnAssemble] ' -x assembler -target amdgcn--amdhsa -mno-xnack -mcpu=gfx908:sramecc+:xnack-  -Wa,-defsym,ROCM_METADATA_VERSION=5 - -o /tmp/miopen-tmp-8037-af12-5299-9456/amdgcn-asm-out-XXXXXX'
MIOpen(OpenCL): Info2 [Log] Kernel igemm_wrw_gtcx_nchw_fp16_bx8_ex1_bt32x32x8_wt16x16x4_ws1x1_wr1x1_ta1x1x1x1_1x8x1x32_tb1x1x1x1_1x8x1x32.s Compile Time, ms: 412.14
MIOpen(OpenCL): Info2 [Log] Kernel naive_conv_gcn.s Compile Time, ms: 490.228
MIOpen(OpenCL): Info2 [Log] PrecompileKernels Compile Time, ms: 491.044
MIOpen(OpenCL): Info2 [PrepareInvoker] Preparing kernel: naive_conv_wrw_nchw_fp16
MIOpen(OpenCL): Info [EvaluateInvokers] ConvDirectNaiveConvWrw: naive_conv_wrw_nchw_fp16: 0.097591 < 3.40282e+38
MIOpen(OpenCL): Info2 [Register] Invoker registered for algorithm 1x5x5x3x3x256x12x12x1xNCHWxFP16x0x0x2x2x1x1x1xW and solver ConvDirectNaiveConvWrw
MIOpen(OpenCL): Info2 [SetAsFound1_0] Solver ConvDirectNaiveConvWrw registered as find 1.0 best for miopenConvolutionBwdWeightsAlgoDirect in 1x5x5x3x3x256x12x12x1xNCHWxFP16x0x0x2x2x1x1x1xW
MIOpen(OpenCL): Info [EvaluateInvokers] Selected: ConvDirectNaiveConvWrw: naive_conv_wrw_nchw_fp16: 0.097591, workspce_sz = 0
MIOpen(OpenCL): Info2 [SetValues] 1-5-5-3x3-256-12-12-1-0x0-2x2-1x1-0-NCHW-FP16-W, content inserted: miopenConvolutionBwdWeightsAlgoDirect:ConvDirectNaiveConvWrw,0.097591,0,miopenConvolutionBwdWeightsAlgoDirect,<unused>
MIOpen(OpenCL): Info2 [PrepareInvoker] Preparing kernel: igemm_wrw_gtcx_nchw_fp16_bx8_ex1_bt32x32x8_wt16x16x4_ws1x1_wr1x1_ta1x1x1x1_1x8x1x32_tb1x1x1x1_1x8x1x32
MIOpen(OpenCL): Error [EvaluateInvokers] /dockerx/github/miopenx01/src/ocl_kernel.cpp:89: Running kernel failed:  Invalid Kernel Args
MIOpen(OpenCL): Info2 [StoreRecordUnsafe] Storing record: 1-5-5-3x3-256-12-12-1-0x0-2x2-1x1-0-NCHW-FP16-W
MIOpen(OpenCL): Info2 [FindRecordUnsafe] Looking for key 1-5-5-3x3-256-12-12-1-0x0-2x2-1x1-0-NCHW-FP16-W in file /dockerx/github/miopenx01/src/kernels/gfx90878.OpenCL.2_13_0.ufdb.txt
MIOpen(OpenCL): Info2 [FindRecordUnsafe] File is unreadable: /dockerx/github/miopenx01/src/kernels/gfx90878.OpenCL.2_13_0.ufdb.txt
MIOpen(OpenCL): Info2 [Measure] Db::StoreRecord time: 0.272677 ms
MIOpen(OpenCL): Info [FindConvBwdWeightsAlgorithm] miopenConvolutionBwdWeightsAlgoDirect        0.097591        0
MIOpen(OpenCL): Info [FindConvBwdWeightsAlgorithm] BWrW Chosen Algorithm: ConvDirectNaiveConvWrw , 0, 0.097591
MIOpen(OpenCL): Info [ConvolutionBackwardWeights] algo = 1, workspace = 0
MIOpen(OpenCL): Info2 [GetInvoker] Returning an invoker for problem 1x5x5x3x3x256x12x12x1xNCHWxFP16x0x0x2x2x1x1x1xW and algorithm miopenConvolutionBwdWeightsAlgoDirect
Wall-clock Time Backward Weights Conv. Elapsed: 0.292982 ms, Auxiliary API calls: 639.813 ms (GWSS: 141.848)
MIOpen Backward Weights Conv. Algorithm: 1, Solution: 87/ConvDirectNaiveConvWrw
GPU Kernel Time Backward Weights Conv. Elapsed: 0.093592 ms (average)
stats: name, n, c, ho, wo, x, y, k, flopCnt, bytesRead, bytesWritten, GFLOPs, GB/s, timeMs
stats: bwdw-conv3x3u2, 1, 256, 5, 5, 3, 3, 1,  115200, 0, 0, 1, 0, 0.093592
Backward Convolution Weights Verifies OK on CPU reference (0.000155719)
@atamazov
Copy link
Contributor Author

atamazov commented Jun 19, 2021

CI was unable to catch this problem because test_conv_igemm_dynamic_xdlops_wrw does NOT actually test the HALF type.

test/CMakeLists.txt currently contains the following command lines in test_conv_igemm_dynamic_xdlops_wrw:

COMMAND ${DYNAMIC_IMPLICITGEMM_WRW_ENVS_XDLOPS} $<TARGET_FILE:test_conv2d> --verbose --input  64  64 28 28...

This tests only FP32 configs because ${MIOPEN_TEST_FLOAT_ARG} (which may be one of --half, --bfloat16 etc) is not used.

There are some other commands, like:

COMMAND ${DYNAMIC_IMPLICITGEMM_WRW_ENVS_XDLOPS} $<TARGET_FILE:test_conv2d> --verbose --half --input  1 3 32 32...

But the problem is that test_conv2d does nothing when --verbose is the first option the datatype option (e.g. --half) is NOT the first option.

All these test issues + more is addressed in #991.

@atamazov
Copy link
Contributor Author

/cc @shurale-nkn for awareness.

@atamazov
Copy link
Contributor Author

Ping

@atamazov
Copy link
Contributor Author

Ping II

@atamazov atamazov changed the title [Fp16][MI100] ConvAsmImplicitGemmGTCDynamicWrwXdlops: "Running kernel failed: Invalid Kernel Args" [Fp16][MI100][OCL] ConvAsmImplicitGemmGTCDynamicWrwXdlops: "Running kernel failed: Invalid Kernel Args" Dec 12, 2021
@atamazov atamazov changed the title [Fp16][MI100][OCL] ConvAsmImplicitGemmGTCDynamicWrwXdlops: "Running kernel failed: Invalid Kernel Args" [Fp16][MI100] ConvAsmImplicitGemmGTCDynamicWrwXdlops: "Running kernel failed: Invalid Kernel Args" Dec 12, 2021
@atamazov atamazov added this to the ROCm 5.1 milestone Dec 12, 2021
@atamazov
Copy link
Contributor Author

Let's resolve this in 5.1. We are losing performance.

@junliume junliume modified the milestones: ROCm 5.1, ROCm 5.4 Aug 2, 2022
@junliume junliume removed this from the ROCm 5.4 milestone Jan 11, 2023
@ppanchad-amd
Copy link

@atamazov Please check if this is resolved in ROCm 6.0.2? Thanks!

@atamazov

This comment was marked as off-topic.

@atamazov
Copy link
Contributor Author

@atamazov Please check if this is resolved in ROCm 6.0.2? Thanks!

@carlushuang @shaojiewang Can you please help to answer? I do not see any signs of working on this issue; also I do not have MI100 on hand to check.

/cc @junliume @JehandadKhan

@littlewu2508
Copy link

Please notice that your question is an off-topic in this ticket.

And sorry about it. I ran into the Invalid Kernel Args problem caused by -mno-xnack and posts it here, but now I realized this issue are related with a different flag. I'll open a new issue about -mno-xnack

@atamazov

This comment was marked as off-topic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants