Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[iGemm][test_conv2d][gfx908][half] Verification failed #917

Closed
shurale-nkn opened this issue May 9, 2021 · 22 comments
Closed

[iGemm][test_conv2d][gfx908][half] Verification failed #917

shurale-nkn opened this issue May 9, 2021 · 22 comments

Comments

@shurale-nkn
Copy link
Contributor

shurale-nkn commented May 9, 2021

test_conv2d with fp16 convolution failed
GPU: gfx908
Branch: develop (3e470a5)
Rocm 3.7

CXX=/opt/rocm/llvm/bin/clang++ CXXFLAGS='-Werror' cmake -DMIOPEN_TEST_FLAGS=' --disable-verification-cache --verbose' -DCMAKE_BUILD_TYPE=release  -DBUILD_DEV=Off -DCMAKE_INSTALL_PREFIX='../install' -DMIOPEN_TEST_ALL=On -DMIOPEN_TEST_HALF=On -DMIOPEN_TEST_GFX908=On ../

make -j 31 install test_conv2d

MIOPEN_FIND_MODE=normal MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvHipImplicitGemmBwdDataV4R1Xdlops MIOPEN_LOG_LEVEL=5 ./bin/test
_conv2d --half --input 8 16 14 14 --weights 32 16 1 1 -v --disable-backward-weights --disable-forward --pads_strides_dilations 1 1 1 1 1 1
MIOpen(HIP): Info [GetFindModeValueImpl] MIOPEN_FIND_MODE = HYBRID(3)
MIOpen(HIP): Info [get_device_name] Raw device name: gfx908
MIOpen(HIP): Info [Handle] stream: 0, device_id: 0
MIOpen(HIP): Info [HipCompilerVersionImpl] 3.7.20315
MIOpen(HIP): Info [AmdRocmMetadataVersionDetect] ROCm MD version AMDHSA_COv3, MIOpen version 2.12.0.8441-3e470a551
MIOpen(HIP): Info [BackwardDataGetWorkSpaceSize] 
MIOpen(HIP): Info [GetBackwardSolutions] 
MIOpen(HIP): Info [Measure] Db::Prefetch time: 85.167 ms
MIOpen(HIP): Info [ForwardGetWorkSpaceSize] 
MIOpen(HIP): Info [GetForwardSolutions] 
MIOpen(HIP): Info [GetEnvFindOnlySolverImpl] 65
MIOpen(HIP): Info [BackwardWeightsGetWorkSpaceSize] 
MIOpen(HIP): Info [GetWrwSolutions] 
./bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --disable-forward --disable-backward-weights --input 8, 16, 14, 14 --weights 32, 16, 1, 1 --pads_strides_dilations 1 1 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW 
MIOpen(HIP): Info [BackwardDataGetWorkSpaceSize] 
MIOpen(HIP): Info [GetBackwardSolutions] 
MIOpen(HIP): Info [FindConvBwdDataAlgorithm] requestAlgoCount = 1, workspace = 0
MIOpen(HIP): Info [GetBackwardSolutions] 
MIOpen(HIP): Info [CompileBackwardSolution] solver_id = ConvHipImplicitGemmBwdDataV4R1Xdlops
MIOpen(HIP): Info [FindSolutionImpl] ConvHipImplicitGemmBwdDataV4R1Xdlops
MIOpen(HIP): Info [FindSolutionImpl] Perf Db: record not found for: ConvHipImplicitGemmBwdDataV4R1Xdlops
MIOpen(HIP): Info [CalculateGemmBBlockCopyPerformanceParameters] catch
MIOpen(HIP): Info [HeuristicInit] 32,16,8,4,16,16,1,1
MIOpen(HIP): Info [GetPerformanceConfigBase] 32,16,8,4,16,16,1,1
MIOpen(HIP): Info [KernDb] database not present
MIOpen(HIP): Info [FindConvBwdDataAlgorithm] miopenConvolutionBwdDataAlgoImplicitGEMM   0.007836        0
MIOpen(HIP): Info [FindConvBwdDataAlgorithm] BWD Chosen Algorithm: ConvHipImplicitGemmBwdDataV4R1Xdlops , 0, 0.007836
MIOpen(HIP): Info [ConvolutionBackwardData] algo = 5, workspace = 0
MIOpen(HIP): Info [GetBackwardSolutions] 
FAILED: 0.105749
Max diff: 80
Mismatch at 14: 3 != 32
Backward convolution: ConvHipImplicitGemmBwdDataV4R1Xdlops
Input tensor: 8, 16, 14, 14
Weights tensor: 32, 16, 1, 1
Output tensor: 8, 32, 16, 16
Filter: conv2d, miopenConvolution, miopenPaddingDefault, {1, 1}, {1, 1}, {1, 1}, 
CONSOLE LOGS_LEVEL=6
MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvAsmImplicitGemmV4R1DynamicBwd MIOPEN_LOG_LEVEL=6 ./bin/test
_conv2d --half --input 8 16 14 14 --weights 32 16 1 1 -v --disable-backward-weights --disable-forward --pads_strides_dilations 1 1 1 1 1 1
MIOpen(HIP): Info [GetFindModeValueImpl] MIOPEN_FIND_MODE = HYBRID(3)
MIOpen(HIP): Info [get_device_name] Raw device name: gfx908
MIOpen(HIP): Info [Handle] stream: 0, device_id: 0
MIOpen(HIP): Info2 [HipCompilerVersionImpl] Read version information from HIP package...
MIOpen(HIP): Info [HipCompilerVersionImpl] 3.7.20315
MIOpen(HIP): Info [AmdRocmMetadataVersionDetect] ROCm MD version AMDHSA_COv3, MIOpen version 2.12.0.8441-3e470a551
MIOpen(HIP): Info2 [ValidateGcnAssemblerImpl] Running: '/opt/rocm/llvm/bin/clang --version'
MIOpen(HIP): Info2 [ValidateGcnAssemblerImpl] clang version 11.0.0 (/src/external/llvm-project/clang ee4e4ebbadcc8ea14ce99e34ed31ab31e94827ac)
MIOpen(HIP): Info2 [ValidateGcnAssemblerImpl] Target: x86_64-unknown-linux-gnu
MIOpen(HIP): Info2 [ValidateGcnAssemblerImpl] Thread model: posix
MIOpen(HIP): Info2 [ValidateGcnAssemblerImpl] InstalledDir: /opt/rocm/llvm/bin
MIOpen(HIP): Info2 [ValidateGcnAssemblerImpl] 
MIOpen(HIP): Info [BackwardDataGetWorkSpaceSize] 
MIOpen(HIP): Info [GetBackwardSolutions] 
MIOpen(HIP): Info [Measure] Db::Prefetch time: 86.4456 ms
MIOpen(HIP): Info2 [FindRecordUnsafe] Looking for key 32-16-16-1x1-16-14-14-8-1x1-1x1-1x1-0-NCHW-FP16-B in file /root/.config/miopen//gfx90878.HIP.2_12_0_8441-3e470a551.ufdb.txt
MIOpen(HIP): Info2 [FindRecordUnsafe] Key match: 32-16-16-1x1-16-14-14-8-1x1-1x1-1x1-0-NCHW-FP16-B
MIOpen(HIP): Info2 [FindRecordUnsafe] Contents found: miopenConvolutionBwdDataAlgoImplicitGEMM:ConvHipImplicitGemmBwdDataV4R1Xdlops,0.007836,0,miopenConvolutionBwdDataAlgoImplicitGEMM,<unused>;miopenConvolutionBwdDataAlgoGEMM:GemmBwdRest,40.2383,8192,miopenConvolutionBwdDataAlgoGEMM,<unused>;miopenConvolutionBwdDataAlgoDirect:ConvDirectNaiveConvBwd,0.05949,0,miopenConvolutionBwdDataAlgoDirect,<unused>
MIOpen(HIP): Info2 [Measure] Db::FindRecord time: 0.11762 ms
MIOpen(HIP): Info2 [BackwardDataGetWorkSpaceSize] 0
MIOpen(HIP): Info [ForwardGetWorkSpaceSize] 
MIOpen(HIP): Info [GetForwardSolutions] 
MIOpen(HIP): Info2 [FindRecordUnsafe] Looking for key 16-14-14-1x1-32-16-16-8-1x1-1x1-1x1-0-NCHW-FP16-F in file /root/.config/miopen//gfx90878.HIP.2_12_0_8441-3e470a551.ufdb.txt
MIOpen(HIP): Info2 [FindRecord] Looking for key 16-14-14-1x1-32-16-16-8-1x1-1x1-1x1-0-NCHW-FP16-F in file /home/kamil/docker_run/install/miopen/share/miopen/db/gfx90878.HIP.fdb.txt
MIOpen(HIP): Info2 [Measure] Db::FindRecord time: 0.056374 ms
MIOpen(HIP): Info2 [GetSolutionsFallback] ConvDirectNaiveConvFwd Estimated WTI = 0.01
MIOpen(HIP): Info2 [GetSolutionsFallback] ConvBinWinogradRxSf2x3g1 Estimated WTI = -2
MIOpen(HIP): Info2 [GetSolutionsFallback] GemmFwdRest Estimated WTI = 0.2592
MIOpen(HIP): Info2 [GetSolutionsFallback] ConvAsmImplicitGemmGTCDynamicFwdXdlops Estimated WTI = -2
MIOpen(HIP): Info2 [GetSolutionsFallback] maxSolutionCount = 1, available = 2
MIOpen(HIP): Info2 [GetSolutionsFallback] id: 85 algo: 1, time: 1000 ms, ws: 0, name: ConvDirectNaiveConvFwd
MIOpen(HIP): Info2 [GetSolutionsFallback] id: 91 algo: 0, time: 38.5802 ms, ws: 8192, name: GemmFwdRest
MIOpen(HIP): Info [GetEnvFindOnlySolverImpl] 65
MIOpen(HIP): Info2 [GetWorkspaceSize] ConvAsmImplicitGemmV4R1DynamicBwd: Not applicable
MIOpen(HIP): Info2 [ForwardGetWorkSpaceSize] 0
MIOpen(HIP): Info [BackwardWeightsGetWorkSpaceSize] 
MIOpen(HIP): Info [GetWrwSolutions] 
MIOpen(HIP): Info2 [FindRecordUnsafe] Looking for key 32-16-16-1x1-16-14-14-8-1x1-1x1-1x1-0-NCHW-FP16-W in file /root/.config/miopen//gfx90878.HIP.2_12_0_8441-3e470a551.ufdb.txt
MIOpen(HIP): Info2 [FindRecord] Looking for key 32-16-16-1x1-16-14-14-8-1x1-1x1-1x1-0-NCHW-FP16-W in file /home/kamil/docker_run/install/miopen/share/miopen/db/gfx90878.HIP.fdb.txt
MIOpen(HIP): Info2 [Measure] Db::FindRecord time: 0.053644 ms
MIOpen(HIP): Info2 [GetSolutionsFallback] GemmWrwUniversal Estimated WTI = 0.18144
MIOpen(HIP): Info2 [GetSolutionsFallback] ConvBinWinogradRxSf2x3g1 Estimated WTI = -2
MIOpen(HIP): Info2 [GetSolutionsFallback] ConvDirectNaiveConvWrw Estimated WTI = 0.01
MIOpen(HIP): Info2 [GetSolutionsFallback] ConvAsmImplicitGemmGTCDynamicWrwXdlops Estimated WTI = -2
MIOpen(HIP): Info2 [GetSolutionsFallback] maxSolutionCount = 1, available = 2
MIOpen(HIP): Info2 [GetSolutionsFallback] id: 102 algo: 0, time: 55.1146 ms, ws: 8192, name: GemmWrwUniversal
MIOpen(HIP): Info2 [GetSolutionsFallback] id: 87 algo: 1, time: 1000 ms, ws: 0, name: ConvDirectNaiveConvWrw
MIOpen(HIP): Info2 [BackwardWeightsGetWorkSpaceSize] 0
./bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --disable-forward --disable-backward-weights --input 8, 16, 14, 14 --weights 32, 16, 1, 1 --pads_strides_dilations 1 1 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW 
MIOpen(HIP): Info [BackwardDataGetWorkSpaceSize] 
MIOpen(HIP): Info [GetBackwardSolutions] 
MIOpen(HIP): Info2 [FindRecordUnsafe] Looking for key 32-16-16-1x1-16-14-14-8-1x1-1x1-1x1-0-NCHW-FP16-B in file /root/.config/miopen//gfx90878.HIP.2_12_0_8441-3e470a551.ufdb.txt
MIOpen(HIP): Info2 [FindRecordUnsafe] Key match: 32-16-16-1x1-16-14-14-8-1x1-1x1-1x1-0-NCHW-FP16-B
MIOpen(HIP): Info2 [FindRecordUnsafe] Contents found: miopenConvolutionBwdDataAlgoImplicitGEMM:ConvHipImplicitGemmBwdDataV4R1Xdlops,0.007836,0,miopenConvolutionBwdDataAlgoImplicitGEMM,<unused>;miopenConvolutionBwdDataAlgoGEMM:GemmBwdRest,40.2383,8192,miopenConvolutionBwdDataAlgoGEMM,<unused>;miopenConvolutionBwdDataAlgoDirect:ConvDirectNaiveConvBwd,0.05949,0,miopenConvolutionBwdDataAlgoDirect,<unused>
MIOpen(HIP): Info2 [Measure] Db::FindRecord time: 0.178727 ms
MIOpen(HIP): Info2 [BackwardDataGetWorkSpaceSize] 0
MIOpen(HIP): Info [FindConvBwdDataAlgorithm] requestAlgoCount = 1, workspace = 0
MIOpen(HIP): Info [GetBackwardSolutions] 
MIOpen(HIP): Info2 [FindRecordUnsafe] Looking for key 32-16-16-1x1-16-14-14-8-1x1-1x1-1x1-0-NCHW-FP16-B in file /root/.config/miopen//gfx90878.HIP.2_12_0_8441-3e470a551.ufdb.txt
MIOpen(HIP): Info2 [FindRecordUnsafe] Key match: 32-16-16-1x1-16-14-14-8-1x1-1x1-1x1-0-NCHW-FP16-B
MIOpen(HIP): Info2 [FindRecordUnsafe] Contents found: miopenConvolutionBwdDataAlgoImplicitGEMM:ConvHipImplicitGemmBwdDataV4R1Xdlops,0.007836,0,miopenConvolutionBwdDataAlgoImplicitGEMM,<unused>;miopenConvolutionBwdDataAlgoGEMM:GemmBwdRest,40.2383,8192,miopenConvolutionBwdDataAlgoGEMM,<unused>;miopenConvolutionBwdDataAlgoDirect:ConvDirectNaiveConvBwd,0.05949,0,miopenConvolutionBwdDataAlgoDirect,<unused>
MIOpen(HIP): Info2 [Measure] Db::FindRecord time: 0.127667 ms
MIOpen(HIP): Info [CompileBackwardSolution] solver_id = ConvHipImplicitGemmBwdDataV4R1Xdlops
MIOpen(HIP): Info2 [GetInvoker] Returning an invoker for problem 32x16x16x1x1x16x14x14x8xNCHWxFP16x1x1x1x1x1x1x1xB and solver ConvHipImplicitGemmBwdDataV4R1Xdlops
MIOpen(HIP): Info2 [SQLiteBase] Initializing system database file /home/kamil/docker_run/install/miopen/share/miopen/db/miopen.db
MIOpen(HIP): Info2 [SQLiteBase] Initializing user database file /root/.config/miopen/miopen_1.0.0.udb
MIOpen(HIP): Info [FindSolutionImpl] ConvHipImplicitGemmBwdDataV4R1Xdlops
MIOpen(HIP): Info2 [Prepare] SELECT solver, params FROM perf_db INNER JOIN config ON perf_db.config = config.id WHERE ( (layout = ? ) AND (data_type = ? ) AND (direction = ? ) AND (spatial_dim = ? ) AND (in_channels = ? ) AND (in_h = ? ) AND (in_w = ? ) AND (in_d = ? ) AND (fil_h = ? ) AND (fil_w = ? ) AND (fil_d = ? ) AND (out_channels = ? ) AND (batchsize = ? ) AND (pad_h = ? ) AND (pad_w = ? ) AND (pad_d = ? ) AND (conv_stride_h = ? ) AND (conv_stride_w = ? ) AND (conv_stride_d = ? ) AND (dilation_h = ? ) AND (dilation_w = ? ) AND (dilation_d = ? ) AND (bias = ? ) AND (group_count = ? ) )AND (arch = 'gfx908' ) AND (num_cu = '120');
MIOpen(HIP): Info2 [impl] [NCHW,FP16,B,2,32,16,16,1,1,1,1,16,8,1,1,0,1,1,0,1,1,0,0,1]
MIOpen(HIP): Info2 [Prepare] SELECT solver, params FROM perf_db INNER JOIN config ON perf_db.config = config.id WHERE ( (layout = ? ) AND (data_type = ? ) AND (direction = ? ) AND (spatial_dim = ? ) AND (in_channels = ? ) AND (in_h = ? ) AND (in_w = ? ) AND (in_d = ? ) AND (fil_h = ? ) AND (fil_w = ? ) AND (fil_d = ? ) AND (out_channels = ? ) AND (batchsize = ? ) AND (pad_h = ? ) AND (pad_w = ? ) AND (pad_d = ? ) AND (conv_stride_h = ? ) AND (conv_stride_w = ? ) AND (conv_stride_d = ? ) AND (dilation_h = ? ) AND (dilation_w = ? ) AND (dilation_d = ? ) AND (bias = ? ) AND (group_count = ? ) )AND (arch = 'gfx908' ) AND (num_cu = '120');
MIOpen(HIP): Info2 [impl] [NCHW,FP16,B,2,32,16,16,1,1,1,1,16,8,1,1,0,1,1,0,1,1,0,0,1]
MIOpen(HIP): Info2 [Measure] Db::Load time: 1.55442 ms
MIOpen(HIP): Info [FindSolutionImpl] Perf Db: record not found for: ConvHipImplicitGemmBwdDataV4R1Xdlops
MIOpen(HIP): Info [CalculateGemmBBlockCopyPerformanceParameters] catch
MIOpen(HIP): Info [HeuristicInit] 32,16,8,4,16,16,1,1
MIOpen(HIP): Info [GetPerformanceConfigBase] 32,16,8,4,16,16,1,1
MIOpen(HIP): Info2 [PrepareInvoker] Preparing kernel: gridwise_convolution_backward_data_implicit_gemm_v4r1_xdlops_nchw_kcyx_nkhw
MIOpen(HIP): Info2 [SQLiteBase] Initializing system database file 
MIOpen(HIP): Info [KernDb] database not present
MIOpen(HIP): Info2 [SQLiteBase] Initializing user database file /root/.cache/miopen/2.12.0.8441-3e470a551/gfx90878.ukdb
MIOpen(HIP): Info2 [KernDb] Database created successfully
MIOpen(HIP): Info2 [LoadBinary] Loading binary for: gridwise_convolution_backward_data_implicit_gemm_v4r1_xdlops_nchw_kcyx_nkhw.cpp.o; args:  -DCK_PARAM_PROBLEM_N=8 -DCK_PARAM_PROBLEM_K=32 -DCK_PARAM_PROBLEM_C=16 -DCK_PARAM_PROBLEM_HI=14 -DCK_PARAM_PROBLEM_WI=14 -DCK_PARAM_PROBLEM_HO=16 -DCK_PARAM_PROBLEM_WO=16 -DCK_PARAM_PROBLEM_Y=1 -DCK_PARAM_PROBLEM_X=1 -DCK_PARAM_PROBLEM_CONV_STRIDE_H=1 -DCK_PARAM_PROBLEM_CONV_STRIDE_W=1 -DCK_PARAM_PROBLEM_CONV_DILATION_H=1 -DCK_PARAM_PROBLEM_CONV_DILATION_W=1 -DCK_PARAM_PROBLEM_IN_LEFT_PAD_H=1 -DCK_PARAM_PROBLEM_IN_LEFT_PAD_W=1 -DCK_PARAM_PROBLEM_IN_RIGHT_PAD_H=1 -DCK_PARAM_PROBLEM_IN_RIGHT_PAD_W=1 -DCK_PARAM_PROBLEM_CONV_GROUP_COUNTS=1 -DCK_PARAM_TUNABLE_BLOCK_SIZE=128 -DCK_PARAM_TUNABLE_GEMM_M_PER_BLOCK=16 -DCK_PARAM_TUNABLE_GEMM_N_PER_BLOCK=32 -DCK_PARAM_TUNABLE_GEMM_K_PER_BLOCK=8 -DCK_PARAM_GEMM_M_PER_WAVE=16 -DCK_PARAM_GEMM_N_PER_WAVE=16 -DCK_PARAM_TUNABLE_GEMM_A_BLOCK_COPY_CLUSTER_LENGTHS_GEMM_K=8 -DCK_PARAM_TUNABLE_GEMM_A_BLOCK_COPY_CLUSTER_LENGTHS_GEMM_M=2 -DCK_PARAM_DEPENDENT_GEMM_A_BLOCK_COPY_CLUSTER_LENGTHS_GEMM_KPACK=4 -DCK_PARAM_TUNABLE_GEMM_A_BLOCK_COPY_SRC_DATA_PER_READ_GEMM_M=8 -DCK_PARAM_TUNABLE_GEMM_B_BLOCK_COPY_CLUSTER_LENGTHS_GEMM_K=8 -DCK_PARAM_TUNABLE_GEMM_B_BLOCK_COPY_CLUSTER_LENGTHS_GEMM_N=4 -DCK_PARAM_DEPENDENT_GEMM_B_BLOCK_COPY_CLUSTER_LENGTHS_GEMM_KPACK=4 -DCK_PARAM_TUNABLE_GEMM_B_BLOCK_COPY_SRC_DATA_PER_READ_GEMM_N=8 -DCK_PARAM_DEPENDENT_GRID_SIZE=49 -DCK_USE_AMD_XDLOPS=1 -DCK_USE_AMD_XDLOPS_INLINE_ASM=0 -DCK_USE_AMD_XDLOPS_EMULATE=0 -DCK_PARAM_GEMM_ID=0 --std=c++14 -DCK_USE_AMD_BUFFER_ATOMIC_FADD=1 -DCK_BLOCK_SYNC_LDS_WITHOUT_SYNC_VMEM=1 -DCK_WORKAROUND_SWDEV_229564=1 -DCK_WORKAROUND_SWDEV_231101=1 -DCK_USE_AMD_BUFFER_ADDRESSING=1 -DCK_USE_AMD_V_FMAC_F32=0 -DMIOPEN_USE_FP16=1 -DMIOPEN_USE_FP32=0 -DMIOPEN_USE_INT8=0 -DMIOPEN_USE_INT8x4=0 -DMIOPEN_USE_BFP16=0 -DMIOPEN_USE_INT32=0 -DMIOPEN_USE_RNE_BFLOAT16=1 -DCK_PARAM_KPACK_LENGTH=4 -DCK_PARAM_TUNABLE_GEMM_A_BLOCK_COPY_DST_DATA_PER_WRITE_GEMM_KPACK=1 -DCK_PARAM_TUNABLE_GEMM_B_BLOCK_COPY_DST_DATA_PER_WRITE_GEMM_KPACK=1 -mcpu=gfx908
MIOpen(HIP): Info2 [Prepare] SELECT kernel_blob, kernel_hash, uncompressed_size FROM kern_db WHERE (kernel_name = 'gridwise_convolution_backward_data_implicit_gemm_v4r1_xdlops_nchw_kcyx_nkhw.cpp.o') AND (kernel_args = ' -DCK_PARAM_PROBLEM_N=8 -DCK_PARAM_PROBLEM_K=32 -DCK_PARAM_PROBLEM_C=16 -DCK_PARAM_PROBLEM_HI=14 -DCK_PARAM_PROBLEM_WI=14 -DCK_PARAM_PROBLEM_HO=16 -DCK_PARAM_PROBLEM_WO=16 -DCK_PARAM_PROBLEM_Y=1 -DCK_PARAM_PROBLEM_X=1 -DCK_PARAM_PROBLEM_CONV_STRIDE_H=1 -DCK_PARAM_PROBLEM_CONV_STRIDE_W=1 -DCK_PARAM_PROBLEM_CONV_DILATION_H=1 -DCK_PARAM_PROBLEM_CONV_DILATION_W=1 -DCK_PARAM_PROBLEM_IN_LEFT_PAD_H=1 -DCK_PARAM_PROBLEM_IN_LEFT_PAD_W=1 -DCK_PARAM_PROBLEM_IN_RIGHT_PAD_H=1 -DCK_PARAM_PROBLEM_IN_RIGHT_PAD_W=1 -DCK_PARAM_PROBLEM_CONV_GROUP_COUNTS=1 -DCK_PARAM_TUNABLE_BLOCK_SIZE=128 -DCK_PARAM_TUNABLE_GEMM_M_PER_BLOCK=16 -DCK_PARAM_TUNABLE_GEMM_N_PER_BLOCK=32 -DCK_PARAM_TUNABLE_GEMM_K_PER_BLOCK=8 -DCK_PARAM_GEMM_M_PER_WAVE=16 -DCK_PARAM_GEMM_N_PER_WAVE=16 -DCK_PARAM_TUNABLE_GEMM_A_BLOCK_COPY_CLUSTER_LENGTHS_GEMM_K=8 -DCK_PARAM_TUNABLE_GEMM_A_BLOCK_COPY_CLUSTER_LENGTHS_GEMM_M=2 -DCK_PARAM_DEPENDENT_GEMM_A_BLOCK_COPY_CLUSTER_LENGTHS_GEMM_KPACK=4 -DCK_PARAM_TUNABLE_GEMM_A_BLOCK_COPY_SRC_DATA_PER_READ_GEMM_M=8 -DCK_PARAM_TUNABLE_GEMM_B_BLOCK_COPY_CLUSTER_LENGTHS_GEMM_K=8 -DCK_PARAM_TUNABLE_GEMM_B_BLOCK_COPY_CLUSTER_LENGTHS_GEMM_N=4 -DCK_PARAM_DEPENDENT_GEMM_B_BLOCK_COPY_CLUSTER_LENGTHS_GEMM_KPACK=4 -DCK_PARAM_TUNABLE_GEMM_B_BLOCK_COPY_SRC_DATA_PER_READ_GEMM_N=8 -DCK_PARAM_DEPENDENT_GRID_SIZE=49 -DCK_USE_AMD_XDLOPS=1 -DCK_USE_AMD_XDLOPS_INLINE_ASM=0 -DCK_USE_AMD_XDLOPS_EMULATE=0 -DCK_PARAM_GEMM_ID=0 --std=c++14 -DCK_USE_AMD_BUFFER_ATOMIC_FADD=1 -DCK_BLOCK_SYNC_LDS_WITHOUT_SYNC_VMEM=1 -DCK_WORKAROUND_SWDEV_229564=1 -DCK_WORKAROUND_SWDEV_231101=1 -DCK_USE_AMD_BUFFER_ADDRESSING=1 -DCK_USE_AMD_V_FMAC_F32=0 -DMIOPEN_USE_FP16=1 -DMIOPEN_USE_FP32=0 -DMIOPEN_USE_INT8=0 -DMIOPEN_USE_INT8x4=0 -DMIOPEN_USE_BFP16=0 -DMIOPEN_USE_INT32=0 -DMIOPEN_USE_RNE_BFLOAT16=1 -DCK_PARAM_KPACK_LENGTH=4 -DCK_PARAM_TUNABLE_GEMM_A_BLOCK_COPY_DST_DATA_PER_WRITE_GEMM_KPACK=1 -DCK_PARAM_TUNABLE_GEMM_B_BLOCK_COPY_DST_DATA_PER_WRITE_GEMM_KPACK=1 -mcpu=gfx908');
MIOpen(HIP): Info2 [Measure] Db::FindRecord time: 2.08897 ms
MIOpen(HIP): Info2 [LoadBinary] Sucessfully loaded binary for: gridwise_convolution_backward_data_implicit_gemm_v4r1_xdlops_nchw_kcyx_nkhw.cpp.o; args:  -DCK_PARAM_PROBLEM_N=8 -DCK_PARAM_PROBLEM_K=32 -DCK_PARAM_PROBLEM_C=16 -DCK_PARAM_PROBLEM_HI=14 -DCK_PARAM_PROBLEM_WI=14 -DCK_PARAM_PROBLEM_HO=16 -DCK_PARAM_PROBLEM_WO=16 -DCK_PARAM_PROBLEM_Y=1 -DCK_PARAM_PROBLEM_X=1 -DCK_PARAM_PROBLEM_CONV_STRIDE_H=1 -DCK_PARAM_PROBLEM_CONV_STRIDE_W=1 -DCK_PARAM_PROBLEM_CONV_DILATION_H=1 -DCK_PARAM_PROBLEM_CONV_DILATION_W=1 -DCK_PARAM_PROBLEM_IN_LEFT_PAD_H=1 -DCK_PARAM_PROBLEM_IN_LEFT_PAD_W=1 -DCK_PARAM_PROBLEM_IN_RIGHT_PAD_H=1 -DCK_PARAM_PROBLEM_IN_RIGHT_PAD_W=1 -DCK_PARAM_PROBLEM_CONV_GROUP_COUNTS=1 -DCK_PARAM_TUNABLE_BLOCK_SIZE=128 -DCK_PARAM_TUNABLE_GEMM_M_PER_BLOCK=16 -DCK_PARAM_TUNABLE_GEMM_N_PER_BLOCK=32 -DCK_PARAM_TUNABLE_GEMM_K_PER_BLOCK=8 -DCK_PARAM_GEMM_M_PER_WAVE=16 -DCK_PARAM_GEMM_N_PER_WAVE=16 -DCK_PARAM_TUNABLE_GEMM_A_BLOCK_COPY_CLUSTER_LENGTHS_GEMM_K=8 -DCK_PARAM_TUNABLE_GEMM_A_BLOCK_COPY_CLUSTER_LENGTHS_GEMM_M=2 -DCK_PARAM_DEPENDENT_GEMM_A_BLOCK_COPY_CLUSTER_LENGTHS_GEMM_KPACK=4 -DCK_PARAM_TUNABLE_GEMM_A_BLOCK_COPY_SRC_DATA_PER_READ_GEMM_M=8 -DCK_PARAM_TUNABLE_GEMM_B_BLOCK_COPY_CLUSTER_LENGTHS_GEMM_K=8 -DCK_PARAM_TUNABLE_GEMM_B_BLOCK_COPY_CLUSTER_LENGTHS_GEMM_N=4 -DCK_PARAM_DEPENDENT_GEMM_B_BLOCK_COPY_CLUSTER_LENGTHS_GEMM_KPACK=4 -DCK_PARAM_TUNABLE_GEMM_B_BLOCK_COPY_SRC_DATA_PER_READ_GEMM_N=8 -DCK_PARAM_DEPENDENT_GRID_SIZE=49 -DCK_USE_AMD_XDLOPS=1 -DCK_USE_AMD_XDLOPS_INLINE_ASM=0 -DCK_USE_AMD_XDLOPS_EMULATE=0 -DCK_PARAM_GEMM_ID=0 --std=c++14 -DCK_USE_AMD_BUFFER_ATOMIC_FADD=1 -DCK_BLOCK_SYNC_LDS_WITHOUT_SYNC_VMEM=1 -DCK_WORKAROUND_SWDEV_229564=1 -DCK_WORKAROUND_SWDEV_231101=1 -DCK_USE_AMD_BUFFER_ADDRESSING=1 -DCK_USE_AMD_V_FMAC_F32=0 -DMIOPEN_USE_FP16=1 -DMIOPEN_USE_FP32=0 -DMIOPEN_USE_INT8=0 -DMIOPEN_USE_INT8x4=0 -DMIOPEN_USE_BFP16=0 -DMIOPEN_USE_INT32=0 -DMIOPEN_USE_RNE_BFLOAT16=1 -DCK_PARAM_KPACK_LENGTH=4 -DCK_PARAM_TUNABLE_GEMM_A_BLOCK_COPY_DST_DATA_PER_WRITE_GEMM_KPACK=1 -DCK_PARAM_TUNABLE_GEMM_B_BLOCK_COPY_DST_DATA_PER_WRITE_GEMM_KPACK=1 -mcpu=gfx908
MIOpen(HIP): Info2 [Register] Invoker registered for algorithm 32x16x16x1x1x16x14x14x8xNCHWxFP16x1x1x1x1x1x1x1xB and solver ConvHipImplicitGemmBwdDataV4R1Xdlops
MIOpen(HIP): Info2 [SetAsFound1_0] Solver ConvHipImplicitGemmBwdDataV4R1Xdlops registered as find 1.0 best for miopenConvolutionBwdDataAlgoImplicitGEMM in 32x16x16x1x1x16x14x14x8xNCHWxFP16x1x1x1x1x1x1x1xB
MIOpen(HIP): Info [FindConvBwdDataAlgorithm] miopenConvolutionBwdDataAlgoImplicitGEMM   0.007836        0
MIOpen(HIP): Info [FindConvBwdDataAlgorithm] BWD Chosen Algorithm: ConvHipImplicitGemmBwdDataV4R1Xdlops , 0, 0.007836
MIOpen(HIP): Info [ConvolutionBackwardData] algo = 5, workspace = 0
MIOpen(HIP): Info2 [GetInvoker] Returning an invoker for problem 32x16x16x1x1x16x14x14x8xNCHWxFP16x1x1x1x1x1x1x1xB and algorithm miopenConvolutionBwdDataAlgoImplicitGEMM
MIOpen(HIP): Info [GetBackwardSolutions] 
MIOpen(HIP): Info2 [FindRecordUnsafe] Looking for key 32-16-16-1x1-16-14-14-8-1x1-1x1-1x1-0-NCHW-FP16-B in file /root/.config/miopen//gfx90878.HIP.2_12_0_8441-3e470a551.ufdb.txt
MIOpen(HIP): Info2 [FindRecordUnsafe] Key match: 32-16-16-1x1-16-14-14-8-1x1-1x1-1x1-0-NCHW-FP16-B
MIOpen(HIP): Info2 [FindRecordUnsafe] Contents found: miopenConvolutionBwdDataAlgoImplicitGEMM:ConvHipImplicitGemmBwdDataV4R1Xdlops,0.007836,0,miopenConvolutionBwdDataAlgoImplicitGEMM,<unused>;miopenConvolutionBwdDataAlgoGEMM:GemmBwdRest,40.2383,8192,miopenConvolutionBwdDataAlgoGEMM,<unused>;miopenConvolutionBwdDataAlgoDirect:ConvDirectNaiveConvBwd,0.05949,0,miopenConvolutionBwdDataAlgoDirect,<unused>
MIOpen(HIP): Info2 [Measure] Db::FindRecord time: 0.057663 ms
FAILED: 0.105749
Max diff: 80
Mismatch at 14: 3 != 32
Backward convolution: ConvHipImplicitGemmBwdDataV4R1Xdlops
Input tensor: 8, 16, 14, 14
Weights tensor: 32, 16, 1, 1
Output tensor: 8, 32, 16, 16
Filter: conv2d, miopenConvolution, miopenPaddingDefault, {1, 1}, {1, 1}, {1, 1}, 
@atamazov
Copy link
Contributor

atamazov commented May 9, 2021

The solver is originated from #272, @carlushuang

@atamazov

This comment has been minimized.

@atamazov atamazov changed the title [ImplicitGemm][test_conv2d][gfx908][half] Verification failed [iGemm][test_conv2d][gfx908][half] Verification failed May 9, 2021
@atamazov
Copy link
Contributor

atamazov commented May 9, 2021

@atamazov
Copy link
Contributor

atamazov commented May 9, 2021

The most important question: is this reproducible with the most recent ROCm 4.2 RC?

@carlushuang

This comment has been minimized.

@shurale-nkn

This comment has been minimized.

@shurale-nkn

This comment has been minimized.

@atamazov
Copy link
Contributor

The DEV build is the one that has cmake ... -DBUILD_DEV=On.

@shurale-nkn
Copy link
Contributor Author

The DEV build is the one that has cmake ... -DBUILD_DEV=On.

Ohh, that DEV build.

The result is the same:

$CXX=/opt/rocm/llvm/bin/clang++ CXXFLAGS='-Werror' cmake -DMIOPEN_TEST_FLAGS=' --disable-verification-cache --verbose' -DCMAKE_BUILD_TYPE=release  -DBUILD_DEV=On -DMIOPEN_TEST_ALL=On -DMIOPEN_TEST_HALF=On -DMIOPEN_TEST_GFX908=On ../MIOpen
$make -j 31 test_conv2d
$MIOPEN_FIND_MODE=normal MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvHipImplicitGemmBwdDataV4R1Xdlops MIOPEN_LOG_LEVEL=5 ./bin/test_conv2d --half --input 8 16 14 14 --weights 32 16 1 1 -v --disable-backward-weights --disable-forward --pads_strides_dilations 1 1 1 1 1 1

MIOpen(HIP): Info [GetFindModeValueImpl] MIOPEN_FIND_MODE = NORMAL(1)
MIOpen(HIP): Info [get_device_name] Raw device name: gfx908
MIOpen(HIP): Info [Handle] stream: 0x20b0480, device_id: 0
MIOpen(HIP): Info [HipCompilerVersionImpl] 3.7.20315
MIOpen(HIP): Info [AmdRocmMetadataVersionDetect] ROCm MD version AMDHSA_COv3, MIOpen version 2.12.0.8441-3e470a551
MIOpen(HIP): Info [BackwardDataGetWorkSpaceSize] 
MIOpen(HIP): Info [GetEnvFindOnlySolverImpl] 60
MIOpen(HIP): Info [ForwardGetWorkSpaceSize] 
MIOpen(HIP): Info [BackwardWeightsGetWorkSpaceSize] 
./bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --disable-forward --disable-backward-weights --input 8, 16, 14, 14 --weights 32, 16, 1, 1 --pads_strides_dilations 1 1 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW 
MIOpen(HIP): Info [BackwardDataGetWorkSpaceSize] 
MIOpen(HIP): Info [FindConvBwdDataAlgorithm] requestAlgoCount = 1, workspace = 0
MIOpen(HIP): Info [TryLoad] Find-db regenerating.
MIOpen(HIP): Info [FindSolutionImpl] ConvHipImplicitGemmBwdDataV4R1Xdlops
MIOpen(HIP): Info [FindSolutionImpl] Perf Db: record not found for: ConvHipImplicitGemmBwdDataV4R1Xdlops
MIOpen(HIP): Info [CalculateGemmBBlockCopyPerformanceParameters] catch
MIOpen(HIP): Info [HeuristicInit] 32,16,8,4,16,16,1,1
MIOpen(HIP): Info [GetPerformanceConfigBase] 32,16,8,4,16,16,1,1
MIOpen(HIP): Info [EvaluateInvokers] ConvHipImplicitGemmBwdDataV4R1Xdlops: gridwise_convolution_backward_data_implicit_gemm_v4r1_xdlops_nchw_kcyx_nkhw: 0.007197 < 3.40282e+38
MIOpen(HIP): Info [EvaluateInvokers] Selected: ConvHipImplicitGemmBwdDataV4R1Xdlops: gridwise_convolution_backward_data_implicit_gemm_v4r1_xdlops_nchw_kcyx_nkhw: 0.007197, workspce_sz = 0
MIOpen(HIP): Info [FindConvBwdDataAlgorithm] miopenConvolutionBwdDataAlgoImplicitGEMM   0.007197        0
MIOpen(HIP): Info [FindConvBwdDataAlgorithm] BWD Chosen Algorithm: ConvHipImplicitGemmBwdDataV4R1Xdlops , 0, 0.007197
MIOpen(HIP): Info [ConvolutionBackwardData] algo = 5, workspace = 0
MIOpen(HIP): Info [GetBackwardSolutions] 
MIOpen(HIP): Info [Measure] Db::Prefetch time: 67.6511 ms
FAILED: 0.105749
Max diff: 80
Mismatch at 14: 3 != 32
Backward convolution: ConvHipImplicitGemmBwdDataV4R1Xdlops
Input tensor: 8, 16, 14, 14
Weights tensor: 32, 16, 1, 1
Output tensor: 8, 32, 16, 16
Filter: conv2d, miopenConvolution, miopenPaddingDefault, {1, 1}, {1, 1}, {1, 1},

@carlushuang
Copy link
Contributor

This fail also happen if using fp32 (in above description it is fp16)
MIOPEN_FIND_MODE=normal MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvHipImplicitGemmBwdDataV4R1Xdlops MIOPEN_LOG_LEVEL=5 ./bin/test_conv2d --input 8 16 14 14 --weights 32 16 1 1 -v --disable-backward-weights --disable-forward --pads_strides_dilations 1 1 1 1 1 1

this is a 1x1 kernel with padding. If change padding to 0, then everything is OK
cc @asroy @asleepzzz @ltqin

@atamazov
Copy link
Contributor

@carlushuang What is ROCm version (or link to docker please)?

@atamazov
Copy link
Contributor

@shurale-nkn Please fix the topmost comment. The reproduce instruction says MIOPEN_FIND_MODE=normal but log shows MIOpen(HIP): Info [GetFindModeValueImpl] MIOPEN_FIND_MODE = HYBRID(3). Or this is not important?

@shurale-nkn
Copy link
Contributor Author

@carlushuang What is ROCm version (or link to docker please)?

@atamazov
3.7

MIOPEN_FIND_MODE=normal MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvHipImplicitGemmBwdDataV4R1Xdlops MI
OPEN_LOG_LEVEL=4 ./bin/test_conv2d  --input 8 16 14 14 --weights 32 16 1 1 -v --disable-backward-weights --disable-forward --pads_strides_dilations 1 1 1 1 1 1
./bin/test_conv2d --float --cmode conv --pmode default --group-count 1 --disable-forward --disable-backward-weights --input 8, 16, 14, 14 --weights 32, 16, 1, 1 --pads_strides_dilations 1 1 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW 
FAILED: 0.0782743
Max diff: 1214
Mismatch at 14: -217 != 4
Backward convolution: ConvHipImplicitGemmBwdDataV4R1Xdlops
Input tensor: 8, 16, 14, 14
Weights tensor: 32, 16, 1, 1
Output tensor: 8, 32, 16, 16
Filter: conv2d, miopenConvolution, miopenPaddingDefault, {1, 1}, {1, 1}, {1, 1},

@atamazov
Copy link
Contributor

@shurale-nkn @carlushuang So this ticket is about verification errors of ConvHipImplicitGemmBwdDataV4R1Xdlops, right?

@shurale-nkn
Copy link
Contributor Author

shurale-nkn commented May 11, 2021

Yes

@shurale-nkn

This comment has been minimized.

@carlushuang
Copy link
Contributor

carlushuang commented May 12, 2021

@atamazov
I tested on both rocm-3.10(rocm/miopen:miopen-rocm-310) and rocm-4.1(rocm/tensorflow:rocm4.1-tf2.4-dev), all fails. And both fp16/fp32 will fail.

So this ticket is about verification errors of ConvHipImplicitGemmBwdDataV4R1Xdlops, right?

Yes

@atamazov
Copy link
Contributor

The quick W/A is disabling ConvHipImplicitGemmBwdDataV4R1Xdlops for all non-zero padding configs. But let's find the answer for the following question first:

...is this reproducible with the most recent ROCm 4.2 RC?

Then we can create a W/A, lower the urgency of the issue and think about a full-blown fix in a more relaxed atmosphere. Agreed?

/cc @junliume

@shurale-nkn
Copy link
Contributor Author

@atamazov

The quick W/A is disabling ConvHipImplicitGemmBwdDataV4R1Xdlops for all non-zero padding configs. But let's find the answer for the following question first:

...is this reproducible with the most recent ROCm 4.2 RC?

Then we can create a W/A, lower the urgency of the issue and think about a full-blown fix in a more relaxed atmosphere. Agreed?

/cc @junliume

Tested on rocm-rel-4.2:16-STG2:
test failed

@asleepzzz
Copy link
Contributor

asleepzzz commented May 13, 2021

hw 14x14 yx 1*1 pad1 =>fail
hw 14x14 yx 3*3 pad1=>pass
hw 15x15 yx 1*1 pad1=>pass
hw 16x16 yx 1*1 pad1=>pass

it related to
https://github.com/ROCmSoftwarePlatform/MIOpen/blob/3c5aea01701b7153bba6517d4655bb884a3ae6bf/src/solver/conv_hip_implicit_gemm_bwd_v4r1_xdlops.cpp#L183

we should add condition :(if pad >0 and 1x1) ,
solver can't just use gcd(SrcDataPerRead_GemmN, ho * wo)

    if(y == 1 && x == 1)
    {
        const auto ho        = ConvolutionContextInterpreter::GetOutputHeightHo(ctx);
        const auto wo        = ConvolutionContextInterpreter::GetOutputWidthWo(ctx);
        SrcDataPerRead_GemmN = gcd(SrcDataPerRead_GemmN, ho * wo);
    }
    else
    {
        SrcDataPerRead_GemmN = 1;
    }

the correct formula may become:

    if(y == 1 && x == 1 && padh==0 && padw==0)
    {
        const auto ho        = ConvolutionContextInterpreter::GetOutputHeightHo(ctx);
        const auto wo        = ConvolutionContextInterpreter::GetOutputWidthWo(ctx);
        SrcDataPerRead_GemmN = gcd(SrcDataPerRead_GemmN, ho * wo);
    }
    else
    {
        SrcDataPerRead_GemmN = 1;
    }

and issue fixed

@atamazov
Copy link
Contributor

@shaojiewang The faulty Solver has been introduced in #167. Could you please provide the fix ASAP?

@atamazov
Copy link
Contributor

via #933

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants