Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[iGemmfwd][test_conv2d][gfx906][half] Verification failed #936

Open
shurale-nkn opened this issue May 18, 2021 · 11 comments
Open

[iGemmfwd][test_conv2d][gfx906][half] Verification failed #936

shurale-nkn opened this issue May 18, 2021 · 11 comments

Comments

@shurale-nkn
Copy link
Contributor

shurale-nkn commented May 18, 2021

same problem as in #917, but with kernel ConvHipImplicitGemmV4R1Fwd
test_conv2d with fp16 convolution failed
GPU: gfx906
Branch: develop (3e470a5)
Tested on rocm 3.7 and rocm 4.2

CXX=/opt/rocm/llvm/bin/clang++ CXXFLAGS='-Werror' cmake -DMIOPEN_TEST_FLAGS=' --disable-verification-cache --verbose' -DCMAKE_BUILD_TYPE=release  -DBUILD_DEV=Off -DCMAKE_INSTALL_PREFIX='../install' -DMIOPEN_TEST_ALL=On -DMIOPEN_TEST_HALF=On  ../

make -j 31 install test_conv2d

MIOPEN_FIND_MODE=normal MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvHipImplicitGemmV4R1Fwd  MIOPEN_CONV_PRECISE_ROCBLAS_TIMING=0 MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_HIP_BWD_V4R1_XDLOPS=0 ./bin/test_conv2d --half --cmode conv --input 8 32 28 28 --weights 64 32 1 7 --pads_strides_dilations 1 1 1 1 1 1 --trans_output_pads 0 0 -v --disable-verification-cache --disable-backward-weights --disable-backward-data
./bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --disable-backward-data --disable-backward-weights --input 8, 32, 28, 28 --weights 64, 32, 1, 7 --pads_strides_dilations 1 1 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW 
FAILED: 0.0819018
Max diff: 114
Mismatch at 25: -81 != -43
Forward convolution: ConvHipImplicitGemmV4R1Fwd
Input tensor: 8, 32, 28, 28
Weights tensor: 64, 32, 1, 7
Output tensor: 8, 64, 30, 24
Filter: conv2d, miopenConvolution, miopenPaddingDefault, {1, 1}, {1, 1}, {1, 1},
CONSOLE LOGS_LEVEL=6
MIOpen(HIP): Info [GetFindModeValueImpl] MIOPEN_FIND_MODE = NORMAL(1)
MIOpen(HIP): Info [get_device_name] Raw device name: gfx906
MIOpen(HIP): Info [Handle] stream: 0, device_id: 0
MIOpen(HIP): Info2 [HipCompilerVersionImpl] Read version information from HIP package...
MIOpen(HIP): Info [HipCompilerVersionImpl] 3.7.20315
MIOpen(HIP): Info [AmdRocmMetadataVersionDetect] ROCm MD version AMDHSA_COv3, MIOpen version 2.12.0.8453-816b0b986
MIOpen(HIP): Info2 [ValidateGcnAssemblerImpl] Running: '/opt/rocm/llvm/bin/clang --version'
MIOpen(HIP): Info2 [ValidateGcnAssemblerImpl] clang version 11.0.0 (/src/external/llvm-project/clang ee4e4ebbadcc8ea14ce99e34ed31ab31e94827ac)
MIOpen(HIP): Info2 [ValidateGcnAssemblerImpl] Target: x86_64-unknown-linux-gnu
MIOpen(HIP): Info2 [ValidateGcnAssemblerImpl] Thread model: posix
MIOpen(HIP): Info2 [ValidateGcnAssemblerImpl] InstalledDir: /opt/rocm/llvm/bin
MIOpen(HIP): Info2 [ValidateGcnAssemblerImpl] 
MIOpen(HIP): Info [BackwardDataGetWorkSpaceSize] 
MIOpen(HIP): Info [GetEnvFindOnlySolverImpl] 26
MIOpen(HIP): Info2 [GetWorkspaceSize] ConvHipImplicitGemmV4R1Fwd: Not applicable
MIOpen(HIP): Info2 [BackwardDataGetWorkSpaceSize] 0
MIOpen(HIP): Info [ForwardGetWorkSpaceSize] 
MIOpen(HIP): Info2 [GetWorkspaceSize] ConvHipImplicitGemmV4R1Fwd: 0
MIOpen(HIP): Info2 [ForwardGetWorkSpaceSize] 0
MIOpen(HIP): Info [BackwardWeightsGetWorkSpaceSize] 
MIOpen(HIP): Info2 [BackwardWeightsGetWorkSpaceSize] 0
./bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --disable-backward-data --disable-backward-weights --input 8, 32, 28, 28 --weights 64, 32, 1, 7 --pads_strides_dilations 1 1 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW 
MIOpen(HIP): Info [ForwardGetWorkSpaceSize] 
MIOpen(HIP): Info2 [GetWorkspaceSize] ConvHipImplicitGemmV4R1Fwd: 0
MIOpen(HIP): Info2 [ForwardGetWorkSpaceSize] 0
MIOpen(HIP): Info [FindConvFwdAlgorithm] requestAlgoCount = 1, workspace = 0
MIOpen(HIP): Info2 [FindRecordUnsafe] Looking for key 32-28-28-1x7-64-30-24-8-1x1-1x1-1x1-0-NCHW-FP16-F in file /root/.config/miopen//gfx906_60.HIP.2_12_0_8453-816b0b986.ufdb.txt
MIOpen(HIP): Info2 [FindRecordUnsafe] Key match: 32-28-28-1x7-64-30-24-8-1x1-1x1-1x1-0-NCHW-FP16-F
MIOpen(HIP): Info2 [FindRecordUnsafe] Contents found: miopenConvolutionFwdAlgoImplicitGEMM:ConvHipImplicitGemmV4R1Fwd,0.11232,0,miopenConvolutionFwdAlgoImplicitGEMM,<unused>
MIOpen(HIP): Info2 [Measure] Db::FindRecord time: 0.115818 ms
MIOpen(HIP): Info2 [GetInvoker] Returning an invoker for problem 32x28x28x1x7x64x30x24x8xNCHWxFP16x1x1x1x1x1x1x1xF and solver ConvHipImplicitGemmV4R1Fwd
MIOpen(HIP): Info2 [LogFindDbItem] Kernel cache entry not found for solver <miopenConvolutionFwdAlgoImplicitGEMM::ConvHipImplicitGemmV4R1Fwd> at network config: 32-28-28-1x7-64-30-24-8-1x1-1x1-1x1-0-NCHW-FP16-F and kernel cache key: miopenConvolutionFwdAlgoImplicitGEMM, <unused>
MIOpen(HIP): Info2 [LogFindDbItem] Find-db record content: <miopenConvolutionFwdAlgoImplicitGEMM::ConvHipImplicitGemmV4R1Fwd> at network config: <unused> and algorithm name: miopenConvolutionFwdAlgoImplicitGEMM
MIOpen(HIP): Info [TryLoad] Find-db regenerating.
MIOpen(HIP): Info2 [SQLiteBase] Initializing system database file /docker_run/install/miopen/share/miopen/db/miopen.db
MIOpen(HIP): Info2 [SQLiteBase] Initializing user database file /root/.config/miopen/miopen_1.0.0.udb
MIOpen(HIP): Info [FindSolutionImpl] ConvHipImplicitGemmV4R1Fwd
MIOpen(HIP): Info2 [Prepare] SELECT solver, params FROM perf_db INNER JOIN config ON perf_db.config = config.id WHERE ( (layout = ? ) AND (data_type = ? ) AND (direction = ? ) AND (spatial_dim = ? ) AND (in_channels = ? ) AND (in_h = ? ) AND (in_w = ? ) AND (in_d = ? ) AND (fil_h = ? ) AND (fil_w = ? ) AND (fil_d = ? ) AND (out_channels = ? ) AND (batchsize = ? ) AND (pad_h = ? ) AND (pad_w = ? ) AND (pad_d = ? ) AND (conv_stride_h = ? ) AND (conv_stride_w = ? ) AND (conv_stride_d = ? ) AND (dilation_h = ? ) AND (dilation_w = ? ) AND (dilation_d = ? ) AND (bias = ? ) AND (group_count = ? ) )AND (arch = 'gfx906' ) AND (num_cu = '60');
MIOpen(HIP): Info2 [impl] [NCHW,FP16,F,2,32,28,28,1,1,7,1,64,8,1,1,0,1,1,0,1,1,0,0,1]
MIOpen(HIP): Info2 [Prepare] SELECT solver, params FROM perf_db INNER JOIN config ON perf_db.config = config.id WHERE ( (layout = ? ) AND (data_type = ? ) AND (direction = ? ) AND (spatial_dim = ? ) AND (in_channels = ? ) AND (in_h = ? ) AND (in_w = ? ) AND (in_d = ? ) AND (fil_h = ? ) AND (fil_w = ? ) AND (fil_d = ? ) AND (out_channels = ? ) AND (batchsize = ? ) AND (pad_h = ? ) AND (pad_w = ? ) AND (pad_d = ? ) AND (conv_stride_h = ? ) AND (conv_stride_w = ? ) AND (conv_stride_d = ? ) AND (dilation_h = ? ) AND (dilation_w = ? ) AND (dilation_d = ? ) AND (bias = ? ) AND (group_count = ? ) )AND (arch = 'gfx906' ) AND (num_cu = '60');
MIOpen(HIP): Info2 [impl] [NCHW,FP16,F,2,32,28,28,1,1,7,1,64,8,1,1,0,1,1,0,1,1,0,0,1]
MIOpen(HIP): Info2 [Measure] Db::Load time: 0.713655 ms
MIOpen(HIP): Info [FindSolutionImpl] Perf Db: record not found for: ConvHipImplicitGemmV4R1Fwd
MIOpen(HIP): Info [HeuristicInit] 8,64,8,2,4,4,4,2,2,4,8,1,8,1,4,16
MIOpen(HIP): Info [GetPerformanceConfigBase] 8,64,8,2,4,4,4,2,2,4,8,1,8,1,4,16
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvHipImplicitGemmV4R1Fwd: Success.
MIOpen(HIP): Info2 [SQLiteBase] Initializing system database file 
MIOpen(HIP): Info [KernDb] database not present
MIOpen(HIP): Info2 [SQLiteBase] Initializing user database file /root/.cache/miopen/2.12.0.8453-816b0b986/gfx906_60.ukdb
MIOpen(HIP): Info2 [KernDb] Database created successfully
MIOpen(HIP): Info2 [LoadBinary] Loading binary for: gridwise_convolution_implicit_gemm_v4r1_nchw_kcyx_nkhw_lds_double_buffer.cpp.o; args:  -DCK_PARAM_PROBLEM_N=8 -DCK_PARAM_PROBLEM_K=64 -DCK_PARAM_PROBLEM_C=32 -DCK_PARAM_PROBLEM_HI=28 -DCK_PARAM_PROBLEM_WI=28 -DCK_PARAM_PROBLEM_HO=30 -DCK_PARAM_PROBLEM_WO=24 -DCK_PARAM_PROBLEM_Y=1 -DCK_PARAM_PROBLEM_X=7 -DCK_PARAM_PROBLEM_CONV_STRIDE_H=1 -DCK_PARAM_PROBLEM_CONV_STRIDE_W=1 -DCK_PARAM_PROBLEM_CONV_DILATION_H=1 -DCK_PARAM_PROBLEM_CONV_DILATION_W=1 -DCK_PARAM_PROBLEM_LEFT_PAD_H=1 -DCK_PARAM_PROBLEM_LEFT_PAD_W=1 -DCK_PARAM_PROBLEM_RIGHT_PAD_H=1 -DCK_PARAM_PROBLEM_RIGHT_PAD_W=1 -DCK_PARAM_PROBLEM_CONV_GROUP_COUNTS=1 -DCK_PARAM_PROBLEM_CONV_DIRECTION_FORWARD=1 -DCK_PARAM_PROBLEM_CONV_DIRECTION_BACKWARD_DATA=0 -DCK_PARAM_PROBLEM_CONV_DIRECTION_BACKWARD_WEIGHT=0 -DCK_PARAM_TUNABLE_BLOCK_SIZE=64 -DCK_PARAM_TUNABLE_B_PER_BLOCK=8 -DCK_PARAM_TUNABLE_K_PER_BLOCK=64 -DCK_PARAM_TUNABLE_E_PER_BLOCK=8 -DCK_PARAM_DEPENDENT_GRID_SIZE=90 -DCK_PARAM_GEMM_N_REPEAT=2 -DCK_PARAM_GEMM_M_PER_THREAD_SUB_C=4 -DCK_PARAM_GEMM_N_PER_THREAD_SUB_C=4 -DCK_PARAM_GEMM_M_LEVEL0_CLUSTER=4 -DCK_PARAM_GEMM_N_LEVEL0_CLUSTER=2 -DCK_PARAM_GEMM_M_LEVEL1_CLUSTER=2 -DCK_PARAM_GEMM_N_LEVEL1_CLUSTER=4 -DCK_PARAM_IN_BLOCK_COPY_CLUSTER_LENGTHS_E=8 -DCK_PARAM_IN_BLOCK_COPY_CLUSTER_LENGTHS_N1=1 -DCK_PARAM_IN_BLOCK_COPY_CLUSTER_LENGTHS_B=8 -DCK_PARAM_IN_BLOCK_COPY_CLUSTER_LENGTHS_N2=1 -DCK_PARAM_IN_BLOCK_COPY_SRC_DATA_PER_READ_B=1 -DCK_PARAM_WEI_BLOCK_COPY_CLUSTER_LENGTHS_E=4 -DCK_PARAM_WEI_BLOCK_COPY_CLUSTER_LENGTHS_K=16 -DCK_PARAM_WEI_BLOCK_COPY_SRC_DATA_PER_READ_E=2 -DCK_PARAM_EPACK_LENGTH=2 -DCK_THREADWISE_GEMM_USE_AMD_INLINE_ASM=1 -DCK_USE_AMD_INLINE_ASM=1 --std=c++14 -DCK_USE_AMD_BUFFER_ATOMIC_FADD=0 -DCK_BLOCK_SYNC_LDS_WITHOUT_SYNC_VMEM=1 -DCK_WORKAROUND_SWDEV_229564=1 -DCK_WORKAROUND_SWDEV_231101=1 -DCK_USE_AMD_BUFFER_ADDRESSING=1 -DCK_USE_AMD_V_FMAC_F32=0 -DMIOPEN_USE_FP16=1 -DMIOPEN_USE_FP32=0 -DMIOPEN_USE_INT8=0 -DMIOPEN_USE_INT8x4=0 -DMIOPEN_USE_BFP16=0 -DMIOPEN_USE_INT32=0 -DMIOPEN_USE_RNE_BFLOAT16=1 -DCK_PARAM_IN_BLOCK_COPY_DST_DATA_PER_WRITE_EPACK=2 -DCK_PARAM_WEI_BLOCK_COPY_DST_DATA_PER_WRITE_EPACK=2 -mcpu=gfx906
MIOpen(HIP): Info2 [Prepare] SELECT kernel_blob, kernel_hash, uncompressed_size FROM kern_db WHERE (kernel_name = 'gridwise_convolution_implicit_gemm_v4r1_nchw_kcyx_nkhw_lds_double_buffer.cpp.o') AND (kernel_args = ' -DCK_PARAM_PROBLEM_N=8 -DCK_PARAM_PROBLEM_K=64 -DCK_PARAM_PROBLEM_C=32 -DCK_PARAM_PROBLEM_HI=28 -DCK_PARAM_PROBLEM_WI=28 -DCK_PARAM_PROBLEM_HO=30 -DCK_PARAM_PROBLEM_WO=24 -DCK_PARAM_PROBLEM_Y=1 -DCK_PARAM_PROBLEM_X=7 -DCK_PARAM_PROBLEM_CONV_STRIDE_H=1 -DCK_PARAM_PROBLEM_CONV_STRIDE_W=1 -DCK_PARAM_PROBLEM_CONV_DILATION_H=1 -DCK_PARAM_PROBLEM_CONV_DILATION_W=1 -DCK_PARAM_PROBLEM_LEFT_PAD_H=1 -DCK_PARAM_PROBLEM_LEFT_PAD_W=1 -DCK_PARAM_PROBLEM_RIGHT_PAD_H=1 -DCK_PARAM_PROBLEM_RIGHT_PAD_W=1 -DCK_PARAM_PROBLEM_CONV_GROUP_COUNTS=1 -DCK_PARAM_PROBLEM_CONV_DIRECTION_FORWARD=1 -DCK_PARAM_PROBLEM_CONV_DIRECTION_BACKWARD_DATA=0 -DCK_PARAM_PROBLEM_CONV_DIRECTION_BACKWARD_WEIGHT=0 -DCK_PARAM_TUNABLE_BLOCK_SIZE=64 -DCK_PARAM_TUNABLE_B_PER_BLOCK=8 -DCK_PARAM_TUNABLE_K_PER_BLOCK=64 -DCK_PARAM_TUNABLE_E_PER_BLOCK=8 -DCK_PARAM_DEPENDENT_GRID_SIZE=90 -DCK_PARAM_GEMM_N_REPEAT=2 -DCK_PARAM_GEMM_M_PER_THREAD_SUB_C=4 -DCK_PARAM_GEMM_N_PER_THREAD_SUB_C=4 -DCK_PARAM_GEMM_M_LEVEL0_CLUSTER=4 -DCK_PARAM_GEMM_N_LEVEL0_CLUSTER=2 -DCK_PARAM_GEMM_M_LEVEL1_CLUSTER=2 -DCK_PARAM_GEMM_N_LEVEL1_CLUSTER=4 -DCK_PARAM_IN_BLOCK_COPY_CLUSTER_LENGTHS_E=8 -DCK_PARAM_IN_BLOCK_COPY_CLUSTER_LENGTHS_N1=1 -DCK_PARAM_IN_BLOCK_COPY_CLUSTER_LENGTHS_B=8 -DCK_PARAM_IN_BLOCK_COPY_CLUSTER_LENGTHS_N2=1 -DCK_PARAM_IN_BLOCK_COPY_SRC_DATA_PER_READ_B=1 -DCK_PARAM_WEI_BLOCK_COPY_CLUSTER_LENGTHS_E=4 -DCK_PARAM_WEI_BLOCK_COPY_CLUSTER_LENGTHS_K=16 -DCK_PARAM_WEI_BLOCK_COPY_SRC_DATA_PER_READ_E=2 -DCK_PARAM_EPACK_LENGTH=2 -DCK_THREADWISE_GEMM_USE_AMD_INLINE_ASM=1 -DCK_USE_AMD_INLINE_ASM=1 --std=c++14 -DCK_USE_AMD_BUFFER_ATOMIC_FADD=0 -DCK_BLOCK_SYNC_LDS_WITHOUT_SYNC_VMEM=1 -DCK_WORKAROUND_SWDEV_229564=1 -DCK_WORKAROUND_SWDEV_231101=1 -DCK_USE_AMD_BUFFER_ADDRESSING=1 -DCK_USE_AMD_V_FMAC_F32=0 -DMIOPEN_USE_FP16=1 -DMIOPEN_USE_FP32=0 -DMIOPEN_USE_INT8=0 -DMIOPEN_USE_INT8x4=0 -DMIOPEN_USE_BFP16=0 -DMIOPEN_USE_INT32=0 -DMIOPEN_USE_RNE_BFLOAT16=1 -DCK_PARAM_IN_BLOCK_COPY_DST_DATA_PER_WRITE_EPACK=2 -DCK_PARAM_WEI_BLOCK_COPY_DST_DATA_PER_WRITE_EPACK=2 -mcpu=gfx906');
MIOpen(HIP): Info2 [Measure] Db::FindRecord time: 2.64252 ms
MIOpen(HIP): Info2 [LoadBinary] Sucessfully loaded binary for: gridwise_convolution_implicit_gemm_v4r1_nchw_kcyx_nkhw_lds_double_buffer.cpp.o; args:  -DCK_PARAM_PROBLEM_N=8 -DCK_PARAM_PROBLEM_K=64 -DCK_PARAM_PROBLEM_C=32 -DCK_PARAM_PROBLEM_HI=28 -DCK_PARAM_PROBLEM_WI=28 -DCK_PARAM_PROBLEM_HO=30 -DCK_PARAM_PROBLEM_WO=24 -DCK_PARAM_PROBLEM_Y=1 -DCK_PARAM_PROBLEM_X=7 -DCK_PARAM_PROBLEM_CONV_STRIDE_H=1 -DCK_PARAM_PROBLEM_CONV_STRIDE_W=1 -DCK_PARAM_PROBLEM_CONV_DILATION_H=1 -DCK_PARAM_PROBLEM_CONV_DILATION_W=1 -DCK_PARAM_PROBLEM_LEFT_PAD_H=1 -DCK_PARAM_PROBLEM_LEFT_PAD_W=1 -DCK_PARAM_PROBLEM_RIGHT_PAD_H=1 -DCK_PARAM_PROBLEM_RIGHT_PAD_W=1 -DCK_PARAM_PROBLEM_CONV_GROUP_COUNTS=1 -DCK_PARAM_PROBLEM_CONV_DIRECTION_FORWARD=1 -DCK_PARAM_PROBLEM_CONV_DIRECTION_BACKWARD_DATA=0 -DCK_PARAM_PROBLEM_CONV_DIRECTION_BACKWARD_WEIGHT=0 -DCK_PARAM_TUNABLE_BLOCK_SIZE=64 -DCK_PARAM_TUNABLE_B_PER_BLOCK=8 -DCK_PARAM_TUNABLE_K_PER_BLOCK=64 -DCK_PARAM_TUNABLE_E_PER_BLOCK=8 -DCK_PARAM_DEPENDENT_GRID_SIZE=90 -DCK_PARAM_GEMM_N_REPEAT=2 -DCK_PARAM_GEMM_M_PER_THREAD_SUB_C=4 -DCK_PARAM_GEMM_N_PER_THREAD_SUB_C=4 -DCK_PARAM_GEMM_M_LEVEL0_CLUSTER=4 -DCK_PARAM_GEMM_N_LEVEL0_CLUSTER=2 -DCK_PARAM_GEMM_M_LEVEL1_CLUSTER=2 -DCK_PARAM_GEMM_N_LEVEL1_CLUSTER=4 -DCK_PARAM_IN_BLOCK_COPY_CLUSTER_LENGTHS_E=8 -DCK_PARAM_IN_BLOCK_COPY_CLUSTER_LENGTHS_N1=1 -DCK_PARAM_IN_BLOCK_COPY_CLUSTER_LENGTHS_B=8 -DCK_PARAM_IN_BLOCK_COPY_CLUSTER_LENGTHS_N2=1 -DCK_PARAM_IN_BLOCK_COPY_SRC_DATA_PER_READ_B=1 -DCK_PARAM_WEI_BLOCK_COPY_CLUSTER_LENGTHS_E=4 -DCK_PARAM_WEI_BLOCK_COPY_CLUSTER_LENGTHS_K=16 -DCK_PARAM_WEI_BLOCK_COPY_SRC_DATA_PER_READ_E=2 -DCK_PARAM_EPACK_LENGTH=2 -DCK_THREADWISE_GEMM_USE_AMD_INLINE_ASM=1 -DCK_USE_AMD_INLINE_ASM=1 --std=c++14 -DCK_USE_AMD_BUFFER_ATOMIC_FADD=0 -DCK_BLOCK_SYNC_LDS_WITHOUT_SYNC_VMEM=1 -DCK_WORKAROUND_SWDEV_229564=1 -DCK_WORKAROUND_SWDEV_231101=1 -DCK_USE_AMD_BUFFER_ADDRESSING=1 -DCK_USE_AMD_V_FMAC_F32=0 -DMIOPEN_USE_FP16=1 -DMIOPEN_USE_FP32=0 -DMIOPEN_USE_INT8=0 -DMIOPEN_USE_INT8x4=0 -DMIOPEN_USE_BFP16=0 -DMIOPEN_USE_INT32=0 -DMIOPEN_USE_RNE_BFLOAT16=1 -DCK_PARAM_IN_BLOCK_COPY_DST_DATA_PER_WRITE_EPACK=2 -DCK_PARAM_WEI_BLOCK_COPY_DST_DATA_PER_WRITE_EPACK=2 -mcpu=gfx906
MIOpen(HIP): Info2 [PrepareInvoker] Preparing kernel: gridwise_convolution_implicit_gemm_v4r1_nchw_kcyx_nkhw_lds_double_buffer
MIOpen(HIP): Info [EvaluateInvokers] ConvHipImplicitGemmV4R1Fwd: gridwise_convolution_implicit_gemm_v4r1_nchw_kcyx_nkhw_lds_double_buffer: 0.11376 < 3.40282e+38
MIOpen(HIP): Info2 [Register] Invoker registered for algorithm 32x28x28x1x7x64x30x24x8xNCHWxFP16x1x1x1x1x1x1x1xF and solver ConvHipImplicitGemmV4R1Fwd
MIOpen(HIP): Info2 [SetAsFound1_0] Solver ConvHipImplicitGemmV4R1Fwd registered as find 1.0 best for miopenConvolutionFwdAlgoImplicitGEMM in 32x28x28x1x7x64x30x24x8xNCHWxFP16x1x1x1x1x1x1x1xF
MIOpen(HIP): Info [EvaluateInvokers] Selected: ConvHipImplicitGemmV4R1Fwd: gridwise_convolution_implicit_gemm_v4r1_nchw_kcyx_nkhw_lds_double_buffer: 0.11376, workspce_sz = 0
MIOpen(HIP): Info2 [SetValues] 32-28-28-1x7-64-30-24-8-1x1-1x1-1x1-0-NCHW-FP16-F, content inserted: miopenConvolutionFwdAlgoImplicitGEMM:ConvHipImplicitGemmV4R1Fwd,0.11376,0,miopenConvolutionFwdAlgoImplicitGEMM,<unused>
MIOpen(HIP): Info2 [StoreRecordUnsafe] Storing record: 32-28-28-1x7-64-30-24-8-1x1-1x1-1x1-0-NCHW-FP16-F
MIOpen(HIP): Info2 [FindRecordUnsafe] Looking for key 32-28-28-1x7-64-30-24-8-1x1-1x1-1x1-0-NCHW-FP16-F in file /root/.config/miopen//gfx906_60.HIP.2_12_0_8453-816b0b986.ufdb.txt
MIOpen(HIP): Info2 [FindRecordUnsafe] Key match: 32-28-28-1x7-64-30-24-8-1x1-1x1-1x1-0-NCHW-FP16-F
MIOpen(HIP): Info2 [FindRecordUnsafe] Contents found: miopenConvolutionFwdAlgoImplicitGEMM:ConvHipImplicitGemmV4R1Fwd,0.11232,0,miopenConvolutionFwdAlgoImplicitGEMM,<unused>
MIOpen(HIP): Info2 [Measure] Db::StoreRecord time: 0.259499 ms
MIOpen(HIP): Info [FindConvFwdAlgorithm] miopenConvolutionFwdAlgoImplicitGEMM   0.11376 0
MIOpen(HIP): Info [FindConvFwdAlgorithm] FW Chosen Algorithm: ConvHipImplicitGemmV4R1Fwd , 0, 0.11376
MIOpen(HIP): Info [ConvolutionForward] algo = 5, workspace = 0
MIOpen(HIP): Info2 [GetInvoker] Returning an invoker for problem 32x28x28x1x7x64x30x24x8xNCHWxFP16x1x1x1x1x1x1x1xF and algorithm miopenConvolutionFwdAlgoImplicitGEMM
MIOpen(HIP): Info [GetForwardSolutions] 
MIOpen(HIP): Info [Measure] Db::Prefetch time: 46.972 ms
MIOpen(HIP): Info2 [FindRecordUnsafe] Looking for key 32-28-28-1x7-64-30-24-8-1x1-1x1-1x1-0-NCHW-FP16-F in file /root/.config/miopen//gfx906_60.HIP.2_12_0_8453-816b0b986.ufdb.txt
MIOpen(HIP): Info2 [FindRecordUnsafe] Key match: 32-28-28-1x7-64-30-24-8-1x1-1x1-1x1-0-NCHW-FP16-F
MIOpen(HIP): Info2 [FindRecordUnsafe] Contents found: miopenConvolutionFwdAlgoImplicitGEMM:ConvHipImplicitGemmV4R1Fwd,0.11376,0,miopenConvolutionFwdAlgoImplicitGEMM,<unused>
MIOpen(HIP): Info2 [Measure] Db::FindRecord time: 0.097143 ms
FAILED: 0.0819018
Max diff: 114
Mismatch at 25: -81 != -43
Forward convolution: ConvHipImplicitGemmV4R1Fwd
Input tensor: 8, 32, 28, 28
Weights tensor: 64, 32, 1, 7
Output tensor: 8, 64, 30, 24
Filter: conv2d, miopenConvolution, miopenPaddingDefault, {1, 1}, {1, 1}, {1, 1}, 
@atamazov
Copy link
Contributor

@asroy Can you please provide a fix or workaround for this ASAP? Thanks.

@atamazov
Copy link
Contributor

The solver is originated from https://github.com/AMDComputeLibraries/MLOpen/pull/2132.

@atamazov
Copy link
Contributor

atamazov commented Jun 6, 2021

Note that MIOpenDriver passes the test. Commands to reproduce:

$ MIOPEN_FIND_MODE=normal MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvHipImplicitGemmV4R1Fwd \
./bin/MIOpenDriver convfp16 -x 7 -y 1 -W 28 -H 28 -c 32 -n 8 -k 64 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -g 1 -F 1 -w 1 -t 1 -i 6 -V 1
...
Forward Convolution Verifies OK on CPU reference (0.0314941)

$ MIOPEN_FIND_MODE=normal MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvHipImplicitGemmV4R1Fwd \
./bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --disable-backward-data --disable-backward-weights \
--input 8, 32, 28, 28 --weights 64, 32, 1, 7 --pads_strides_dilations 1 1 1 1 1 1 \
--trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW
...
FAILED: 0.0819018

Another config where test_conv2d fails, but MIOpenDriver passes:

$ MIOPEN_FIND_MODE=normal MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvHipImplicitGemmV4R1Fwd \
./bin/MIOpenDriver convfp16 -x 3 -y 3 -W 7 -H 7 -c 32 -n 64 -k 128 -p 0 -q 0 -u 2 -v 2 -l 1 -j 1 -g 1 -F 1 -w 1 -t 1 -i 6 -V 1
...
Forward Convolution Verifies OK on CPU reference (0.0445192)

$ MIOPEN_FIND_MODE=normal MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvHipImplicitGemmV4R1Fwd \
./bin/test_conv2d --half --input 64 32 7 7 --weights 128 32 3 3 \
--pads_strides_dilations 0 0 2 2 1 1 \
--verbose --disable-verification-cache --disable-backward-data --disable-backward-weights
...
FAILED: 0.147229

The reason of this behavior:
https://github.com/ROCmSoftwarePlatform/MIOpen/blob/96d7086fa0113e2109de44f870dbae1c340b48fc/driver/conv_driver.hpp#L3207-L3210

test_conv2d should be fixed: similar patch for tolerance must be added.

@atamazov
Copy link
Contributor

atamazov commented Jun 6, 2021

I can do this altogether with https://github.com/AMDComputeLibraries/MLOpen/pull/2512

@atamazov
Copy link
Contributor

atamazov commented Jun 6, 2021

This is not a library correctness problem, but a test related issue.

@shurale-nkn
Copy link
Contributor Author

shurale-nkn commented Jun 17, 2021

affected PR #970
http://micimaster.amd.com/blue/organizations/jenkins/MLLibs%2FMIOpen/detail/jenkins-ci-rocm-4.2/21/pipeline/
./bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --input 64, 32, 28, 28 --weights 64, 32, 1, 7 --pads_strides_dilations 0 0 2 2 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW

./bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --input 64, 32, 28, 28 --weights 64, 32, 1, 7 --pads_strides_dilations 0 0 2 2 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW 
FAILED: 0.0877669
Iteration: 0
Forward convolution: ConvHipImplicitGemmV4R1Fwd
Input tensor: 64, 32, 28, 28
Weights tensor: 64, 32, 1, 7
Output tensor: 64, 64, 14, 11
Filter: conv2d, miopenConvolution, miopenPaddingDefault, {0, 0}, {2, 2}, {1, 1}, 
Max diff: 120
Mismatch at 0: 14 != 6```

@shurale-nkn
Copy link
Contributor Author

shurale-nkn commented Jun 17, 2021

Quotation

Note that MIOpenDriver passes the test. Commands to reproduce:

$ MIOPEN_FIND_MODE=normal MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvHipImplicitGemmV4R1Fwd \
./bin/MIOpenDriver convfp16 -x 7 -y 1 -W 28 -H 28 -c 32 -n 8 -k 64 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -g 1 -F 1 -w 1 -t 1 -i 6 -V 1
...
Forward Convolution Verifies OK on CPU reference (0.0314941)

$ MIOPEN_FIND_MODE=normal MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvHipImplicitGemmV4R1Fwd \
./bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --disable-backward-data --disable-backward-weights \
--input 8, 32, 28, 28 --weights 64, 32, 1, 7 --pads_strides_dilations 1 1 1 1 1 1 \
--trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW
...
FAILED: 0.0819018

Another config where test_conv2d fails, but MIOpenDriver passes:

$ MIOPEN_FIND_MODE=normal MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvHipImplicitGemmV4R1Fwd \
./bin/MIOpenDriver convfp16 -x 3 -y 3 -W 7 -H 7 -c 32 -n 64 -k 128 -p 0 -q 0 -u 2 -v 2 -l 1 -j 1 -g 1 -F 1 -w 1 -t 1 -i 6 -V 1
...
Forward Convolution Verifies OK on CPU reference (0.0445192)

$ MIOPEN_FIND_MODE=normal MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvHipImplicitGemmV4R1Fwd \
./bin/test_conv2d --half --input 64 32 7 7 --weights 128 32 3 3 \
--pads_strides_dilations 0 0 2 2 1 1 \
--verbose --disable-verification-cache --disable-backward-data --disable-backward-weights
...
FAILED: 0.147229

The reason of this behavior:
https://github.com/ROCmSoftwarePlatform/MIOpen/blob/96d7086fa0113e2109de44f870dbae1c340b48fc/driver/conv_driver.hpp#L3207-L3210

test_conv2d should be fixed: similar patch for tolerance must be added.


I'm afraid this is too big a deviation for such a small configuration. Other algorithms have a much smaller deviation here
FWI @Kirpich30000

shurale-nkn added a commit that referenced this issue Jun 17, 2021
@atamazov
Copy link
Contributor

@shurale-nkn

I'm afraid this is too big a deviation for such a small configuration. Other algorithms have a much smaller deviation here

Please be clear. Would you like to ask @asroy to double-check the correctness of computations performed by ConvHipImplicitGemmV4R1Fwd for the configs in question?

atamazov pushed a commit that referenced this issue Jul 6, 2021
- Limit the number of combinations for a single dimension for pooling and convolution tests
- Resolves "[PR Testing] Get rid of test redundancy" #816
- Resolves "[COMGR] Code quality: reference binding to null pointer of type 'char'" #877
- Tests: Test generator now includes a batch greater than 1 and able to variate count of tests using --limit
- Tests: Various improvements in tests/CMakeLists. Fixed LONG_TESTS, added information about skipped tests.
- CI: Refactored Jenkinsfile, reshuffled test stage sequence
- Added W/A: "[COMGR][debug][test_gpu_reference_kernel] compiler errors" #898
- Added W/A: "[iGemmfwd][test_conv2d][gfx906][half] Verification failed" #936
atamazov pushed a commit that referenced this issue Jul 22, 2021
- Limit the number of combinations for a single dimension for pooling and convolution tests
- Resolves "[PR Testing] Get rid of test redundancy" #816
- Resolves "[COMGR] Code quality: reference binding to null pointer of type 'char'" #877
- Tests: Test generator now includes a batch greater than 1 and able to variate count of tests using --limit
- Tests: Various improvements in tests/CMakeLists. Fixed LONG_TESTS, added information about skipped tests.
- CI: Refactored Jenkinsfile, reshuffled test stage sequence
- Added W/A: "[COMGR][debug][test_gpu_reference_kernel] compiler errors" #898
- Added W/A: "[iGemmfwd][test_conv2d][gfx906][half] Verification failed" #936
@atamazov
Copy link
Contributor

@asroy Could you please confirm that computations performed by ConvHipImplicitGemmV4R1Fwd for the configs in question are correct. Then I will fix test_conv2d.

@atamazov
Copy link
Contributor

atamazov commented Mar 3, 2023

@johnny-keker Can you please create a branch with this jenkinsfile-wa-issue-936-remove.diff.txt and try it out? PRs from forks do not apply Jenksinfile changes during CI testing, so I am unable to test this from my repo, Thanks!

@ppanchad-amd
Copy link

@shurale-nkn Please test with ROCm 6.0.2 to see if this still an issue? If not, please close issue. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants