[iGemmfwd][test_conv2d][gfx906][half] Verification failed #936
Comments
@asroy Can you please provide a fix or workaround for this ASAP? Thanks. |
The solver originated from https://github.com/AMDComputeLibraries/MLOpen/pull/2132. |
Note that MIOpenDriver passes the test. Commands to reproduce:
$ MIOPEN_FIND_MODE=normal MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvHipImplicitGemmV4R1Fwd \
./bin/MIOpenDriver convfp16 -x 7 -y 1 -W 28 -H 28 -c 32 -n 8 -k 64 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -g 1 -F 1 -w 1 -t 1 -i 6 -V 1
...
Forward Convolution Verifies OK on CPU reference (0.0314941)
$ MIOPEN_FIND_MODE=normal MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvHipImplicitGemmV4R1Fwd \
./bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --disable-backward-data --disable-backward-weights \
--input 8, 32, 28, 28 --weights 64, 32, 1, 7 --pads_strides_dilations 1 1 1 1 1 1 \
--trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW
...
FAILED: 0.0819018
Another config where test_conv2d fails, but MIOpenDriver passes:
$ MIOPEN_FIND_MODE=normal MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvHipImplicitGemmV4R1Fwd \
./bin/MIOpenDriver convfp16 -x 3 -y 3 -W 7 -H 7 -c 32 -n 64 -k 128 -p 0 -q 0 -u 2 -v 2 -l 1 -j 1 -g 1 -F 1 -w 1 -t 1 -i 6 -V 1
...
Forward Convolution Verifies OK on CPU reference (0.0445192)
$ MIOPEN_FIND_MODE=normal MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvHipImplicitGemmV4R1Fwd \
./bin/test_conv2d --half --input 64 32 7 7 --weights 128 32 3 3 \
--pads_strides_dilations 0 0 2 2 1 1 \
--verbose --disable-verification-cache --disable-backward-data --disable-backward-weights
...
FAILED: 0.147229
The reason for this behavior:
|
I can do this altogether with https://github.com/AMDComputeLibraries/MLOpen/pull/2512 |
This is not a library correctness problem, but a test related issue. |
affected PR #970
|
Quotation
I'm afraid this is too big a deviation for such a small configuration. Other algorithms have a much smaller deviation here |
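To make the discussion of "deviation" concrete, here is a minimal Python sketch of a range-normalized RMS error check. It is an illustration only: the names `rms_range_error` and `verify` are hypothetical, and the exact metric and tolerances used by MIOpenDriver and test_conv2d are not stated in this thread.

```python
import math


def rms_range_error(result, reference):
    """RMS difference between two sequences, divided by the largest magnitude seen.

    Hypothetical helper for illustration; not taken from MIOpen.
    """
    if len(result) != len(reference):
        raise ValueError("result and reference must have the same length")
    if not result:
        return 0.0
    # Sum of squared element-wise differences.
    sq_sum = sum((r - e) ** 2 for r, e in zip(result, reference))
    rms = math.sqrt(sq_sum / len(result))
    # Normalize by the largest absolute value in either sequence.
    scale = max(max(abs(v) for v in result), max(abs(v) for v in reference))
    return rms / scale if scale > 0 else 0.0


def verify(result, reference, tolerance):
    """Return True when the normalized error does not exceed the tolerance."""
    return rms_range_error(result, reference) <= tolerance
```

With a metric of this shape, a figure such as 0.0819018 passes or fails depending only on the tolerance the caller picks, so two tools can disagree about the same output if their tolerances or reference data differ. Whether that is what happens here is not confirmed in this thread.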
Please be clear. Would you like to ask @asroy to double-check the correctness of computations performed by |
- Limit the number of combinations for a single dimension for pooling and convolution tests
- Resolves "[PR Testing] Get rid of test redundancy" #816
- Resolves "[COMGR] Code quality: reference binding to null pointer of type 'char'" #877
- Tests: Test generator now includes a batch greater than 1 and able to variate count of tests using --limit
- Tests: Various improvements in tests/CMakeLists. Fixed LONG_TESTS, added information about skipped tests.
- CI: Refactored Jenkinsfile, reshuffled test stage sequence
- Added W/A: "[COMGR][debug][test_gpu_reference_kernel] compiler errors" #898
- Added W/A: "[iGemmfwd][test_conv2d][gfx906][half] Verification failed" #936
@asroy Could you please confirm that the computations performed by ConvHipImplicitGemmV4R1Fwd for the configs in question are correct? Then I will fix test_conv2d. |
@johnny-keker Can you please create a branch with this jenkinsfile-wa-issue-936-remove.diff.txt and try it out? PRs from forks do not apply Jenkinsfile changes during CI testing, so I am unable to test this from my repo. Thanks! |
@shurale-nkn Please test with ROCm 6.0.2 to see if this is still an issue. If not, please close the issue. Thanks! |
same problem as in #917, but with kernel ConvHipImplicitGemmV4R1Fwd
test_conv2d with fp16 convolution failed
GPU: gfx906
Branch: develop (3e470a5)
Tested on ROCm 3.7 and ROCm 4.2
CONSOLE LOGS_LEVEL=6