
[MI100][FP32] ConvHipImplicitGemmBwdDataV4R1Xdlops verification failure (SWDEV-305815) #1206

Open
atamazov opened this issue Oct 4, 2021 · 7 comments


atamazov commented Oct 4, 2021

Related Jira: https://ontrack-internal.amd.com/browse/SWDEV-305815

Reproducible with ROCm 4.3.1.

Failing config:

MIOpenDriver conv -n 32 -c 256 -H 38 -W 38 -k 256 -y 1 -x 1 -p 0 -q 0 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -t 1 -F 2 -i 1
MIOpen Backward Data Conv. Algorithm: 5, Solution: 60/ConvHipImplicitGemmBwdDataV4R1Xdlops
GPU Kernel Time Backward Data Conv. Elapsed: 0.213032 ms (average)
stats: name, n, c, ho, wo, x, y, k, flopCnt, bytesRead, bytesWritten, GFLOPs, GB/s, timeMs
stats: bwdd-conv1x1u1, 32, 256, 1, 1, 256, 38, 38,  6056574976, 8650752, 47316992, 28430, 263, 0.213032
Backward Convolution Data Failed: 0.220731 > 1.5e-05
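The last log line compares an error metric against a tolerance. Below is a minimal sketch of such a check. It assumes the driver uses a normalized RMS difference between the GPU output and a CPU reference. That exact formula is our assumption; only the 1.5e-05 threshold and the message text come from the log above. The helper names `rms_range` and `verify` are illustrative.

```python
import math
import sys
from typing import Sequence

TOLERANCE = 1.5e-05  # threshold printed in the failing log line


def rms_range(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized RMS difference: sqrt(sum((a_i - b_i)^2)) / (sqrt(n) * max|x|).

    max|x| is the largest magnitude found in either buffer (assumed formula).
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("buffers must be non-empty and of equal length")
    square_difference = sum((x - y) ** 2 for x, y in zip(a, b))
    # Floor the magnitude at the smallest normal double to avoid dividing by zero.
    magnitude = max(max(abs(x) for x in a), max(abs(y) for y in b), sys.float_info.min)
    return math.sqrt(square_difference) / (math.sqrt(len(a)) * magnitude)


def verify(gpu: Sequence[float], cpu_reference: Sequence[float]) -> bool:
    """Return True if the GPU result is within tolerance of the CPU reference."""
    error = rms_range(gpu, cpu_reference)
    if not error < TOLERANCE:
        print(f"Backward Convolution Data Failed: {error} > {TOLERANCE}")
        return False
    return True
```

Because the metric is normalized by the largest magnitude, an error of 0.22 means the GPU output is wrong by roughly a fifth of the data's scale. That is far beyond what float rounding would explain.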

The root cause is a precision issue in ConvHipImplicitGemmBwdDataV4R1Xdlops (or a precision issue in the driver's verification, which is unlikely IMO).

The issue was exposed by the recent find-db update (#1196), which moved ConvHipImplicitGemmBwdDataV4R1Xdlops to the top of the list because it is the fastest solver for this config.

Most likely the same issue persists in the 4.5 release staging branch, as it contains #1195 (which is expected to be identical to #1196).

System find-db reports the following:

Backward Data Conv solutions available: 4
- id: 60 algo: 5, time: 0.19488 ms, ws: 0, name: ConvHipImplicitGemmBwdDataV4R1Xdlops
- id: 96 algo: 0, time: 0.2224 ms, ws: 0, name: GemmBwd1x1_stride1
- id: 2 algo: 1, time: 0.3624 ms, ws: 0, name: ConvAsm1x1U
- id: 37 algo: 3, time: 0.862559 ms, ws: 0, name: ConvBinWinogradRxSf3x2
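This sketch shows why the find-db update changed the outcome. It assumes solutions are ranked purely by measured time. MIOpen's real selection logic may also weigh other factors such as workspace size. `FindResult` and `pick_fastest` are hypothetical names. Excluding solver 60 (as a workaround would) then lets the next-fastest entry win.

```python
from dataclasses import dataclass
from typing import Collection, List


@dataclass(frozen=True)
class FindResult:
    solver_id: int
    algo: int
    time_ms: float
    workspace: int
    name: str


# Entries transcribed from the find-db report above.
RESULTS: List[FindResult] = [
    FindResult(60, 5, 0.19488, 0, "ConvHipImplicitGemmBwdDataV4R1Xdlops"),
    FindResult(96, 0, 0.2224, 0, "GemmBwd1x1_stride1"),
    FindResult(2, 1, 0.3624, 0, "ConvAsm1x1U"),
    FindResult(37, 3, 0.862559, 0, "ConvBinWinogradRxSf3x2"),
]


def pick_fastest(results: List[FindResult], excluded: Collection[int] = ()) -> FindResult:
    """Return the fastest entry whose solver id is not excluded (time-only policy)."""
    candidates = [r for r in results if r.solver_id not in excluded]
    return min(candidates, key=lambda r: r.time_ms)
```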

To reproduce independently of the find-db contents, use either of these two options:
- Append -S 60 to the driver command line, so the driver uses Immediate mode with ConvHipImplicitGemmBwdDataV4R1Xdlops.
- Prepend MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvHipImplicitGemmBwdDataV4R1Xdlops MIOPEN_FIND_MODE=normal to the driver command.
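The two reproduction options can be scripted. The Python sketch below only builds the argument list and the extra environment for each option. It assumes MIOpenDriver is on PATH when `run()` is actually called. The function names are illustrative, not part of MIOpen.

```python
import os
import shlex
import subprocess
from typing import Dict, List, Tuple

# Failing problem, copied from the issue description.
BASE_ARGS: List[str] = shlex.split(
    "MIOpenDriver conv -n 32 -c 256 -H 38 -W 38 -k 256 -y 1 -x 1 -p 0 -q 0 "
    "-u 1 -v 1 -l 1 -j 1 -m conv -g 1 -t 1 -F 2 -i 1"
)
SOLVER = "ConvHipImplicitGemmBwdDataV4R1Xdlops"


def immediate_mode_repro() -> Tuple[List[str], Dict[str, str]]:
    """Option 1: select solution id 60 explicitly (Immediate mode)."""
    return BASE_ARGS + ["-S", "60"], {}


def find_only_solver_repro() -> Tuple[List[str], Dict[str, str]]:
    """Option 2: keep the Find path but restrict it to a single solver."""
    return list(BASE_ARGS), {
        "MIOPEN_DEBUG_FIND_ONLY_SOLVER": SOLVER,
        "MIOPEN_FIND_MODE": "normal",
    }


def run(argv: List[str], extra_env: Dict[str, str]) -> subprocess.CompletedProcess:
    """Launch the driver; the current environment is kept so PATH still works."""
    return subprocess.run(argv, env={**os.environ, **extra_env}, check=False)
```

Usage on a machine with the driver installed: `run(*immediate_mode_repro())`.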

I am investigating this.


atamazov commented Oct 4, 2021

A W/A for the 4.5 release is provided. Now we need to provide a W/A for develop. Explanations will follow tomorrow.


atamazov commented Oct 4, 2021

Urgency can be lowered once #1208 is merged.

atamazov removed this from the ROCm 4.5 milestone Oct 4, 2021

atamazov commented Oct 5, 2021

Link to Jira added.


atamazov commented Oct 5, 2021

Analysis

The precision degradation occurs only with certain PerformanceConfig values: 9 out of 111 in total. The degradation is so large (roughly 6.5 orders of magnitude, i.e. up to ~3.5 million times the RMS of the passing configs) that I can't attribute it to rounding or to differences in the order of computations between CPU and GPU.

The most likely reason is that the Solver does not comply with the rules stated in #866 (comment), specifically:

  • IsValidPerformanceConfig() must return true only if the "...Result of execution would be numerically correct."
  • SetNextValue() must likewise only return values that yield correct results on the GPU.
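Together, the two rules mean the tuning search should never produce a config that the validity check rejects. Below is a minimal sketch of that invariant. It treats configs as plain integer tuples and does not name their fields. `tuning_candidates` is a hypothetical name. The denylist is for illustration only and is taken from two failing rows in the tables below. A real IsValidPerformanceConfig would have to derive validity from the problem and the config, not from a list.

```python
from typing import Callable, Iterable, Iterator, Tuple

PerfConfig = Tuple[int, ...]  # e.g. (32, 256, 1, 4, 64, 32, 1, 1)


def tuning_candidates(
    all_values: Iterable[PerfConfig],
    is_valid: Callable[[PerfConfig], bool],
) -> Iterator[PerfConfig]:
    """Yield only the values that the validity check accepts.

    Models the rule that the search (SetNextValue) must never hand the
    tuner a value that IsValidPerformanceConfig would reject.
    """
    for value in all_values:
        if is_valid(value):
            yield value


# Illustration only: a denylist built from two failing rows in the tables below.
KNOWN_BAD = {(32, 256, 1, 4, 64, 32, 1, 1), (64, 256, 1, 4, 64, 64, 1, 1)}
SAMPLE = [(32, 256, 1, 1, 64, 32, 1, 1), (32, 256, 1, 4, 64, 32, 1, 1)]
print(list(tuning_candidates(SAMPLE, lambda v: v not in KNOWN_BAD)))
```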

Excerpt from the full table of PerformanceConfigs:

PerformanceConfig        RMS        Result  RMS degradation, times
32,256,1,4,64,32,1,1     0.220731   FAILED  3475607
64,256,1,4,64,64,1,1     0.220731   FAILED  3475607
128,128,1,1,64,128,1,1   0.194936   FAILED  3069442
128,128,1,2,64,128,1,1   0.147098   FAILED  2316189
128,128,2,1,64,128,1,1   0.147098   FAILED  2316189
128,128,8,2,64,128,1,1   0.15224    FAILED  2397155
128,256,1,1,64,128,1,1   0.194936   FAILED  3069442
128,256,1,2,64,128,1,1   0.150776   FAILED  2374103
128,256,2,1,64,128,1,1   0.150776   FAILED  2374103
All other configs (102)  6.35E-08   Ok      0
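The last column is not defined explicitly in the thread. Every value in it is reproduced by round((RMS - baseline) / baseline), with the passing RMS of 6.35086E-08 as the baseline. That formula is our inference from the numbers, not something the author stated. The sketch below recomputes the column and the corresponding orders of magnitude. `degradation_times` is an illustrative name.

```python
import math


def degradation_times(rms: float, baseline_rms: float) -> int:
    """Inferred formula for the table's last column; a passing config scores 0."""
    return round((rms - baseline_rms) / baseline_rms)


BASELINE = 6.35086e-08  # RMS shared by all 102 passing configs (full table)

for rms in (0.220731, 0.194936, 0.15224, 0.150776, 0.147098):
    orders = math.log10(rms / BASELINE)  # how many orders of magnitude worse
    print(rms, degradation_times(rms, BASELINE), f"{orders:.2f}")
```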

Full table of PerformanceConfigs and results

PerformanceConfig RMS Result RMS degradation, times
32,256,1,1,64,32,1,1 6.35086E-08   0
32,256,1,2,64,32,1,1 6.35086E-08   0
32,256,1,4,64,32,1,1 0.220731 FAILED 3475607
32,256,2,1,64,32,1,1 6.35086E-08   0
32,256,2,2,64,32,1,1 6.35086E-08   0
32,256,2,4,64,32,1,1 6.35086E-08   0
32,256,2,8,64,32,1,1 6.35086E-08   0
32,256,4,1,64,32,1,1 6.35086E-08   0
32,256,4,2,64,32,1,1 6.35086E-08   0
32,256,4,4,64,32,1,1 6.35086E-08   0
32,256,8,1,64,32,1,1 6.35086E-08   0
32,256,8,2,64,32,1,1 6.35086E-08   0
64,128,1,1,32,64,1,1 6.35086E-08   0
64,128,1,1,64,32,1,1 6.35086E-08   0
64,128,1,1,64,64,1,1 6.35086E-08   0
64,128,1,2,32,64,1,1 6.35086E-08   0
64,128,1,2,64,32,1,1 6.35086E-08   0
64,128,1,2,64,64,1,1 6.35086E-08   0
64,128,1,4,32,64,1,1 6.35086E-08   0
64,128,1,4,64,32,1,1 6.35086E-08   0
64,128,2,1,32,64,1,1 6.35086E-08   0
64,128,2,1,64,32,1,1 6.35086E-08   0
64,128,2,1,64,64,1,1 6.35086E-08   0
64,128,2,2,32,64,1,1 6.35086E-08   0
64,128,2,2,64,32,1,1 6.35086E-08   0
64,128,2,2,64,64,1,1 6.35086E-08   0
64,128,2,4,32,64,1,1 6.35086E-08   0
64,128,2,4,64,32,1,1 6.35086E-08   0
64,128,4,1,32,64,1,1 6.35086E-08   0
64,128,4,1,64,32,1,1 6.35086E-08   0
64,128,4,1,64,64,1,1 6.35086E-08   0
64,128,4,2,32,64,1,1 6.35086E-08   0
64,128,4,2,64,32,1,1 6.35086E-08   0
64,128,4,2,64,64,1,1 6.35086E-08   0
64,128,4,4,32,64,1,1 6.35086E-08   0
64,128,4,4,64,32,1,1 6.35086E-08   0
64,128,8,1,32,64,1,1 6.35086E-08   0
64,128,8,1,64,32,1,1 6.35086E-08   0
64,128,8,1,64,64,1,1 6.35086E-08   0
64,128,8,2,32,64,1,1 6.35086E-08   0
64,128,8,2,64,32,1,1 6.35086E-08   0
64,128,8,2,64,64,1,1 6.35086E-08   0
64,128,8,4,32,64,1,1 6.35086E-08   0
64,128,8,4,64,32,1,1 6.35086E-08   0
64,256,1,1,64,64,1,1 6.35086E-08   0
64,256,1,2,64,64,1,1 6.35086E-08   0
64,256,1,4,64,64,1,1 0.220731 FAILED 3475607
64,256,2,1,64,64,1,1 6.35086E-08   0
64,256,2,2,64,64,1,1 6.35086E-08   0
64,256,2,4,64,64,1,1 6.35086E-08   0
64,256,4,1,64,64,1,1 6.35086E-08   0
64,256,4,2,64,64,1,1 6.35086E-08   0
64,256,4,4,64,64,1,1 6.35086E-08   0
64,256,8,1,64,64,1,1 6.35086E-08   0
64,256,8,2,64,64,1,1 6.35086E-08   0
128,64,1,1,32,64,1,1 6.35086E-08   0
128,64,1,1,64,32,1,1 6.35086E-08   0
128,64,1,1,64,64,1,1 6.35086E-08   0
128,64,1,2,32,64,1,1 6.35086E-08   0
128,64,1,2,64,32,1,1 6.35086E-08   0
128,64,1,2,64,64,1,1 6.35086E-08   0
128,64,1,4,64,64,1,1 6.35086E-08   0
128,64,2,1,32,64,1,1 6.35086E-08   0
128,64,2,1,64,32,1,1 6.35086E-08   0
128,64,2,1,64,64,1,1 6.35086E-08   0
128,64,2,2,32,64,1,1 6.35086E-08   0
128,64,2,2,64,32,1,1 6.35086E-08   0
128,64,2,2,64,64,1,1 6.35086E-08   0
128,64,2,4,64,64,1,1 6.35086E-08   0
128,64,2,8,64,64,1,1 6.35086E-08   0
128,64,4,1,32,64,1,1 6.35086E-08   0
128,64,4,1,64,32,1,1 6.35086E-08   0
128,64,4,1,64,64,1,1 6.35086E-08   0
128,64,4,2,32,64,1,1 6.35086E-08   0
128,64,4,2,64,32,1,1 6.35086E-08   0
128,64,4,2,64,64,1,1 6.35086E-08   0
128,64,4,4,64,64,1,1 6.35086E-08   0
128,64,8,1,32,64,1,1 6.35086E-08   0
128,64,8,1,64,32,1,1 6.35086E-08   0
128,64,8,1,64,64,1,1 6.35086E-08   0
128,64,8,2,32,64,1,1 6.35086E-08   0
128,64,8,2,64,32,1,1 6.35086E-08   0
128,64,8,2,64,64,1,1 6.35086E-08   0
128,128,1,1,64,64,1,1 6.35086E-08   0
128,128,1,1,64,128,1,1 0.194936 FAILED 3069442
128,128,1,2,64,64,1,1 6.35086E-08   0
128,128,1,2,64,128,1,1 0.147098 FAILED 2316189
128,128,1,4,64,128,1,1 6.35086E-08   0
128,128,2,1,64,64,1,1 6.35086E-08   0
128,128,2,1,64,128,1,1 0.147098 FAILED 2316189
128,128,2,2,64,64,1,1 6.35086E-08   0
128,128,2,2,64,128,1,1 6.35086E-08   0
128,128,2,4,64,128,1,1 6.35086E-08   0
128,128,2,8,64,128,1,1 6.35086E-08   0
128,128,4,1,64,64,1,1 6.35086E-08   0
128,128,4,1,64,128,1,1 6.35086E-08   0
128,128,4,2,64,64,1,1 6.35086E-08   0
128,128,4,2,64,128,1,1 6.35086E-08   0
128,128,4,4,64,128,1,1 6.35086E-08   0
128,128,8,1,64,64,1,1 6.35086E-08   0
128,128,8,1,64,128,1,1 6.35086E-08   0
128,128,8,2,64,64,1,1 6.35086E-08   0
128,128,8,2,64,128,1,1 0.15224 FAILED 2397155
128,256,1,1,64,128,1,1 0.194936 FAILED 3069442
128,256,1,2,64,128,1,1 0.150776 FAILED 2374103
128,256,2,1,64,128,1,1 0.150776 FAILED 2374103
128,256,2,2,64,128,1,1 6.35086E-08   0
128,256,4,1,64,128,1,1 6.35086E-08   0
128,256,4,2,64,128,1,1 6.35086E-08   0
128,256,8,1,64,128,1,1 6.35086E-08   0
128,256,8,2,64,128,1,1 6.35086E-08   0
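In the full table, each row has whitespace-separated fields, and the Result column is empty for configs that passed. A small parser sketch for rows in that layout follows. `Row`, `parse_row` and `summarize` are hypothetical helpers. Applied to all 111 rows above, it should report 9 failures.

```python
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    config: Tuple[int, ...]
    rms: float
    failed: bool
    degradation: int


def parse_row(line: str) -> Row:
    """Parse 'CONFIG RMS [FAILED] DEGRADATION'; a missing Result means pass."""
    fields = line.split()
    config = tuple(int(v) for v in fields[0].split(","))
    failed = len(fields) == 4 and fields[2] == "FAILED"
    return Row(config, float(fields[1]), failed, int(fields[-1]))


def summarize(lines: List[str]) -> Tuple[int, int]:
    """Return (number of failed configs, total number of configs)."""
    rows = [parse_row(line) for line in lines if line.strip()]
    return sum(row.failed for row in rows), len(rows)
```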

Console log and how to reproduce

issue-1206-MI100-rocm4.3.1-console-log.zip

Use the fix-igemmv4r1xdlops branch at commit 07c84cd:

  • It is develop at 442db6149 plus support for the MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_HIP_BWD_V4R1_XDLOPS_PERF_VALS environment variable.
  • The latter is necessary to feed the solver with arbitrary PerformanceConfigs.
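The sketch below shows one plausible way to sweep PerformanceConfigs with that branch. We make three assumptions. First, the variable accepts a config exactly as printed in the tables above. Second, it is honored when solution 60 is forced with -S 60. Third, the failure message is printed on stdout. None of these is confirmed in the thread. `sweep_env` and `run_sweep` are hypothetical helpers.

```python
import os
import subprocess
from typing import Dict, Iterable, List, Tuple

PERF_VALS_VAR = "MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_HIP_BWD_V4R1_XDLOPS_PERF_VALS"

# The failing problem from the issue, with solution 60 forced via -S.
DRIVER_ARGS: List[str] = (
    "MIOpenDriver conv -n 32 -c 256 -H 38 -W 38 -k 256 -y 1 -x 1 -p 0 -q 0 "
    "-u 1 -v 1 -l 1 -j 1 -m conv -g 1 -t 1 -F 2 -i 1 -S 60"
).split()


def sweep_env(config: str) -> Dict[str, str]:
    """Environment override for one config (assumed format: "32,256,1,4,64,32,1,1")."""
    return {PERF_VALS_VAR: config}


def run_sweep(configs: Iterable[str]) -> List[Tuple[str, bool]]:
    """Run the driver once per config and report whether verification failed."""
    results = []
    for config in configs:
        proc = subprocess.run(
            DRIVER_ARGS,
            env={**os.environ, **sweep_env(config)},
            capture_output=True,
            text=True,
            check=False,
        )
        # Assumes the log line "Backward Convolution Data Failed: ..." goes to stdout.
        results.append((config, "Failed" in proc.stdout))
    return results
```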

A comment by junliume has been minimized.


atamazov commented Oct 5, 2021

#1206 (comment) moved to #1208 (comment)

atamazov changed the title from "[MI100][FP32] ConvHipImplicitGemmBwdDataV4R1Xdlops verification failure" to "[MI100][FP32] ConvHipImplicitGemmBwdDataV4R1Xdlops verification failure (SWDEV-305815)" on Oct 6, 2021
atamazov added a commit that referenced this issue Oct 6, 2021
… Disable ConvHipImplicitGemmBwdDataV4R1Xdlops for FP32.
junliume pushed a commit that referenced this issue Oct 7, 2021
… Disable ConvHipImplicitGemmBwdDataV4R1Xdlops for FP32. (#1211)

* [HOTFIX][WORKAROUND][MI100][FP32] W/A for SWDEV-305815 (issue #1206). Disable ConvHipImplicitGemmBwdDataV4R1Xdlops for FP32.

* Added regression test for SWDEV-305815 (issue #1206)
atamazov added a commit that referenced this issue Oct 8, 2021
… Disable ConvHipImplicitGemmBwdDataV4R1Xdlops for FP32. (#1211)

* [HOTFIX][WORKAROUND][MI100][FP32] W/A for SWDEV-305815 (issue #1206). Disable ConvHipImplicitGemmBwdDataV4R1Xdlops for FP32.

* Added regression test for SWDEV-305815 (issue #1206)
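The workaround in these commits disables ConvHipImplicitGemmBwdDataV4R1Xdlops for FP32. Below is a hypothetical Python stand-in for what such an applicability guard amounts to. The real check lives in MIOpen's C++ solver code, and its other conditions are omitted here. `ConvProblem` and `is_applicable_with_workaround` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvProblem:
    direction: str  # e.g. "backward_data"
    data_type: str  # e.g. "fp32", "fp16", "bf16"


def is_applicable_with_workaround(problem: ConvProblem) -> bool:
    """Hypothetical applicability check after the W/A (other conditions omitted)."""
    if problem.direction != "backward_data":
        return False
    # W/A for SWDEV-305815 (issue #1206): never offer the solver for FP32.
    return problem.data_type != "fp32"
```

With the solver no longer applicable to FP32, Find would skip it for the failing config. The next-fastest entry in the find-db report, GemmBwd1x1_stride1, would then be a natural candidate.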
ppanchad-amd commented

@atamazov SWDEV-305815 is closed. Can we close this ticket as well? Thanks!


4 participants