Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug][CK][Static Kernel][ConvHipImplicitGemmV4R1Fwd] Possible Overflow with Large Values of Channels #2004

Closed
junliume opened this issue Feb 28, 2023 · 5 comments · Fixed by #2005

Comments

@junliume
Copy link
Collaborator

[Minimal Reproducible Example]:

MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvHipImplicitGemmV4R1Fwd ./bin/MIOpenDriver convfp16 -S ConvHipImplicitGemmV4R1Fwd  -F 1 -n 128 -g 8 -k 768 -c 768 -H 149 -W 149 -y 3 -x 3 -p 0 -q 0 -u 1 -v 1 -l 1 -j 1 -V 1 -w 1 -t 1 -i 1

and the observation:

# MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvHipImplicitGemmV4R1Fwd ./bin/MIOpenDriver convfp16 -S ConvHipImplicitGemmV4R1Fwd  -F 1 -n 128 -g 8 -k 1024 -c 256 -H 149 -W 149 -y 3 -x 3 -p 0 -q 0 -u 1 -v 1 -l 1 -j 1 -V 1 -w 1 -t 1 -i 1
MIOpenDriver convfp16 -S ConvHipImplicitGemmV4R1Fwd -F 1 -n 128 -g 8 -k 1024 -c 256 -H 149 -W 149 -y 3 -x 3 -p 0 -q 0 -u 1 -v 1 -l 1 -j 1 -V 1 -w 1 -t 1 -i 1
Forward Conv solutions available: 2
- id: 91 algo: 0, time: 40.6108 ms, ws: 99574272, name: GemmFwdRest
- id: 85 algo: 1, time: 1000 ms, ws: 0, name: ConvDirectNaiveConvFwd
Warning: Solution id (26) is not reported by the library. Trying it anyway...
Memory access fault by GPU node-8 (Agent handle: 0xa2c600) on address 0x7ef593087000. Reason: Unknown.
Aborted (core dumped)

[Likely Root Cause]:
there might be an overflow somewhere in the following code

src/kernels/static_composable_kernel/src/kernel_wrapper/static_kernel_gridwise_convolution_implicit_gemm_v4r1_gnchw_gkcyx_gnkhw_lds_double_buffer.cpp

[Additional Info]:
If we reduce the values of c and k, the tests can run through:

# MIOPEN_DRIVER_USE_GPU_REFERENCE=1 MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvHipImplicitGemmV4R1Fwd ./bin/MIOpenDriver convfp16 -S ConvHipImplicitGemmV4R1Fwd  -F 1 -n 128 -g 8 -k 768 -c 768 -H 149 -W 149 -y 3 -x 3 -p 0 -q 0 -u 1 -v 1 -l 1 -j 1 -V 1 -w 1 -t 1 -i 1
MIOpenDriver convfp16 -S ConvHipImplicitGemmV4R1Fwd -F 1 -n 128 -g 8 -k 768 -c 768 -H 149 -W 149 -y 3 -x 3 -p 0 -q 0 -u 1 -v 1 -l 1 -j 1 -V 1 -w 1 -t 1 -i 1
Forward Conv solutions available: 2
- id: 91 algo: 0, time: 40.6108 ms, ws: 298722816, name: GemmFwdRest
- id: 85 algo: 1, time: 1000 ms, ws: 0, name: ConvDirectNaiveConvFwd
Warning: Solution id (26) is not reported by the library. Trying it anyway...
Wall-clock Time Forward Conv. Elapsed: 327.939 ms, Auxiliary API calls: 59.4805 ms (GWSS: 0.00390625)
MIOpen Forward Conv. Algorithm: -1, Solution: 26/ConvHipImplicitGemmV4R1Fwd
GPU Kernel Time Forward Conv. Elapsed: 327.847412 ms (average)
stats: name, n, c, ho, wo, x, y, k, flopCnt, bytesRead, bytesWritten, GFLOPs, GB/s, timeMs
stats: fwd-conv3x3u1, 128, 768, 147, 147, 3, 3, 768,  3670705963008, 18446744069485838336, 4248502272, 11196, 0, 327.847412
**Forward Convolution FAILED: 0.0444141 > 0.0082**

Hence the verification failure is also concerning, but it is not crashing.
Another example:

# MIOPEN_DRIVER_USE_GPU_REFERENCE=1 MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvHipImplicitGemmV4R1Fwd ./bin/MIOpenDriver convfp16 -S ConvHipImplicitGemmV4R1Fwd  -F 1 -n 128 -g 8 -k 256 -c 256 -H 149 -W 149 -y 3 -x 3 -p 0 -q 0 -u 1 -v 1 -l 1 -j 1 -V 1 -w 1 -t 1 -i 1
MIOpenDriver convfp16 -S ConvHipImplicitGemmV4R1Fwd -F 1 -n 128 -g 8 -k 256 -c 256 -H 149 -W 149 -y 3 -x 3 -p 0 -q 0 -u 1 -v 1 -l 1 -j 1 -V 1 -w 1 -t 1 -i 1
Forward Conv solutions available: 2
- id: 91 algo: 0, time: 40.6108 ms, ws: 99574272, name: GemmFwdRest
- id: 85 algo: 1, time: 1000 ms, ws: 0, name: ConvDirectNaiveConvFwd
Warning: Solution id (26) is not reported by the library. Trying it anyway...
Wall-clock Time Forward Conv. Elapsed: 36.9242 ms, Auxiliary API calls: 66.7344 ms (GWSS: 0.00976562)
MIOpen Forward Conv. Algorithm: -1, Solution: 26/ConvHipImplicitGemmV4R1Fwd
GPU Kernel Time Forward Conv. Elapsed: 36.829521 ms (average)
stats: name, n, c, ho, wo, x, y, k, flopCnt, bytesRead, bytesWritten, GFLOPs, GB/s, timeMs
stats: fwd-conv3x3u1, 128, 256, 147, 147, 3, 3, 256,  407856218112, 1455112192, 1416167424, 11074, 78, 36.829521
Forward Convolution Verifies OK on GPU reference (0.000170281)

It seems that the accuracy of the kernel reduces when the number of channel increases ?

@junliume
Copy link
Collaborator Author

junliume commented Feb 28, 2023

Took another look at this line:
https://github.com/ROCmSoftwarePlatform/MIOpen/blob/aa087c18fdd154465a04a3f55062804fa97498bf/src/solver/conv_hip_implicit_gemm_fwd_v4r1.cpp#L62-L64

Correction: #2004 (comment)
This solver should not have been enabled on gfx90a platform and FWD direction. Now it looks like a MIOpenDriver test issue (i.e. MIOPEN_DEBUG_CONVOLUTION_ATTRIB_FP16_ALT_IMPL=1 should have always been enabled with MIOpenDriver)

@junliume
Copy link
Collaborator Author

@atamazov could you also take a look at this issue?

@carlushuang
Copy link
Contributor

#2005 <- this can limit the solve within range of 2G. After this change, the MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvHipImplicitGemmV4R1Fwd ./bin/MIOpenDriver convfp16 -F 1 -n 128 -g 8 -k 768 -c 768 -H 149 -W 149 -y 3 -x 3 -p 0 -q 0 -u 1 -v 1 -l 1 -j 1 -V 1 -w 1 -t 1 -i 1
Will fail due to this solver is not applicable and env var limit only earch this solver. Note need to omit -S ConvHipImplicitGemmV4R1Fwd in the MIOpenDriver cmdline, otherwise will call this solver even if it's not applicable

@atamazov
Copy link
Contributor

atamazov commented Mar 2, 2023

@junliume

This solver should not have been enabled on gfx90a platform and FWD direction.

Why? For FWD direction the FP16_ALT attribute is not enabled by default. But the library enables FP16_ALT by default for BWD and WRW. This has been done in accordance to the requirements.

The driver has nothing to do with this. Or I am not understanding something?

@junliume
Copy link
Collaborator Author

junliume commented Mar 4, 2023

@junliume

This solver should not have been enabled on gfx90a platform and FWD direction.

Why? For FWD direction the FP16_ALT attribute is not enabled by default. But the library enables FP16_ALT by default for BWD and WRW. This has been done in accordance to the requirements.

The driver has nothing to do with this. Or I am not understanding something?

My mistake :) yes indeed this is an issue we have missed and now patched.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants