ConvOclDirectFwd1x1: correctness error with "-y 1 -x 1 -n 64 -c 160 -H 73 -W 73 -k 64 -p 0 -q 0" #988

Closed
atamazov opened this issue Jun 19, 2021 · 6 comments

@atamazov
Contributor

atamazov commented Jun 19, 2021

Appeared during CI testing (ROCm 3.7). How to reproduce:

MIOPEN_FIND_MODE=normal MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvOclDirectFwd1x1 \
./bin/test_conv2d --float --cmode conv --pmode default --group-count 1 \
--input 64, 160, 73, 73 --weights 64, 160, 1, 1 --pads_strides_dilations 0 0 1 1 1 1 \
--trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW \
--disable-backward-data --disable-backward-weights --verbose --disable-verification-cache

Originated from: #958 (comment)

Failed CI build: http://micimaster.amd.com/blue/organizations/jenkins/MLLibs%2FMIOpen/detail/asm_igemm_nhwc_fwd_bwd/28/pipeline

@atamazov
Contributor Author

I am unable to reproduce the issue. Let's keep it open for a while.

@ce1adon
Contributor

ce1adon commented Jun 20, 2021

@atamazov Weird test case failures come and go and the restart button stays eternal.

@atamazov
Contributor Author

atamazov commented Jun 20, 2021

@ce1adon This is the wisdom of our ancestors, and thou shalt follow it!

@atamazov
Contributor Author

atamazov commented Jun 23, 2021

Another case, http://micimaster.amd.com/blue/organizations/jenkins/MLLibs%2FMIOpen/detail/add_regression_test_for_989/3/pipeline/, also not reproducible locally (ROCm 3.7, OCL backend, Vega20):

[2021-06-23T19:21:09.326Z] ../bin/test_conv2d --float --cmode conv --pmode default --group-count 1 --input 64, 192, 35, 35 --weights 48, 192, 1, 1 --pads_strides_dilations 0 0 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW
[2021-06-23T19:21:09.326Z] FAILED: 0.0175141
[2021-06-23T19:21:09.326Z] Max diff: 1967
[2021-06-23T19:21:09.326Z] Mismatch at 658878: 1263 != 759
[2021-06-23T19:21:09.326Z] Forward convolution: ConvOclDirectFwd1x1
[2021-06-23T19:21:09.326Z] Input tensor: 64, 192, 35, 35
[2021-06-23T19:21:09.326Z] Weights tensor: 48, 192, 1, 1
[2021-06-23T19:21:09.326Z] Output tensor: 64, 48, 35, 35
[2021-06-23T19:21:09.326Z] Filter: conv2d, miopenConvolution, miopenPaddingDefault, {0, 0}, {1, 1}, {1, 1},
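To see which output element the reported mismatch refers to, the flat index can be decoded into tensor coordinates. This is a sketch under an assumption not confirmed by the log: that "Mismatch at 658878" is a flat offset into the packed NCHW output tensor (64, 48, 35, 35). nchw_coords is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a flat offset into (n, c, h, w) coordinates.
# Assumes a packed NCHW layout, i.e. offset = ((n*C + c)*H + h)*W + w.
nchw_coords() {
  local idx=$1 C=$2 H=$3 W=$4
  local w=$((idx % W))
  local h=$((idx / W % H))
  local c=$((idx / (W * H) % C))
  local n=$((idx / (W * H * C)))
  echo "n=$n c=$c h=$h w=$w"
}
```

Under that assumption, nchw_coords 658878 48 35 35 prints n=11 c=9 h=30 w=3, i.e. image 11, output channel 9, row 30, column 3.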

@ppanchad-amd

@atamazov Is this still reproducible on ROCm 6.0.2? If not, please close. Thanks!

@atamazov
Contributor Author

ROCm 6.0, Navi21 (export MIOPEN_DEBUG_CONV_DIRECT_OCL_FWD1X1=1 to disable WORKAROUND_SWDEV_271887): neither case is reproducible. Closing the issue, just as our ancestors, the old wise men, have done for centuries.

@atamazov atamazov closed this as not planned on Mar 18, 2024

3 participants