[HOTFIX][WORKAROUND] #1207 ported to develop: W/A for issue 1206. Disable ConvHipImplicitGemmBwdDataV4R1Xdlops by default #1208

Merged
merged 1 commit into from
Oct 5, 2021

atamazov
Contributor

@atamazov atamazov commented Oct 4, 2021

This is a W/A for #1206: #1207 ported to develop to unblock the MIOpen promotion pipeline.

@junliume junliume merged commit 56215d6 into develop Oct 5, 2021
@atamazov
Contributor Author

atamazov commented Oct 5, 2021

By @junliume :

@atamazov @JehandadKhan
I am a little worried about the following failure (restarted to check if it is run to run issue):

src/ocl/convolutionocl.cpp:1211: Backward Data Convolution cannot be executed due to incorrect param

[2021-10-05T11:25:32.531Z] 63/81 Test #78: test_regression_half_mi100 .................***Failed  Error regular expression found in output. Regex=[FAILED]  5.64 sec
[2021-10-05T11:25:32.531Z] [  0%] Built target sqlite_memvfs
[2021-10-05T11:25:32.531Z] [  2%] Built target addkernels
[2021-10-05T11:25:32.531Z] [ 97%] Built target MIOpen
[2021-10-05T11:25:32.531Z] [100%] Built target test_conv2d
[2021-10-05T11:25:32.531Z] Scanning dependencies of target test_regression_half_mi100
[2021-10-05T11:25:32.531Z] /var/jenkins/workspace/MLLibs_MIOpen_wa-issue-1206/build/bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --disable-forward --disable-backward-weights --input 128 24 14 14 --weights 64 24 5 5 --batch_size 128 --input_channels 24 --output_channels 64 --spatial_dim_elements 14 14 --filter_dims 3 3 --pads_strides_dilations 2 2 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW 
[2021-10-05T11:25:32.531Z] /var/jenkins/workspace/MLLibs_MIOpen_wa-issue-1206/build/bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --disable-forward --disable-backward-weights --input 128 24 14 14 --weights 64 24 5 5 --batch_size 128 --input_channels 24 --output_channels 64 --spatial_dim_elements 14 14 --filter_dims 3 3 --pads_strides_dilations 2 2 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW 
[2021-10-05T11:25:32.531Z] FAILED: /var/jenkins/workspace/MLLibs_MIOpen_wa-issue-1206/src/ocl/convolutionocl.cpp:1211: Backward Data Convolution cannot be executed due to incorrect params

@atamazov
Contributor Author

atamazov commented Oct 5, 2021

@junliume It may be worth limiting the W/A to FP32 only. I will check and create another PR, if necessary.
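
To illustrate what limiting the W/A to FP32 could look like, here is a minimal standalone sketch. It is not MIOpen code: `DataType`, `IsSolverAllowed`, and the `explicitly_enabled` flag are hypothetical names made up for this example, and the assumption is that the W/A means "off unless the user opts in".

```cpp
#include <cassert>

// Hypothetical data type tag; not the MIOpen enum.
enum class DataType
{
    Float, // FP32
    Half   // FP16
};

// Hypothetical applicability gate for the solver under the W/A.
// Assumption: the W/A disables the solver by default, and
// `explicitly_enabled` models a user opting back in.
bool IsSolverAllowed(DataType type, bool explicitly_enabled)
{
    // Restrict the W/A to FP32: only FP32 needs the explicit opt-in.
    if(type == DataType::Float)
        return explicitly_enabled;
    // Other data types are not affected by the W/A.
    return true;
}
```

With a gate like this, FP16 configs such as the failing test_regression_half_mi100 case would still see the solver as a candidate.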

@junliume
Contributor

junliume commented Oct 5, 2021

@junliume It may be worth limiting the W/A to FP32 only. I will check and create another PR, if necessary.

Yes, I think so. What does perf_db.empty() mean? The config does not have a "fall-back" kernel to run?

if(perf_db.empty())
        MIOPEN_THROW(miopenStatusUnknownError,
                     "Backward Data Convolution cannot be executed due to incorrect params");

junliume added a commit that referenced this pull request Oct 5, 2021
…citGemmBwdDataV4R1Xdlops by default. (#1207) (#1208)"

This reverts commit 56215d6.
junliume added a commit that referenced this pull request Oct 5, 2021
…citGemmBwdDataV4R1Xdlops by default. (#1207) (#1208)"

This reverts commit 56215d6.
junliume added a commit that referenced this pull request Oct 6, 2021
…ert [HOTFIX][WORKAROUND] W/A #1208. (#1209)

* remove solver ConvHipImplicitGemmBwdDataV4R1Xdlops fp32

* Revert "[HOTFIX][WORKAROUND] W/A for issue 1206. Disable ConvHipImplicitGemmBwdDataV4R1Xdlops by default. (#1207) (#1208)"

This reverts commit 56215d6.

Co-authored-by: Jun Liu <[email protected]>
@atamazov
Contributor Author

atamazov commented Oct 6, 2021

@junliume Some context is missing, but IIRC this is the case when Find() was unable to find any suitable convolution solution. The name perf_db is misleading: the object is the collection of solutions found by Find(), which is written into find-db afterwards.
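
A rough standalone sketch of that behavior, for illustration only. It is not the MIOpen implementation: `FoundSolution`, `CollectSolutions`, `PickFastest`, and the solver names in the usage note are hypothetical, and the idea that disabling a solver can leave a config with no candidates is an assumption taken from the question above.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record for one solution collected during Find().
struct FoundSolution
{
    std::string solver_name;
    float time_ms;
};

// Hypothetical filter: keep only candidates whose solver is not disabled.
std::vector<FoundSolution> CollectSolutions(const std::vector<FoundSolution>& candidates,
                                            const std::vector<std::string>& disabled)
{
    std::vector<FoundSolution> found;
    for(const auto& c : candidates)
    {
        const bool is_disabled =
            std::find(disabled.begin(), disabled.end(), c.solver_name) != disabled.end();
        if(!is_disabled)
            found.push_back(c);
    }
    return found;
}

// Mirrors the quoted check: an empty collection means nothing can run.
const FoundSolution& PickFastest(const std::vector<FoundSolution>& found)
{
    if(found.empty())
        throw std::runtime_error(
            "Backward Data Convolution cannot be executed due to incorrect params");
    return *std::min_element(found.begin(),
                             found.end(),
                             [](const FoundSolution& a, const FoundSolution& b) {
                                 return a.time_ms < b.time_ms;
                             });
}
```

For example, with candidates `{"SolverA", 2.0f}` and `{"SolverB", 1.0f}`, `PickFastest` returns `SolverB`; if both names are in `disabled`, the collection is empty and the call throws with the same message seen in the CI log.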

@atamazov atamazov deleted the wa-issue-1206 branch December 6, 2021 13:35