[HOTFIX][tests] WORKAROUND_ISSUE_2038: Disable validation of FP16 and BF16 in the smoke test of ConvHipImplicitGemmV4R1Fwd #2043

Merged: 2 commits were merged on Mar 25, 2023.


atamazov (Contributor):

…HipImplicitGemmV4R1Fwd during its smoke test.
junliume previously approved these changes on Mar 23, 2023.

junliume (Collaborator):

@atamazov @JehandadKhan @carlushuang @asroy I prefer #2041 over this because not tuning during the smoke test would mask the numerical issues we eventually wish to fix. What do you think?

junliume (Collaborator) commented on Mar 24, 2023:

There is another reason to prefer #2041 over this one: smoke_solver_ConvHipImplicitGemmV4R1Fwd is failing quite consistently.

[2023-03-23T23:44:55.949Z]  70/324 Test #128: smoke_solver_ConvHipImplicitGemmV4R1Fwd .................................................................................................................***Failed  Error regular expression found in output. Regex=[FAILED]  6.21 sec
[2023-03-23T23:44:55.949Z] make[4]: warning: jobserver unavailable: using -j1.  Add '+' to parent make rule.
[2023-03-23T23:44:55.949Z] [  2%] Built target sqlite_memvfs
[2023-03-23T23:44:55.949Z] [  4%] Built target addkernels
[2023-03-23T23:44:55.949Z] [100%] Built target MIOpen
[2023-03-23T23:44:55.949Z] [100%] Built target test_conv2d
[2023-03-23T23:44:55.949Z] Scanning dependencies of target smoke_solver_ConvHipImplicitGemmV4R1Fwd
[2023-03-23T23:44:55.949Z] /home/jenkins/workspace/MLLibs_MIOpen_PR-2043/build/bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --disable-backward-data --disable-backward-weights --input 256 32 27 27 --weights 128 32 1 1 --batch_size 256 --input_channels 32 --output_channels 128 --spatial_dim_elements 27 27 --filter_dims 1 1 --pads_strides_dilations 0 0 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW --deterministic 0 --tensor_vect 0 --vector_length 1 --output_type int32 --int8_vectorize 0 
[2023-03-23T23:44:55.949Z] FAILED: 0.56234
[2023-03-23T23:44:55.949Z] Max diff: 263
[2023-03-23T23:44:55.949Z] Mismatch at 0: 30 != 124
[2023-03-23T23:44:55.949Z] Forward convolution: ConvHipImplicitGemmV4R1Fwd
[2023-03-23T23:44:55.949Z] Input tensor: 256, 32, 27, 27
[2023-03-23T23:44:55.949Z] Weights tensor: 128, 32, 1, 1
[2023-03-23T23:44:55.949Z] Output tensor: 256, 128, 27, 27
[2023-03-23T23:44:55.949Z] Filter: conv2d, miopenConvolution, miopenPaddingDefault, {0, 0}, {1, 1}, {1, 1}, 
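For readers checking the failing configuration, the tensor shapes in the log above are consistent with the standard convolution output-size formula. The sketch below is illustrative only: it assumes that `--pads_strides_dilations 0 0 1 1 1 1` lists pad_h, pad_w, stride_h, stride_w, dilation_h, dilation_w in that order (which matches the `{0, 0}, {1, 1}, {1, 1}` in the Filter line), and the helper `conv_out_dim` is a hypothetical name, not part of MIOpen or its test driver.

```python
def conv_out_dim(in_dim, k, pad, stride, dilation):
    # Standard output-size formula for a (non-transposed) convolution along one spatial axis.
    return (in_dim + 2 * pad - dilation * (k - 1) - 1) // stride + 1

# Values taken from the test_conv2d command line (NCHW input, KCYX weights).
# Assumption: pads/strides/dilations are given as pad_h pad_w stride_h stride_w dil_h dil_w.
n, c, h, w = 256, 32, 27, 27
k, _, y, x = 128, 32, 1, 1
pad_h, pad_w, stride_h, stride_w, dil_h, dil_w = 0, 0, 1, 1, 1, 1

out_shape = (n, k,
             conv_out_dim(h, y, pad_h, stride_h, dil_h),
             conv_out_dim(w, x, pad_w, stride_w, dil_w))
print(out_shape)  # (256, 128, 27, 27)
```

With a 1x1 filter, zero padding, and unit stride, each spatial dimension is preserved, so the output keeps the 27x27 spatial size and only the channel count changes from 32 to 128, as the "Output tensor" line reports.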

carlushuang previously approved these changes on Mar 24, 2023.

carlushuang (Contributor) left a comment:

LGTM

atamazov (Contributor, Author):

> I prefer #2041 over this because not tuning during smoke test will mask the numerical issues we eventually wish to fix. What do you think?

We still do not know the reason for the failures.

I will update both this PR and #2041. Then you'll be able to select the one which better suits our needs.

atamazov dismissed stale reviews from carlushuang and junliume via 0b71b66 on March 24, 2023, 11:52.
atamazov (Contributor, Author):

@junliume @JehandadKhan @carlushuang @asroy The PR is ready for the next round of review and CI testing.

atamazov changed the title from "[HOTFIX][tests] WORKAROUND_ISSUE_2038: Do not use tuning for ConvHipImplicitGemmV4R1Fwd during its smoke test." to "[HOTFIX][tests] WORKAROUND_ISSUE_2038: Disable validation of FP16 and BF16 in the smoke test of ConvHipImplicitGemmV4R1Fwd" on Mar 24, 2023.
junliume (Collaborator):

@atamazov @JehandadKhan Let's merge this PR (a test-only change) and keep #2041 on standby in case further action is needed to disable this solver more completely.

3 participants