Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Use use_hip_kernels and use_asm_kernels in Naive convolution Solvers #1323

Merged
merged 13 commits into from
Feb 4, 2022

Conversation

carlushuang
Copy link
Contributor

this fix #1319

@carlushuang carlushuang requested review from atamazov and asroy December 3, 2021 08:03
@codecov

This comment has been minimized.

@carlushuang carlushuang requested a review from atamazov December 6, 2021 11:58
@carlushuang carlushuang requested a review from atamazov December 8, 2021 02:57
Copy link
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlushuang
Copy link
Contributor Author

@atamazov fixed in 4b25546

@carlushuang carlushuang requested a review from atamazov December 13, 2021 06:20
atamazov
atamazov previously approved these changes Dec 13, 2021
Copy link
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@junliume
Copy link
Contributor

@JehandadKhan test_find_db somehow failed for this PR, no obvious errors in the log except [GetKernels] 0 kernels for key: miopenIm2d2Col "c192i28_28w5_5p1_1s1_1d1_1t1" I tried to see if it is related to specific nodes but now it is failing on both ixt-sjc2-11 and ixt-sjc2-17.
CC: @atamazov

@junliume
Copy link
Contributor

@JehandadKhan test_find_db somehow failed for this PR, no obvious errors in the log except [GetKernels] 0 kernels for key: miopenIm2d2Col "c192i28_28w5_5p1_1s1_1d1_1t1" I tried to see if it is related to specific nodes but now it is failing on both ixt-sjc2-11 and ixt-sjc2-17. CC: @atamazov

@carlushuang @atamazov this test is passing on develop branch and consistently fails with this PR

@atamazov
Copy link
Contributor

atamazov commented Dec 13, 2021

Yeah, that is possible since OCL and HIP kernels are disabled...

@atamazov
Copy link
Contributor

...because this PR disables the rest of HIP solvers (Naive ones) that remain enabled even after #1329 because of #1319. But find-db test needs some of HIP solvers enabled. I'll propose the fix in a minute.

@atamazov
Copy link
Contributor

Just after this:
https://github.com/ROCmSoftwarePlatform/MIOpen/blob/53f34c75c4bdf77b52fb75646f14b49d25fea06f/test/find_db.cpp#L231
add

#if WORKAROUND_SWDEV_292187
    setenv("MIOPEN_DEBUG_HIP_KERNELS", "1", 1);
    setenv("MIOPEN_DEBUG_OPENCL_CONVOLUTIONS", "1", 1);
#endif

You may also need to #include <miopen/target_properties.hpp> for the definition of WORKAROUND_SWDEV_292187.

@atamazov
Copy link
Contributor

atamazov commented Dec 13, 2021

⚠️ This PR is expected to be NFC. But after merging #1329 (that adds the "MIOPEN_DEBUG_HIP_KERNELS=0" workaround for Navi21) this is not so.

Now the side effect of this PR is that it disables Naive Solvers (in some cases at least), and we may encounter a number of unexpected issues later (the test_find_db is just an early sign).

It seems like we need to replace the "MIOPEN_DEBUG_HIP_KERNELS=0" workaround from #1329 with disabling HIP solvers on individual basis (and this way keep Naive Solvers enabled). I recommend implementing this right here.

After that, the W/A in test_find_db won't be needed.

@atamazov
Copy link
Contributor

atamazov commented Dec 13, 2021

Actually it may be even better to put this PR on hold. I am going to investigate SWDEV-292187, narrow the existing W/A (from #1329) and create a new PR. At the very beginning of that work, I'll need HIP solvers individually disabled (instead of global switch). This is the same change as I just proposed to do in this PR. So let's save efforts, and wait several days.

Copy link
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Block merging due to #1323 (comment)

@atamazov atamazov changed the title add use_hip_kernels in naive conv solver Fix: Use use_hip_kernels and use_asm_kernels in Naive convolution Solvers Dec 14, 2021
@atamazov
Copy link
Contributor

#1353 will unblock this. After merging it and updating this branch from develop, 15d695b and 9b8a200 will not be necessary and should be reverted.

@atamazov
Copy link
Contributor

atamazov commented Jan 31, 2022

Now blocked only by #1398 + update from develop.

Copy link
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@atamazov atamazov removed the ON_HOLD label Feb 3, 2022
@atamazov atamazov merged commit 3d0196b into develop Feb 4, 2022
@junliume junliume deleted the fix_issue_1319 branch April 14, 2022 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HIP/ASM Naive solver ignore debug ENV
3 participants