-
Notifications
You must be signed in to change notification settings - Fork 236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[tests] Fix for DEV builds. Fix testing on 16GB GPUs. #1729
Conversation
… HIP version >= 5.3.22305 (Mainline build 10659)
…cl to avoid warning
… if GetAvailableMemory reports insufficient memory but use hipHostMalloc instead. Improve logging.
@junliume @shurale-nkn Locally tested. Description updated. Ready for review. |
@junliume If some memory-consuming tests were disabled due to instability, then we can try enabling them after this PR is merged in. |
src/kernels/MIOpenPoolingBwd.cl
Outdated
@@ -284,9 +284,8 @@ mloPoolingMaxBwd(const __global _FLOAT* top_df, | |||
|
|||
barrier(CLK_LOCAL_MEM_FENCE); | |||
_FLOAT add_val; | |||
int bt_y = (y + lcl_id1 * MLO_POOLBWD_N_VERT_OUT_PIX); | |||
int bt_x = (x + lcl_id0 * MLO_POOLBWD_N_HORIZ_OUT_PIX); | |||
int b_idx = bt_y * mlo_bot_width + bt_x; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is "b_idx" not used anywhere and redundant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes Jun. I looked at code. 'b_idx" is computed and it was not used inside code anywhere. It's should be removed it to avoid compilation issues.
@@ -30,7 +30,7 @@ namespace miopen { | |||
|
|||
std::string HIPErrorMessage(int error, const std::string& msg) | |||
{ | |||
return msg + " " + hipGetErrorString(static_cast<hipError_t>(error)); | |||
return msg + ": " + hipGetErrorString(static_cast<hipError_t>(error)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is ":" change mandatory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if we want to automatically separate error message from the error code, see https://github.com/ROCmSoftwarePlatform/MIOpen/pull/1729/files#r962057473.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok.
@@ -211,6 +211,7 @@ void test_warnings(kernel_type_t kern_type) | |||
if(kern_type == miopenOpenCLKernelType) | |||
EXPECT(throws([&] { | |||
h.AddKernel("GEMM", "", WriteNop(kern_type), "write", {1, 1, 1}, {1, 1, 1}, ""); | |||
MIOPEN_LOG_E("FAILED: Build of the OpenCL kernel should produce warnings"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is very good.
@@ -223,7 +224,8 @@ void test_warnings(kernel_type_t kern_type) | |||
"", | |||
0, | |||
false, | |||
WriteNop(miopenHIPKernelType)); | |||
WriteNop(kern_type)); | |||
MIOPEN_LOG_E("FAILED: Build of the HIP kernel 'nop_hip.cpp' should produce warnings"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. This change is good.
auto status = hipMemPtrGetInfo(mem, &size); | ||
if(status != hipSuccess) | ||
MIOPEN_LOG_W("hipMemPtrGetInfo at " << mem << " status: " << status); | ||
status = hipFree(mem); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it ok to use hipFree for device and for host memory?
MIOPEN_THROW_HIP_STATUS(status, | ||
"Hip error creating buffer " + std::to_string(sz) + ": "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@muralinr I do not think we want to append + ": "
to each MIOPEN_THROW_HIP_STATUS
.
A pack of changes that expected to fix problems with testing of DEV builds and problems with testing on 16GB GPUs (e.g. Radeon VII, MI25, Radeon RX 6800/6900). May improve CI stability.
test_lrn_test
,test_activation
,test_pooling2d
(and possibly more) with 16GB GPU cards, see [gfx1030][FP16][ROCm5.2] test_lrn_test failure due to hipRTC issue #1674 (comment). Also prevents similar problems that may happen with GPUs that have more memory onboard.@shurale-nkn @junliume @muralinr Please review. I would recommend
urgency_high
. Thanks.Local
make check
passed on Radeon VII with both FP16 and FP32.[Informative] COMMANDS TO BUILD AND RUN TESTS: