[CK] UB when invoker of ConvHipImplicitGemmFwdXdlops is called #1971
Comments
Good news: I've looked at all the other solvers and didn't find similar problems.
@atamazov Thanks for finding the defect in the ck solvers and providing a quick fix and solution for it. I will make the fix in all my current ck integration tasks. Thanks.
@atamazov One more concern: @JehandadKhan and I ran a local test to check the correctness of the integrated ck solver. It passes and gets the same result as the naïve convolution solver. If the ConvHipImplicitGemmFwdXdlops class does not have an instance, why does the test case pass locally? Please see the following test case and result:
MIOPEN_FIND_MODE=1 MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvHipImplicitGemmFwdXdlops ./bin/test_conv2d --float --disable-backward-data --disable-backward-weights --verbose --input 256 128 28 28 --weights 128 128 3 3 --in_layout NHWC --fil_layout NHWC --out_layout NHWC --pads_strides_dilations 1 1 1 1 1 1 --disable-verification-cache
launch_and_time_kernel: grid_dim {784, 1, 1}, block_dim {256, 1, 1}
BTW, the test also passes when I use the test case you provided above. Did I miss something important here? Thanks.
However, I would expect a failure when built with sanitizer stuff enabled.
Maybe you've used the release build. The reproduction instructions were missing the rebuild step (fixed now). Please clean up the build directory and retry with the updated instructions.
@atamazov That makes sense. Thanks for the explanation. I will rebuild the test and apply the same fixes to my current tasks too.
By #1972
The issue originated from #1906 (comment)
How to reproduce
CXX=/opt/rocm/llvm/bin/clang++ CXXFLAGS=-Werror \
cmake \
    -DMIOPEN_TEST_FLAGS=--disable-verification-cache \
    -DCMAKE_BUILD_TYPE=debug \
    -DCMAKE_CXX_FLAGS_DEBUG="-g -fno-omit-frame-pointer -fsanitize=undefined -fno-sanitize-recover=undefined -Wno-option-ignored" \
    -DBUILD_DEV=On \
    -DMIOPEN_GPU_SYNC=Off \
    ..
Root cause
The member function ConvHipImplicitGemmFwdXdlops::RunCKSolution() is called from within the Invoker. No live instance of the ConvHipImplicitGemmFwdXdlops class exists when the Invoker is used, so calling a non-static member function there is undefined behavior. ConvHipImplicitGemmBwdXdlops has the same defect.
How to fix
ConvHipImplicitGemmFwdXdlops{}.RunCKSolution<int8_t>(...)
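To illustrate the pattern, here is a minimal C++ sketch. It does not use MIOpen code: ToySolver, RunToySolution and MakeInvoker are hypothetical names, and the sketch assumes the defect comes from an invoker lambda that refers to the solver object through a captured `this`. The buggy form is shown only in a comment; the returned lambda follows the fix above by creating a short-lived instance inside the invoker.

```cpp
#include <functional>

// Hypothetical stand-in for a solver class; NOT the MIOpen implementation.
struct ToySolver
{
    // Uses only its argument and touches no member state.
    template <typename T>
    T RunToySolution(T x) const
    {
        return x + x;
    }

    // Buggy pattern (assumed shape of the defect, shown as a comment only):
    //   return [this](int x) { return RunToySolution<int>(x); };
    // The captured `this` dangles once the solver object is destroyed,
    // so invoking the lambda later is undefined behavior.

    // Safer pattern: capture nothing from the solver and construct a
    // temporary instance inside the invoker, as in the fix above.
    std::function<int(int)> MakeInvoker() const
    {
        return [](int x) { return ToySolver{}.RunToySolution<int>(x); };
    }
};
```

Calling ToySolver{}.MakeInvoker() and invoking the result after the temporary solver is gone stays well defined, because the lambda no longer depends on the solver's lifetime. Making the helper a static member function would be another option under the same assumption.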
[Attribution] @junliume @johnny-keker https://github.com/ROCmSoftwarePlatform/MIOpen/labels/bug https://github.com/ROCmSoftwarePlatform/MIOpen/labels/urgency_blocker
I will provide a fix very soon.
@JehandadKhan @iq136boy @averinevg FYI