[COMGR][debug][test_gpu_reference_kernel] compiler errors #898

Closed
shurale-nkn opened this issue Apr 30, 2021 · 10 comments · Fixed by #1478

Comments

@shurale-nkn
Contributor

shurale-nkn commented Apr 30, 2021

The compiler reports many errors while test_gpu_reference_kernel runs in a COMGR build.

ROCm: 4.0
GPU: gfx906
Runtime: HIP

Not reproduced in the CI Docker image with ROCm 3.7.
Reproduced with ROCm 4.2.

$ CXX=/opt/rocm/llvm/bin/clang++ CXXFLAGS=-Werror cmake -DMIOPEN_TEST_FLAGS=' --disable-verification-cache --verbose' -DCMAKE_BUILD_TYPE=debug -DCMAKE_CXX_FLAGS_DEBUG='-g -fno-omit-frame-pointer -fsanitize=undefined -fno-sanitize-recover=undefined' -DBUILD_DEV=On -DMIOPEN_GPU_SYNC=Off -DMIOPEN_USE_COMGR=On ../MLOpen
$ ./bin/test_gpu_reference_kernel
....
MIOpen(HIP): Info [ConvolutionWrwImmediate] solver_id = ConvDirectNaiveConvWrw, workspace = 0
MIOpen(HIP): Info [FindSolutionImpl] ConvDirectNaiveConvWrw (not searchable)
n:2, c:8, di:8, hi:14, wi:14, k:3, do:4, ho:7, wo:6, fz:3, fy:3,fx:3, pz:1, py:1, px:1, sz:2, sy:2, sx:2, dz:1, dy:1, dx:2, g:1, dir:wrw, type:bf16, layout:NCDHW, valid:1
MIOpen(HIP): Info [get_device_name] Raw device name: gfx906
MIOpen(HIP): Info [Handle] stream: 0x737e0d0, device_id: 0
MIOpen(HIP): Info [ConvolutionForwardImmediate] solver_id = ConvDirectNaiveConvFwd, workspace = 0
MIOpen(HIP): Info [FindSolutionImpl] ConvDirectNaiveConvFwd (not searchable)
MIOpen(HIP): Error [Do] 'amd_comgr_do_action(kind, handle, in.GetHandle(), out.GetHandle())' AMD_COMGR_ACTION_COMPILE_SOURCE_TO_BC: ERROR (1)
MIOpen(HIP): Error [BuildHip] comgr status = ERROR (1)
MIOpen(HIP): Warning [BuildHip] In file included from /tmp/comgr-221397/input/naive_conv.cpp:26:
In file included from /opt/rocm/hip/include/hip/hip_fp16.h:29:
In file included from /opt/rocm/hip/include/hip/hcc_detail/hip_fp16.h:32:
In file included from /usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/algorithm:60:
In file included from /usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/utility:69:
/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_relops.h:87:7: error: redefinition of 'operator!='
      operator!=(const _Tp& __x, const _Tp& __y)
      ^
/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_relops.h:87:7: note: previous definition is here
      operator!=(const _Tp& __x, const _Tp& __y)
      ^
In file included from /tmp/comgr-221397/input/naive_conv.cpp:26:
In file included from /opt/rocm/hip/include/hip/hip_fp16.h:29:
In file included from /opt/rocm/hip/include/hip/hcc_detail/hip_fp16.h:32:
In file included from /usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/algorithm:60:
In file included from /usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/utility:69:
/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_relops.h:100:7: error: redefinition of 'operator>'
      operator>(const _Tp& __x, const _Tp& __y)
      ^
/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_relops.h:100:7: note: previous definition is here
      operator>(const _Tp& __x, const _Tp& __y)
      ^
In file included from /tmp/comgr-221397/input/naive_conv.cpp:26:
In file included from /opt/rocm/hip/include/hip/hip_fp16.h:29:
In file included from /opt/rocm/hip/include/hip/hcc_detail/hip_fp16.h:32:
In file included from /usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/algorithm:60:
In file included from /usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/utility:69:
/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_relops.h:113:7: error: redefinition of 'operator<='
      operator<=(const _Tp& __x, const _Tp& __y)
      ^
/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_relops.h:113:7: note: previous definition is here
      operator<=(const _Tp& __x, const _Tp& __y)
      ^
In file included from /tmp/comgr-221397/input/naive_conv.cpp:26:
In file included from /opt/rocm/hip/include/hip/hip_fp16.h:29:
In file included from /opt/rocm/hip/include/hip/hcc_detail/hip_fp16.h:32:
In file included from /usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/algorithm:60:
In file included from /usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/utility:69:
/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_relops.h:126:7: error: redefinition of 'operator>='
      operator>=(const _Tp& __x, const _Tp& __y)
...
@atamazov
Contributor

atamazov commented Jun 7, 2021

@shurale-nkn Please attach full logs.

Please also try export MIOPEN_DEBUG_COMGR_HIP_PCH_ENFORCE=0 and let me know if it helps. Thanks.

@atamazov
Contributor

atamazov commented Jun 7, 2021

@shurale-nkn Logs not needed.

@atamazov
Contributor

atamazov commented Jun 7, 2021

@shurale-nkn

> Please also try export MIOPEN_DEBUG_COMGR_HIP_PCH_ENFORCE=0 and let me know if it helps. Thanks.

Replay job launched (#5), let's see the results.

@shurale-nkn
Contributor Author

shurale-nkn commented Jun 8, 2021

> @shurale-nkn Please attach full logs.
>
> Please also try export MIOPEN_DEBUG_COMGR_HIP_PCH_ENFORCE=0 and let me know if it helps. Thanks.

This solves the problem.
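
For reference, a minimal sketch of applying the workaround when rerunning the failing test. It assumes the command is run from the build directory used in the original report; the TEST_BIN variable, the existence check, and the skip message are illustrative additions and not part of MIOpen.

```shell
# Workaround from this thread: do not enforce the HIP precompiled header (PCH) in COMGR builds.
export MIOPEN_DEBUG_COMGR_HIP_PCH_ENFORCE=0

# Assumption: the test binary sits at the path used in the original report.
TEST_BIN=./bin/test_gpu_reference_kernel

if [ -x "$TEST_BIN" ]; then
    # Rerun the test that previously failed with the stl_relops.h redefinition errors.
    "$TEST_BIN"
else
    # Illustrative guard so the snippet does not fail outside a build directory.
    echo "skipped: $TEST_BIN not found or not executable"
fi
```

The variable only affects the current shell session and its child processes, so it has to be exported again in each new session or CI step that runs the test.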

@atamazov
Contributor

atamazov commented Jun 8, 2021

PCH functionality seems to be at least partially broken in the ROCm 4.2 release.

@shurale-nkn to provide W/A for this specific test.

@atamazov

/cc @junliume

@atamazov
Contributor

Most likely we'll switch from COMGR to HIPRTC for online HIP builds.

@atamazov
Contributor

@shurale-nkn

> @shurale-nkn to provide W/A for this specific test.

Can you please remind me of the status of this? Is the W/A included in one of your PRs, or...?

@shurale-nkn
Contributor Author

shurale-nkn commented Jun 22, 2021

The W/A (MIOPEN_DEBUG_COMGR_HIP_PCH_ENFORCE=0) is included in #970.

atamazov pushed a commit that referenced this issue Jul 6, 2021
- Limit the number of combinations for a single dimension for pooling and convolution tests
- Resolves "[PR Testing] Get rid of test redundancy" #816
- Resolves "[COMGR] Code quality: reference binding to null pointer of type 'char'" #877
- Tests: Test generator now includes a batch greater than 1 and able to variate count of tests using --limit
- Tests: Various improvements in tests/CMakeLists. Fixed LONG_TESTS, added information about skipped tests.
- CI: Refactored Jenkinsfile, reshuffled test stage sequence
- Added W/A: "[COMGR][debug][test_gpu_reference_kernel] compiler errors" #898
- Added W/A: "[iGemmfwd][test_conv2d][gfx906][half] Verification failed" #936
atamazov pushed a commit with the same message that referenced this issue Jul 22, 2021
@atamazov
Contributor

The library now detects the PCH state correctly. Both MIOPEN_DEBUG_COMGR_HIP_PCH_ENFORCE and WORKAROUND_ISSUE_898 can be removed.
