Network build stage fails in clang call with segfault #1456

Closed
MaierOli2010 opened this issue Mar 8, 2022 · 14 comments · Fixed by #1521

Comments

MaierOli2010 commented Mar 8, 2022

Just stumbled over the following error when building a CNN using ROCm 5.2 with the prebuilt tensorflow-rocm package.

Running on Arch Linux with ROCm built from source. Version 4.3 worked on Arch without the error below.

clang-14: error: cannot specify -o when generating multiple output files
terminate called after throwing an instance of 'miopen::Exception'
  what():  /home/omaier/Downloads/miopen-hip/src/MIOpen-rocm-4.5.2/src/tmp_dir.cpp:45: 
Can't execute cd /tmp/miopen-naive_conv.cpp-af8a-ccc4-d9d9-ca89;  
/opt/rocm/llvm/bin/clang++  -DMIOPEN_USE_FP16=0 -DMIOPEN_USE_FP32=1 -DMIOPEN_USE_INT8=0 
-DMIOPEN_USE_INT8x4=0 -DMIOPEN_USE_BFP16=0 -DMIOPEN_USE_INT32=0 -DMIOPEN_USE_RNE_BFLOAT16=1 
-mcpu=gfx1030 -Wno-everything --std=c++11 --cuda-gpu-arch=gfx1030 --cuda-device-only -c -O3  
-Wno-unused-command-line-argument -I. -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false -x hip 
-isystem /opt/rocm/hip/../include -isystem /opt/rocm/llvm/lib/clang/14.0.0/include/.. 
-D__HIP_PLATFORM_HCC__=1 -D__HIP_PLATFORM_AMD__=1 
-isystem /opt/rocm/hip/include 

-isystem /opt/rocm/include 

/opt/rocm/llvm/lib/clang/14.0.0/lib/linux/libclang_rt.builtins-x86_64.a 


-l--hip-link -l -l -l -l -l /opt/rocm/llvm/lib/clang/14.0.0/lib/linux/libclang_rt.builtins-x86_64.a 
-mllvm --amdgpu-spill-vgpr-to-agpr=0 -DHIP_PACKAGE_VERSION_FLAT=5000022102 naive_conv.cpp 
-o /tmp/miopen-naive_conv.cpp-af8a-ccc4-d9d9-ca89/naive_conv.cpp.o
[1]    927915 IOT instruction (core dumped)  python unet_keras.py

It seems that an -isystem is missing after -isystem /opt/rocm/include, in front of /opt/rocm/llvm/lib/clang/14.0.0/lib/linux/libclang_rt.builtins-x86_64.a.

Building manually in this temporary directory with the missing -isystem works without errors/warnings.
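One plausible reading of the error message can be sketched in a few lines of Python. This is a deliberately simplified model of how a compiler driver separates options from input files, not clang's real argument parser. The helper name `positional_inputs` and the abbreviated token lists are made up for illustration. The idea: a path with no option in front of it counts as one more input file, and since `-x hip` comes earlier on the line, that input would be compiled too, so `-c` with a single `-o` no longer fits.

```python
# Simplified model of splitting a compiler command line into options and
# positional inputs. Only options that appear in the log are modelled.

# Options that take their value from the next token when written on their own.
SEPARATE_VALUE_OPTIONS = {"-isystem", "-mllvm", "-x", "-o", "-l"}


def positional_inputs(tokens):
    """Return the tokens that are neither options nor option values."""
    inputs = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in SEPARATE_VALUE_OPTIONS:
            i += 2  # skip the option and the value that follows it
        elif tok.startswith("-"):
            i += 1  # joined or flag-style option such as -c, -O3, -I.
        else:
            inputs.append(tok)  # bare path: treated as an input file
            i += 1
    return inputs


BUILTINS = "/opt/rocm/llvm/lib/clang/14.0.0/lib/linux/libclang_rt.builtins-x86_64.a"

# Abbreviated version of the failing command from the log.
failing = [
    "-c", "-O3", "-x", "hip",
    "-isystem", "/opt/rocm/hip/include",
    "-isystem", "/opt/rocm/include",
    BUILTINS,  # no option in front of it
    "naive_conv.cpp",
    "-o", "naive_conv.cpp.o",
]

# Same command with -isystem placed in front of the stray path.
patched = failing[:8] + ["-isystem"] + failing[8:]

print(positional_inputs(failing))  # two inputs
print(positional_inputs(patched))  # one input
```

With the extra `-isystem`, the archive path is consumed as an include directory and only `naive_conv.cpp` remains as an input, which matches the manual build succeeding.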

Not sure where this originates from but I thought it might be related to MIOpen, considering the file that starts the build.

Happy to provide further information if needed.

atamazov (Contributor) commented Mar 8, 2022

@junliume This problem originated from recent changes in the compiler and requires fixing in MIOpen. I think that @pfultz2 can help here.

junliume (Collaborator) commented Mar 8, 2022

@pfultz2 are there any changes needed for rbuild or the CMake files? Your guidance would be appreciated :)

BTW, could we host half/rbuild/cget under https://github.com/ROCmSoftwarePlatform ?

pfultz2 (Contributor) commented Mar 8, 2022

This problem originated from recent changes in the compiler and requires fixing in MIOpen.

This is most likely due to generator expressions in the flags for hip that we capture in cmake, and we need to update our parsing of cmake flags to handle generator expressions.

In MIGraphX, I updated the parsing to evaluate some of the generator expressions. However, MIOpen fixed the issue by just removing .a files in #1264, but this doesn't really fix the issue, since the order or other generator expressions can break it again. I believe @causten had made you aware of these changes in email.

I suggest removing InplaceRemoveAllTokensThatInclude and using the updates to TargetFlags.cmake from migraphx.
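To illustrate what "evaluating generator expressions" before splitting flags could look like, here is a toy Python sketch. It is not the MIGraphX `TargetFlags.cmake` logic, which is CMake code and is not reproduced here. It handles only the conditional form `$<1:...>` / `$<0:...>`, which CMake documents as yielding the content or an empty string, and it raises on anything else. The names `evaluate_simple_genex` and `flags_from` are hypothetical.

```python
import re

# Matches an innermost "$<condition:content>", i.e. one with no nested
# "$<...>" inside its content.
_INNERMOST = re.compile(r"\$<([^<>:]*):([^<>]*)>")


def evaluate_simple_genex(text):
    """Evaluate only $<1:...> and $<0:...>; raise on any other expression."""

    def replace(match):
        condition, content = match.group(1), match.group(2)
        if condition == "1":
            return content
        if condition == "0":
            return ""
        raise ValueError(f"unsupported generator expression: {match.group(0)}")

    # Resolve innermost expressions repeatedly until nothing changes,
    # which also takes care of nested expressions.
    previous = None
    while previous != text:
        previous = text
        text = _INNERMOST.sub(replace, text)
    return text


def flags_from(raw):
    """Evaluate first, then split into tokens, so flag/value pairs stay intact."""
    return evaluate_simple_genex(raw).split()


raw = "$<1:-isystem /opt/rocm/include> $<0:-isystem /unused> -O3"
print(flags_from(raw))
```

The point of the ordering is that a flag and its value are kept or dropped together, rather than filtering individual tokens after the split, which is where an option such as `-isystem` can become separated from its path.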

junliume (Collaborator) commented

@JehandadKhan and @DrizztDoUrden could you take a look at this issue with Paul?

this doesnt really fix the issue since the order or other generator expressions can break it again

@junliume junliume added this to the ROCm 5.2 milestone Mar 21, 2022

atamazov (Contributor) commented

@junliume I am going to look into this & fix asap.

atamazov (Contributor) commented Apr 4, 2022

@junliume Can you please assign this to me to make it more visible? Thanks.


@junliume junliume pinned this issue Apr 6, 2022

junliume (Collaborator) commented Apr 6, 2022

Issue is pinned on top until fixed.


@kiritigowda kiritigowda unpinned this issue Apr 7, 2022

atamazov (Contributor) commented Apr 8, 2022

@junliume Sorry for the delay (a week of vacation interrupted me). Let me finish it. That said, the most experienced engineer in this area is @pfultz2.

junliume (Collaborator) commented

@pfultz2 Can we use TargetFlags.cmake from MIGraphX, or does something else also need to change?
CC: @atamazov

junliume (Collaborator) commented

Updating all ROCm 5.2 targeted PRs/issues as blockers.
There is no PR associated with this issue yet. Can we defer it to a future release? Or, if the above suggestion is feasible, we may be able to land it in time.

atamazov (Contributor) commented

I am hoping to deliver a PR tomorrow.
