rocm_helpers missing dependency declarations #61354

Closed
MrTreev opened this issue Jul 22, 2023 · 12 comments
Labels
subtype: ubuntu/linux Ubuntu/Linux Build/Installation Issues type:build/install Build and install issues

Comments

MrTreev commented Jul 22, 2023

Issue type

Build/Install

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

source

TensorFlow version

master/nightly

Custom code

Yes

OS platform and distribution

Arch Linux (Linux 6.4.4-arch1-1 #1 SMP PREEMPT_DYNAMIC x86_64 GNU/Linux)

Mobile device

N/A

Python version

3.10

Bazel version

6.1.0

GCC/compiler version

gcc (GCC) 13.1.1 20230714

CUDA/cuDNN version

None

GPU model and memory

AMD Radeon RX 7900 XT

Current behavior?

After adding #include <stdint.h> at line 16 of tensorflow/tsl/lib/io/cache.cc to fix a different error, and building with the steps described in the reproduce field, Bazel fails with the error shown in the attached log.

The error persists across different Bazel versions and after full cleans.

I am using the following Arch Linux packages for ROCm:

local/opencl-amd 1:5.6.0-2
    ROCr OpenCL stack
local/opencl-amd-dev 1:5.6.0-1
    OpenCL SDK / HIP SDK / ROCM Compiler.

Standalone code to reproduce the issue

./configure
You have bazel 6.1.0 installed.
Please specify the location of python. [Default is /usr/bin/python3]:

Found possible Python library paths:
  /usr/lib/python3.11/site-packages
Please input the desired Python library path to use.  Default is [/usr/lib/python3.11/site-packages]

Do you wish to build TensorFlow with ROCm support? [y/N]: y
ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]:
No CUDA support will be enabled for TensorFlow.

Do you want to use Clang to build TensorFlow? [Y/n]:
Clang will be used to compile TensorFlow.

Please specify the path to clang executable. [Default is /usr/bin/clang]:

You have Clang 17.0.0 installed.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -Wno-sign-compare]:

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.



bazel build --config=opt --verbose_failures //tensorflow/tools/pip_package:build_pip_package

Relevant log output

ERROR: /home/user/Repos/tensorflow/tensorflow/compiler/xla/stream_executor/rocm/BUILD:527:11: Compiling tensorflow/compiler/xla/stream_executor/rocm/rocm_helpers.cu.cc [for tool] failed: undeclared inclusion(s) in rule '//tensorflow/compiler/xla/stream_executor/rocm:rocm_helpers':
this rule is missing dependency declarations for the following files included by 'tensorflow/compiler/xla/stream_executor/rocm/rocm_helpers.cu.cc':
  '/opt/rocm-5.6.0/include/hip/hip_version.h'
  '/opt/rocm-5.6.0/include/hip/hip_runtime.h'
  '/opt/rocm-5.6.0/include/hip/hip_common.h'
  '/opt/rocm-5.6.0/include/hip/amd_detail/amd_hip_runtime.h'
  '/opt/rocm-5.6.0/include/hip/amd_detail/amd_hip_common.h'
  '/opt/rocm-5.6.0/include/hip/hip_runtime_api.h'
  '/opt/rocm-5.6.0/include/hip/amd_detail/host_defines.h'
  '/opt/rocm-5.6.0/include/hip/driver_types.h'
  '/opt/rocm-5.6.0/include/hip/texture_types.h'
  '/opt/rocm-5.6.0/include/hip/channel_descriptor.h'
  '/opt/rocm-5.6.0/include/hip/amd_detail/amd_channel_descriptor.h'
  '/opt/rocm-5.6.0/include/hip/amd_detail/amd_hip_vector_types.h'
  '/opt/rocm-5.6.0/include/hip/surface_types.h'
  '/opt/rocm-5.6.0/include/hip/amd_detail/amd_hip_runtime_pt_api.h'
  '/opt/rocm-5.6.0/include/hip/amd_detail/hip_ldg.h'
  '/opt/rocm-5.6.0/include/hip/amd_detail/amd_hip_atomic.h'
  '/opt/rocm-5.6.0/include/hip/amd_detail/amd_device_functions.h'
  '/opt/rocm-5.6.0/include/hip/amd_detail/math_fwd.h'
  '/opt/rocm-5.6.0/include/hip/hip_vector_types.h'
  '/opt/rocm-5.6.0/include/hip/amd_detail/device_library_decls.h'
  '/opt/rocm-5.6.0/include/hip/amd_detail/amd_warp_functions.h'
  '/opt/rocm-5.6.0/include/hip/amd_detail/amd_hip_unsafe_atomics.h'
  '/opt/rocm-5.6.0/include/hip/amd_detail/amd_surface_functions.h'
  '/opt/rocm-5.6.0/include/hip/amd_detail/ockl_image.h'
  '/opt/rocm-5.6.0/include/hip/amd_detail/texture_fetch_functions.h'
  '/opt/rocm-5.6.0/include/hip/hip_texture_types.h'
  '/opt/rocm-5.6.0/include/hip/amd_detail/texture_indirect_functions.h'
  '/opt/rocm-5.6.0/include/hip/amd_detail/amd_math_functions.h'
  '/opt/rocm-5.6.0/include/hip/amd_detail/hip_fp16_math_fwd.h'
  '/opt/rocm-5.6.0/include/hip/library_types.h'
  '/opt/rocm-5.6.0/include/hip/hip_bfloat16.h'
  '/opt/rocm-5.6.0/include/hip/amd_detail/amd_hip_bfloat16.h'
  '/opt/rocm-5.6.0/include/hip/hip_fp16.h'
  '/opt/rocm-5.6.0/include/hip/amd_detail/amd_hip_fp16.h'
clang-16: warning: argument unused during compilation: '-fcuda-flush-denormals-to-zero' [-Wunused-command-line-argument]
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 3.270s, Critical Path: 3.10s
INFO: 77 processes: 53 internal, 24 local.
FAILED: Build did NOT complete successfully
@google-ml-butler google-ml-butler bot added the type:build/install Build and install issues label Jul 22, 2023
@sushreebarsa sushreebarsa added the subtype: ubuntu/linux Ubuntu/Linux Build/Installation Issues label Jul 23, 2023
@SuryanarayanaY
Collaborator

Hi @MrTreev ,

Could you please test with the Clang and Bazel versions below and let us know whether the problem persists? Newer versions may not be compatible. Your Clang version appears to be 17.0.0, whereas the tested version is 16.0.0. Likewise, you have Bazel 6.1.0 installed, whereas the tested version for TF 2.13 is 5.3.0.

Version            Python version  Compiler      Build tools
tensorflow-2.13.0  3.8-3.11        Clang 16.0.0  Bazel 5.3.0

You can find the build instructions here.
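
The comparison against the tested configuration can be sketched in a few lines of Python. This is only an illustration: the helper names and the sample version strings are assumptions made for the example, and the tested versions are copied from the table in this comment.

```python
import re

# Tested configuration for tensorflow-2.13.0, as quoted in this comment.
TESTED = {"clang": (16, 0, 0), "bazel": (5, 3, 0)}


def parse_version(text):
    """Return the first X.Y.Z version found in text as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def compare_to_tested(tool, version_output):
    """Classify an installed tool version relative to the tested one."""
    installed = parse_version(version_output)
    if installed is None:
        return "unknown"
    tested = TESTED[tool]
    if installed == tested:
        return "tested"
    return "newer" if installed > tested else "older"


if __name__ == "__main__":
    # Sample strings (assumed) in the style of `clang --version` and `bazel --version`.
    print(compare_to_tested("clang", "clang version 17.0.0"))
    print(compare_to_tested("bazel", "bazel 6.1.0"))
```

A result of "newer" or "older" does not by itself mean the build will fail; it only flags a combination that differs from the tested one.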

@SuryanarayanaY SuryanarayanaY added the stat:awaiting response Status - Awaiting response from author label Jul 24, 2023
@MrTreev
Author

MrTreev commented Jul 24, 2023

I certainly can try that either later tonight or tomorrow morning (AEST). I'll get back to you when that's done.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Jul 24, 2023
@MrTreev
Author

MrTreev commented Jul 24, 2023

On the r2.13 branch, I've switched to Bazel 5.3.0. I'll try the other changes tomorrow morning, but so far there is no difference in the error (I've been doing full clean builds each time).

@SuryanarayanaY
Collaborator

Hi @MrTreev ,

Could you please provide an update on this? Thanks!

@SuryanarayanaY SuryanarayanaY added the stat:awaiting response Status - Awaiting response from author label Jul 26, 2023
@MrTreev
Author

MrTreev commented Jul 28, 2023

Hi @SuryanarayanaY, I'm currently trying to get Clang 16 working reliably. Sadly, the Arch Linux repos currently have only 15 and 17, so I have to install it manually, and doing that without breaking the rest of my environment is proving a little tricky. Thankfully, I should be able to dedicate a good bit of time to this over the next couple of days, so I hope to have an update soon.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Jul 28, 2023
@MrTreev
Author

MrTreev commented Jul 28, 2023

I've found a set of working ROCm packages that include clang-16. Since switching to them I get a different error, which I believe can be fixed by adding the files somewhere in the Bazel build system. I'm trying to figure out where exactly at the moment, but if anyone could look at this, that would be appreciated.

INFO: Analyzed target //tensorflow/tools/pip_package:build_pip_package (624 packages loaded, 43746 targets configured).
INFO: Found 1 target...
ERROR: /home/user/Repos/tensorflow/tensorflow/compiler/xla/stream_executor/rocm/BUILD:463:11: Compiling tensorflow/compiler/xla/stream_executor/rocm/rocm_helpers.cu.cc [for host] failed: undeclared inclusion(s) in rule '//tensorflow/compiler/xla/stream_executor/rocm:rocm_helpers':
this rule is missing dependency declarations for the following files included by 'tensorflow/compiler/xla/stream_executor/rocm/rocm_helpers.cu.cc':
  '/opt/rocm/llvm/lib/clang/16.0.0/include/__clang_hip_runtime_wrapper.h'
  '/opt/rocm/llvm/lib/clang/16.0.0/include/cuda_wrappers/cmath'
  '/opt/rocm/llvm/lib/clang/16.0.0/include/stddef.h'
  '/opt/rocm/llvm/lib/clang/16.0.0/include/__clang_hip_libdevice_declares.h'
  '/opt/rocm/llvm/lib/clang/16.0.0/include/__clang_hip_math.h'
  '/opt/rocm/llvm/lib/clang/16.0.0/include/cuda_wrappers/algorithm'
  '/opt/rocm/llvm/lib/clang/16.0.0/include/cuda_wrappers/new'
  '/opt/rocm/llvm/lib/clang/16.0.0/include/limits.h'
  '/opt/rocm/llvm/lib/clang/16.0.0/include/stdint.h'
  '/opt/rocm/llvm/lib/clang/16.0.0/include/__clang_hip_stdlib.h'
  '/opt/rocm/llvm/lib/clang/16.0.0/include/__clang_cuda_math_forward_declares.h'
  '/opt/rocm/llvm/lib/clang/16.0.0/include/__clang_hip_cmath.h'
  '/opt/rocm/llvm/lib/clang/16.0.0/include/__clang_cuda_complex_builtins.h'
  '/opt/rocm/llvm/lib/clang/16.0.0/include/cuda_wrappers/complex'
  '/opt/rocm/llvm/lib/clang/16.0.0/include/__stddef_max_align_t.h'
  '/opt/rocm/llvm/lib/clang/16.0.0/include/stdarg.h'
/home/user/.cache/bazel/_bazel_user/057ab612123f87ae7f238751a7c28667/execroot/org_tensorflow/external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc:23: DeprecationWarning: 'pipes' is deprecated and slated for removal in Python 3.13
  import pipes
clang-16: warning: argument unused during compilation: '-fcuda-flush-denormals-to-zero' [-Wunused-command-line-argument]
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 27.215s, Critical Path: 6.75s
INFO: 536 processes: 125 internal, 411 local.
FAILED: Build did NOT complete successfully
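
One way to act on the error output above is to reduce the listed headers to the directories that contain them, since those directories are the candidates for whatever include-path list the toolchain configuration declares. The following is a minimal sketch under that assumption; missing_include_dirs is a hypothetical helper written for this example, not part of Bazel or TensorFlow, and the sample text is a shortened copy of the log above.

```python
import posixpath
import re


def missing_include_dirs(log_text):
    """Collect the directories of headers listed in an 'undeclared inclusion(s)' error."""
    # Each header path sits on its own indented line, wrapped in single quotes.
    paths = re.findall(r"^[ \t]+'(/[^']+)'[ \t]*$", log_text, flags=re.MULTILINE)
    return sorted({posixpath.dirname(path) for path in paths})


# Shortened excerpt of the Bazel output quoted above.
SAMPLE = """\
this rule is missing dependency declarations for the following files included by 'tensorflow/compiler/xla/stream_executor/rocm/rocm_helpers.cu.cc':
  '/opt/rocm/llvm/lib/clang/16.0.0/include/__clang_hip_runtime_wrapper.h'
  '/opt/rocm/llvm/lib/clang/16.0.0/include/cuda_wrappers/cmath'
  '/opt/rocm/llvm/lib/clang/16.0.0/include/stddef.h'
"""

if __name__ == "__main__":
    for directory in missing_include_dirs(SAMPLE):
        print(directory)
```

For the excerpt this yields the Clang resource directory and its cuda_wrappers subdirectory. Whether declaring them actually resolves the error depends on how the ROCm toolchain is configured.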

@MrTreev
Author

MrTreev commented Jul 28, 2023

I found that the ROCm tensorflow-upstream fork gets further in the build process, so I'm currently looking at the differences to try to find a fix.

@MrTreev
Author

MrTreev commented Jul 29, 2023

I don't think there's a simple fix I can apply, and the best place for my issue is likely in the RadeonOpenCompute fork until the changes I need are merged.

@MrTreev MrTreev closed this as not planned Jul 29, 2023
@guangzlu

guangzlu commented Aug 1, 2024

Hi @MrTreev, did you eventually solve this problem, or did you switch to the TensorFlow fork in the ROCm repo?

@MrTreev
Author

MrTreev commented Aug 1, 2024

@guangzlu I switched to the ROCm version and haven't tried the base version on my ROCm hardware recently.

I might get some time to try this weekend if it'd help solve issues, but for me the ROCm version is stable and functioning.

@guangzlu

guangzlu commented Aug 1, 2024

Hi @MrTreev, I found a solution to this issue. Add the following lines:

inc_dirs.append(rocm_toolkit_path + "/llvm/lib/clang/16.0.0/include")
inc_dirs.append(rocm_toolkit_path + "/llvm/lib/clang/17.0.0/include")

at https://github.com/buptzyb/tensorflow/blob/bae09790a3cd5493158909d4d27b17320b703e80/third_party/gpus/rocm_configure.bzl#L198
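
The two hard-coded versions above could in principle be replaced by a lookup of whichever Clang resource directories exist under the ROCm install. The sketch below shows that idea in plain Python only. It is not Starlark and cannot be pasted into rocm_configure.bzl, because repository rules do not have Python's glob or os modules. The helper name clang_resource_include_dirs is hypothetical, and the /opt/rocm path is taken from the logs earlier in this thread.

```python
import glob
import os


def clang_resource_include_dirs(rocm_toolkit_path):
    """Return every <rocm>/llvm/lib/clang/<version>/include directory that exists."""
    pattern = os.path.join(rocm_toolkit_path, "llvm", "lib", "clang", "*", "include")
    return sorted(path for path in glob.glob(pattern) if os.path.isdir(path))


if __name__ == "__main__":
    inc_dirs = []
    # Assumed install prefix; adjust to the local ROCm location.
    inc_dirs.extend(clang_resource_include_dirs("/opt/rocm"))
    print(inc_dirs)
```

On a machine without ROCm under /opt/rocm the list is simply empty, so the lookup does not fail when a given Clang version is absent.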
