Cannot find libdevice in TF 2.11 + compilation fails without ptxas #296

Open
1 task done
drasmuss opened this issue Jan 12, 2023 · 38 comments

@drasmuss

Solution to issue cannot be found in the documentation.

  • I checked the documentation.

Issue

TensorFlow 2.11 broke the way it locates the libdevice library when CUDA is installed through conda. See tensorflow/tensorflow#56927 or tensorflow/tensorflow#59013.

Here is a simple repro script:

mamba create -n tmp python=3.9 tensorflow=2.11
mamba activate tmp
python -c "import tensorflow as tf; model = tf.keras.models.Sequential([tf.keras.layers.Dense(10, input_shape=(10,))]); model.compile(loss=tf.losses.mse); model.fit(tf.ones((32, 10)), tf.ones((32, 10)))"

Which gives the error:

    ...
    File ".../mambaforge/envs/tmp/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
      return tf.__internal__.distribute.interim.maybe_merge_call(
    File ".../mambaforge/envs/tmp/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
      distribution.extended.update(
    File ".../mambaforge/envs/tmp/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
      return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'StatefulPartitionedCall_1'
libdevice not found at ./libdevice.10.bc
         [[{{node StatefulPartitionedCall_1}}]] [Op:__inference_train_function_401]

I suspect that this is a bug on TensorFlow's end, not something you are really responsible for. But the only fixes in the issues linked above involve hacky workarounds, manually copying the libdevice file to some other location where TensorFlow is expecting to find it. So I'm wondering if it'd be possible to fix it more robustly in the conda-forge package, so that we don't have to manually copy files around every time we create a new environment.

Installed packages

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
absl-py                   1.4.0              pyhd8ed1ab_0    conda-forge
aiohttp                   3.8.3            py39hb9d737c_1    conda-forge
aiosignal                 1.3.1              pyhd8ed1ab_0    conda-forge
appdirs                   1.4.4              pyh9f0ad1d_0    conda-forge
astunparse                1.6.3              pyhd8ed1ab_0    conda-forge
async-timeout             4.0.2              pyhd8ed1ab_0    conda-forge
attrs                     22.2.0             pyh71513ae_0    conda-forge
blinker                   1.5                pyhd8ed1ab_0    conda-forge
brotlipy                  0.7.0           py39hb9d737c_1005    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2022.12.7            ha878542_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
cachetools                5.2.0              pyhd8ed1ab_0    conda-forge
certifi                   2022.12.7          pyhd8ed1ab_0    conda-forge
cffi                      1.15.1           py39he91dace_3    conda-forge
charset-normalizer        2.1.1              pyhd8ed1ab_0    conda-forge
click                     8.1.3           unix_pyhd8ed1ab_2    conda-forge
cryptography              39.0.0           py39h079d5ae_0    conda-forge
cudatoolkit               11.8.0              h37601d7_11    conda-forge
cudnn                     8.4.1.50             hed8a83a_0    conda-forge
flatbuffers               22.12.06             hcb278e6_2    conda-forge
frozenlist                1.3.3            py39hb9d737c_0    conda-forge
gast                      0.4.0              pyh9f0ad1d_0    conda-forge
giflib                    5.2.1                h36c2ea0_2    conda-forge
google-auth               2.15.0             pyh1a96a4e_0    conda-forge
google-auth-oauthlib      0.4.6              pyhd8ed1ab_0    conda-forge
google-pasta              0.2.0              pyh8c360ce_0    conda-forge
grpcio                    1.51.1           py39h8c60046_0    conda-forge
h5py                      3.7.0           nompi_py39h817c9c5_102    conda-forge
hdf5                      1.12.2          nompi_h4df4325_101    conda-forge
icu                       70.1                 h27087fc_0    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
importlib-metadata        6.0.0              pyha770c72_0    conda-forge
jpeg                      9e                   h166bdaf_2    conda-forge
keras                     2.11.0             pyhd8ed1ab_0    conda-forge
keras-preprocessing       1.1.2              pyhd8ed1ab_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.20.1               h81ceb04_0    conda-forge
ld_impl_linux-64          2.39                 hcc3a1bd_1    conda-forge
libabseil                 20220623.0      cxx17_h05df665_6    conda-forge
libaec                    1.0.6                h9c3ff4c_0    conda-forge
libblas                   3.9.0           16_linux64_openblas    conda-forge
libcblas                  3.9.0           16_linux64_openblas    conda-forge
libcurl                   7.87.0               hdc1c0ab_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 12.2.0              h65d4601_19    conda-forge
libgfortran-ng            12.2.0              h69a702a_19    conda-forge
libgfortran5              12.2.0              h337968e_19    conda-forge
libgomp                   12.2.0              h65d4601_19    conda-forge
libgrpc                   1.51.1               h30feacc_0    conda-forge
liblapack                 3.9.0           16_linux64_openblas    conda-forge
libnghttp2                1.51.0               hff17c54_0    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libopenblas               0.3.21          pthreads_h78a6416_3    conda-forge
libpng                    1.6.39               h753d276_0    conda-forge
libprotobuf               3.21.12              h3eb15da_0    conda-forge
libsqlite                 3.40.0               h753d276_0    conda-forge
libssh2                   1.10.0               hf14f497_3    conda-forge
libstdcxx-ng              12.2.0              h46fd767_19    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libzlib                   1.2.13               h166bdaf_4    conda-forge
markdown                  3.4.1              pyhd8ed1ab_0    conda-forge
markupsafe                2.1.1            py39hb9d737c_2    conda-forge
multidict                 6.0.4            py39h72bdee0_0    conda-forge
nccl                      2.14.3.1             h0800d71_0    conda-forge
ncurses                   6.3                  h27087fc_1    conda-forge
numpy                     1.24.1           py39h223a676_0    conda-forge
oauthlib                  3.2.2              pyhd8ed1ab_0    conda-forge
openssl                   3.0.7                h0b41bf4_1    conda-forge
opt_einsum                3.3.0              pyhd8ed1ab_1    conda-forge
packaging                 23.0               pyhd8ed1ab_0    conda-forge
pip                       22.3.1             pyhd8ed1ab_0    conda-forge
pooch                     1.6.0              pyhd8ed1ab_0    conda-forge
protobuf                  4.21.12          py39h227be39_0    conda-forge
pyasn1                    0.4.8                      py_0    conda-forge
pyasn1-modules            0.2.7                      py_0    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pyjwt                     2.6.0              pyhd8ed1ab_0    conda-forge
pyopenssl                 23.0.0             pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.9.15          hba424b6_0_cpython    conda-forge
python-flatbuffers        23.1.4             pyhd8ed1ab_0    conda-forge
python_abi                3.9                      3_cp39    conda-forge
pyu2f                     0.1.5              pyhd8ed1ab_0    conda-forge
re2                       2022.06.01           h27087fc_1    conda-forge
readline                  8.1.2                h0f457ee_0    conda-forge
requests                  2.28.1             pyhd8ed1ab_1    conda-forge
requests-oauthlib         1.3.1              pyhd8ed1ab_0    conda-forge
rsa                       4.9                pyhd8ed1ab_0    conda-forge
scipy                     1.10.0           py39h7360e5f_0    conda-forge
setuptools                65.6.3             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
snappy                    1.1.9                hbd366e4_2    conda-forge
tensorboard               2.11.0             pyhd8ed1ab_0    conda-forge
tensorboard-data-server   0.6.1            py39h3ccb8fc_4    conda-forge
tensorboard-plugin-wit    1.8.1              pyhd8ed1ab_0    conda-forge
tensorflow                2.11.0          cuda112py39h01bd6f0_0    conda-forge
tensorflow-base           2.11.0          cuda112py39haa5674d_0    conda-forge
tensorflow-estimator      2.11.0          cuda112py39h11d7a3b_0    conda-forge
termcolor                 2.2.0              pyhd8ed1ab_0    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
typing-extensions         4.4.0                hd8ed1ab_0    conda-forge
typing_extensions         4.4.0              pyha770c72_0    conda-forge
tzdata                    2022g                h191b570_0    conda-forge
urllib3                   1.26.14            pyhd8ed1ab_0    conda-forge
werkzeug                  2.2.2              pyhd8ed1ab_0    conda-forge
wheel                     0.38.4             pyhd8ed1ab_0    conda-forge
wrapt                     1.14.1           py39hb9d737c_1    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yarl                      1.8.2            py39hb9d737c_0    conda-forge
zipp                      3.11.0             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               h166bdaf_4    conda-forge

Environment info

active environment : tmp
    active env location : /home/drasmuss/mambaforge/envs/tmp
            shell level : 9
       user config file : /home/drasmuss/.condarc
 populated config files : /home/drasmuss/mambaforge/.condarc
                          /home/drasmuss/.condarc
          conda version : 22.9.0
    conda-build version : not installed
         python version : 3.10.6.final.0
       virtual packages : __cuda=12.0=0
                          __linux=5.15.79.1=0
                          __glibc=2.31=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /home/drasmuss/mambaforge  (writable)
      conda av data dir : /home/drasmuss/mambaforge/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /home/drasmuss/mambaforge/pkgs
                          /home/drasmuss/.conda/pkgs
       envs directories : /home/drasmuss/mambaforge/envs
                          /home/drasmuss/.conda/envs
               platform : linux-64
             user-agent : conda/22.9.0 requests/2.28.1 CPython/3.10.6 Linux/5.15.79.1-microsoft-standard-WSL2 ubuntu/20.04.5 glibc/2.31
                UID:GID : 1000:1000
             netrc file : None
           offline mode : False
@drasmuss added the bug label on Jan 12, 2023
@hmaarrfk
Contributor

Can you point to the specific fix?

@drasmuss
Author

This is the clearest set of instructions I found: tensorflow/tensorflow#56927 (comment)

@drasmuss
Author

Specifically, if I do these steps, the error goes away:

mkdir -p $CONDA_PREFIX/lib/nvvm/libdevice
cp $CONDA_PREFIX/lib/libdevice.10.bc $CONDA_PREFIX/lib/nvvm/libdevice/libdevice.10.bc
XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX/lib python -c "import tensorflow as tf; model = tf.keras.models.Sequential([tf.keras.layers.Dense(10, input_shape=(10,))]); model.compile(loss=tf.losses.mse); model.fit(tf.ones((32, 10)), tf.ones((32, 10)))"

There's a new error about ptxas compilation, but I'm not sure if that's related to this or a separate issue.
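
The manual steps above could be scripted so they do not have to be repeated by hand for every new environment. A minimal sketch, assuming the layout described in this comment (libdevice.10.bc sitting directly under $CONDA_PREFIX/lib); the helper name stage_libdevice is hypothetical and not part of TensorFlow or conda:

```python
import shutil
from pathlib import Path


def stage_libdevice(prefix):
    """Copy lib/libdevice.10.bc into lib/nvvm/libdevice and return the XLA flag.

    Hypothetical helper that mirrors the mkdir/cp/XLA_FLAGS steps above.
    """
    lib_dir = Path(prefix) / "lib"
    source = lib_dir / "libdevice.10.bc"
    if not source.is_file():
        raise FileNotFoundError(f"{source} does not exist")

    target_dir = lib_dir / "nvvm" / "libdevice"
    target_dir.mkdir(parents=True, exist_ok=True)   # same as `mkdir -p`
    shutil.copy2(source, target_dir / source.name)  # same as `cp`

    # Value to place in the XLA_FLAGS environment variable.
    return f"--xla_gpu_cuda_data_dir={lib_dir}"
```

One way to use it would be `os.environ["XLA_FLAGS"] = stage_libdevice(os.environ["CONDA_PREFIX"])` before TensorFlow is imported. Note that this overwrites any XLA flags that are already set.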

@hmaarrfk
Contributor

By new error, do you mean Aborted (core dumped)?

@hmaarrfk
Contributor

Yeah, with TensorFlow 2.10 I get:

python -c "import tensorflow as tf; model = tf.keras.models.Sequential([tf.keras.layers.Dense(10, input_shape=(10,))]); model.compile(loss=tf.losses.mse); model.fit(tf.ones((32, 10)), tf.ones((32, 10)))"
2023-01-12 20:01:09.730449: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-12 20:01:09.814895: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-01-12 20:01:10.972323: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-12 20:01:11.375475: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5294 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1060 6GB, pci bus id: 0000:17:00.0, compute capability: 6.1
1/1 [==============================] - 0s 328ms/step - loss: 1.9309

I wonder if we just have to disable XLA.

@hmaarrfk
Contributor

For internal reference, this is the pull request that moved that code last.
tensorflow/tensorflow@e7ec37f

That said, I just don't get what the problem is. Maybe we have to disable XLA?

@drasmuss
Author

By new error, do you mean Aborted (core dumped)?

Yes, here's the error printout I get after applying the first "fix":

2023-01-12 18:09:15.667594: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 8887 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5
2023-01-12 18:09:16.442301: I tensorflow/compiler/xla/service/service.cc:173] XLA service 0x7f930d167e50 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-01-12 18:09:16.442334: I tensorflow/compiler/xla/service/service.cc:181]   StreamExecutor device (0): NVIDIA GeForce RTX 2080 Ti, Compute Capability 7.5
2023-01-12 18:09:16.444935: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-01-12 18:09:16.491617: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2023-01-12 18:09:16.509586: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2023-01-12 18:09:16.509637: W tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:85] Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version
2023-01-12 18:09:16.528680: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2023-01-12 18:09:16.528775: F tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:454] ptxas returned an error during compilation of ptx to sass: 'INTERNAL: Failed to launch ptxas'  If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.    
Aborted
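
The log above shows XLA failing to spawn ptxas ("Couldn't invoke ptxas --version"). A quick way to check whether ptxas is reachable from the active environment is to look it up on PATH before running the reproducer. A minimal sketch using only the Python standard library; ptxas_version is a hypothetical helper name, and it assumes XLA locates ptxas through PATH, which the log suggests but does not prove:

```python
import shutil
import subprocess


def ptxas_version(search_path=None):
    """Return the output of `ptxas --version`, or None if ptxas is not found.

    search_path is an os.pathsep-separated string like $PATH;
    None means the current $PATH is searched.
    """
    exe = shutil.which("ptxas", path=search_path)
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip()
```

A return value of None would match the "No such file or directory" lines in the log.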

@hmaarrfk
Contributor

I uploaded some packages built with the following patch

diff --git a/recipe/build.sh b/recipe/build.sh
index 95db01e..a71c8c6 100644
--- a/recipe/build.sh
+++ b/recipe/build.sh
@@ -105,7 +105,7 @@ if [[ "${target_platform}" == "osx-arm64" ]]; then
   # See https://conda-forge.org/docs/maintainer/knowledge_base.html#newer-c-features-with-old-sdk
   export CXXFLAGS="${CXXFLAGS} -D_LIBCPP_DISABLE_AVAILABILITY"
 fi
-export TF_ENABLE_XLA=1
+export TF_ENABLE_XLA=0
 export BUILD_TARGET="//tensorflow/tools/pip_package:build_pip_package //tensorflow/tools/lib_package:libtensorflow //tensorflow:libtensorflow_cc${SHLIB_EXT}"

 # Python settings
diff --git a/recipe/meta.yaml b/recipe/meta.yaml
index 7fb9b6b..b31eb19 100644
--- a/recipe/meta.yaml
+++ b/recipe/meta.yaml
@@ -16,7 +16,7 @@ source:
     folder: tensorflow-estimator

 build:
-  number: 0
+  number: 1
   skip: true  # [win]
   skip: true  # [python_impl == 'pypy']
   skip: true  # [libabseil != '20220623.0']
They are at https://anaconda.org/mark.harfouche/, but they somehow work worse. I can't get past this error:
Node: 'StatefulPartitionedCall_1'
Could not find compiler for platform CUDA: NOT_FOUND: could not find registered compiler for platform CUDA -- was support for that platform linked in?
         [[{{node StatefulPartitionedCall_1}}]] [Op:__inference_train_function_401]

@drasmuss
Author

Not sure if this helps, but I found that this is specifically triggered by the new optimizers that they made the default in TF 2.11.

If you use

model.compile(loss=tf.losses.mse, optimizer=tf.keras.optimizers.SGD())

you get the error, but if you use

model.compile(loss=tf.losses.mse, optimizer=tf.keras.optimizers.legacy.SGD())

no error.

@drasmuss
Author

I opened an issue with the Keras team here keras-team/tf-keras#62, in case that yields any results.

@hmaarrfk
Contributor

Do you get the same results if you install from their conda packages and not ours? Typically people don't like to debug conda-forge stuff.

@drasmuss
Author

Following TensorFlow's recommended installation steps, i.e.

conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
pip install tensorflow

produces the same error.

The anaconda tensorflow package is still on 2.10, so can't test that.

@hmaarrfk
Contributor

Great, thank you for confirming.

@ngam
Contributor

ngam commented Feb 13, 2023

This is a long-standing problem with XLA needing ptxas. If you get ptxas from somewhere else, e.g., conda install -c nvidia cuda-nvcc, does your issue go away? It's the same issue with JAX.

@ngam
Contributor

ngam commented Feb 13, 2023

I've been tracking this for a while. I think we don't get reports of this "bug" because people who use CUDA usually have more than one installation, so our tensorflow somehow picks up whatever it needs from elsewhere if it isn't available in conda-forge. In my experience this is only ptxas, but it could be other things. For example, people on HPC systems usually have native installations of CUDA, and ptxas is often part of that (not always, but one could always request it from the admins).

The good news: a whole new way of dealing with cuda is coming to conda-forge (great!)
The bad news: it will likely take a long-ish time before that comes to fruition and there is a tendency for the nvidia team to work internally (e.g., qc, testing, etc.) before releasing stuff to the public (conda-forge)

@ngam changed the title from "Cannot find libdevice in TF 2.11" to "Cannot find libdevice in TF 2.11 + compilation fails without ptxas" on Feb 13, 2023
@drasmuss
Author

If you get ptxas from somewhere else, e.g., conda install -c nvidia cuda-nvcc, does your issue go away?

This doesn't make the initial libdevice error go away, but if you also apply the hacky fix from #296 (comment), you no longer get the secondary ptxas-related Aborted error.

@ngam
Contributor

ngam commented Feb 13, 2023

Yeah, we will need to fix the libdevice issue separately.

@hameer-spire

I can confirm that installing cudatoolkit-dev from conda-forge and following #296 (comment) fixes the issue for me as well.

@sh-shahrokhi

sh-shahrokhi commented Mar 24, 2023

I have this libdevice issue too. Fix is appreciated.

@Esokrates

At least we see

error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice

with tensorflow-gpu 2.10 from conda-forge as well. The workaround was to create lib/nvvm/libdevice, copy lib/libdevice.10.bc into it, run mamba install -y -c nvidia cuda-nvcc, and export the XLA_FLAGS variable accordingly.

@hmaarrfk
Contributor

Hmm, I just hit this again. I was unable to "fix" it, so I had to downgrade to tensorflow 2.13 for the moment. Will revisit "soon".

@jakirkham
Member

Thanks Mark for drawing my attention to this! 🙏

Think there is a structuring issue with NVVM in the cudatoolkit package. Have tried to outline this in issue: conda-forge/cudatoolkit-feedstock#96

Idk if just restructuring the NVVM contents is enough to fix the issue, but it is at least a required step

The CUDA 12 packages are better structured (and more complete). So it is possible using CUDA 12 will also fix the issue

@njzjz
Member

njzjz commented Dec 1, 2023

I found a workaround for TF 2.14:

pip install nvidia-cuda-nvcc-cu11

This PyPI package contains libdevice.10.bc, and TensorFlow can find it correctly.
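
To see which copies of libdevice.10.bc an environment actually contains (for example after installing the PyPI package above), a recursive search is enough. A small sketch using only the standard library; list_libdevice_copies is a hypothetical name, and the root to search (such as the environment prefix) is up to the caller:

```python
from pathlib import Path


def list_libdevice_copies(root):
    """Return every libdevice.10.bc file found under root, sorted by path."""
    return sorted(p for p in Path(root).rglob("libdevice.10.bc") if p.is_file())
```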

@njzjz
Member

njzjz commented Dec 1, 2023

The CUDA 12 packages are better structured (and more complete). So it is possible using CUDA 12 will also fix the issue

@jakirkham Do you know what package includes NVVM files? It may need to be added to #353

@jakirkham
Member

cuda-nvcc-tools contains part of NVVM. The rest is in cuda-nvcc-impl.

Though I think TensorFlow hasn't been rebuilt for CUDA 12 yet ( #354 )

@njzjz
Member

njzjz commented Dec 1, 2023

Though I think TensorFlow hasn't been rebuilt for CUDA 12 yet ( #354 )

CUDA 12 migration was manually added by @xhochy in #353, in 21664ce.

@link89

link89 commented Feb 2, 2024

pip install nvidia-cuda-nvcc-cu11

This workaround works!

@hmaarrfk
Contributor

I feel like I'm hitting this again, but I have already tried installing cuda-nvcc.

These are the cuda packages I have:

 $ mamba list | grep cuda
cuda-cccl_linux-64        12.6.77              ha770c72_0    conda-forge
cuda-crt-dev_linux-64     12.6.77              ha770c72_0    conda-forge
cuda-crt-tools            12.6.77              ha770c72_0    conda-forge
cuda-cudart               12.6.77              h5888daf_0    conda-forge
cuda-cudart-dev           12.6.77              h5888daf_0    conda-forge
cuda-cudart-dev_linux-64  12.6.77              h3f2d84a_0    conda-forge
cuda-cudart-static        12.6.77              h5888daf_0    conda-forge
cuda-cudart-static_linux-64 12.6.77              h3f2d84a_0    conda-forge
cuda-cudart_linux-64      12.6.77              h3f2d84a_0    conda-forge
cuda-cupti                12.6.80              hbd13f7d_0    conda-forge
cuda-driver-dev_linux-64  12.6.77              h3f2d84a_0    conda-forge
cuda-nvcc                 12.6.77              hcdd1206_0    conda-forge
cuda-nvcc-dev_linux-64    12.6.77              he91c749_0    conda-forge
cuda-nvcc-impl            12.6.77              h85509e4_0    conda-forge
cuda-nvcc-tools           12.6.77              he02047a_0    conda-forge
cuda-nvcc_linux-64        12.6.77              h8a487aa_0    conda-forge
cuda-nvrtc                12.6.77              hbd13f7d_0    conda-forge
cuda-nvtx                 12.6.77              hbd13f7d_0    conda-forge
cuda-nvvm-dev_linux-64    12.6.77              ha770c72_0    conda-forge
cuda-nvvm-impl            12.6.77              he02047a_0    conda-forge
cuda-nvvm-tools           12.6.77              he02047a_0    conda-forge
cuda-version              12.6                 h7480c83_3    conda-forge
cupyifcudaavailable       1.0.0                     cpu_0    ramonaoptics
libtorch                  2.4.1           cuda120_h1d34654_302    conda-forge
nocuda11                  1.0                           0    ramonaoptics
onnxruntime               1.18.1          py310ha300224_201_cuda    conda-forge
pytorch                   2.4.1           cuda120_py310h5d94b2e_302    conda-forge
tensorflow                2.17.0          cuda120py310heb3ae67_203    conda-forge
tensorflow-base           2.17.0          cuda120py310ha9db03a_203    conda-forge
tensorflow-estimator      2.17.0          cuda120py310h1fd330c_203    conda-forge

The same reproducer

python -c "import tensorflow as tf; model = tf.keras.models.Sequential([tf.keras.layers.Dense(10, input_shape=(10,))]); model.compile(loss=tf.losses.mse); model.fit(tf.ones((32, 10)), tf.ones((32, 10)))"

still triggers the error. I have:

          mamba version : 1.5.10
     active environment : dev
    active env location : /home/mark/miniforge3/envs/dev
            shell level : 1
       user config file : /home/mark/.condarc
 populated config files : /home/mark/miniforge3/.condarc
                          /home/mark/.condarc
          conda version : 24.9.1
    conda-build version : 24.5.1
         python version : 3.10.14.final.0
                 solver : libmamba (default)
       virtual packages : __archspec=1=zen
                          __conda=24.9.1=0
                          __cuda=12.4=0
                          __glibc=2.35=0
                          __linux=6.8.0=0
                          __unix=0=0
       base environment : /home/mark/miniforge3  (writable)
      conda av data dir : /home/mark/miniforge3/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /home/mark/miniforge3/pkgs
                          /home/mark/.conda/pkgs
       envs directories : /home/mark/miniforge3/envs
                          /home/mark/.conda/envs
               platform : linux-64
             user-agent : conda/24.9.1 requests/2.32.3 CPython/3.10.14 Linux/6.8.0-47-generic ubuntu/22.04.5 glibc/2.35 solver/libmamba conda-libmamba-solver/24.9.0 libmambapy/1.5.10
                UID:GID : 1003:1003
             netrc file : None
           offline mode : False

@hmaarrfk
Contributor

hmaarrfk commented Oct 18, 2024

Never mind.

So I think the problem is that the _build_env path is leaking into some header files:

$ grep _build_env . -R
grep: ./lib/python3.10/site-packages/tensorflow/libtensorflow_cc.so.2: binary file matches
./lib/python3.10/site-packages/tensorflow/include/third_party/gpus/cuda/cuda_config.h:#define TF_CUDA_TOOLKIT_PATH "/home/conda/feedstock_root/build_artifacts/tensorflow-split_1729109664689/_build_env/targets/x86_64-linux"
./lib/python3.10/site-packages/tensorflow/include/external/local_config_cuda/cuda/cuda/cuda_config.h:#define TF_CUDA_TOOLKIT_PATH "/home/conda/feedstock_root/build_artifacts/tensorflow-split_1729109664689/_build_env/targets/x86_64-linux"
grep: ./lib/python3.10/site-packages/tensorflow/python/platform/__pycache__/build_info.cpython-310.pyc: binary file matches
./lib/python3.10/site-packages/tensorflow/python/platform/build_info.py:build_info = collections.OrderedDict([('cpu_compiler', '/home/conda/feedstock_root/build_artifacts/tensorflow-split_1729109664689/_build_env/bin/x86_64-conda-linux-gnu-gcc'), ('cuda_compute_capabilities', ['sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'sm_89', 'sm_90', 'compute_90']), ('cuda_version', '12.0'), ('cudnn_version', '9'), ('is_cuda_build', True), ('is_rocm_build', False), ('is_tensorrt_build', False)])

reveals that the _build_env path is in the cuda_config.h files.

Replacing that manually with the ${CONDA_PREFIX} seems to resolve things.

We should be able to make this substitution readily in the recipe, but it would require recompiling the packages.
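
For checking other installed packages for the same kind of leak, the grep above can be approximated in Python. A rough sketch, not the recipe fix itself: find_leaked_paths is a hypothetical name, the default needle is the "_build_env" string from the grep, and unlike grep it skips files that are not valid UTF-8 instead of reporting binary matches:

```python
from pathlib import Path


def find_leaked_paths(root, needle="_build_env"):
    """Yield (path, line_number, line) for text lines under root containing needle.

    Files that cannot be read or decoded as UTF-8 are skipped, so matches
    inside binary files (such as shared libraries) are not reported.
    """
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield path, number, line
```

Calling `list(find_leaked_paths(os.environ["CONDA_PREFIX"]))` would list the affected text files, such as the two cuda_config.h headers and build_info.py shown above.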

Never mind, it succeeded because I was in the directory

${CONDA_PREFIX}/targets/x86_64-linux

prior to running

python -c "import tensorflow as tf; model = tf.keras.models.Sequential([tf.keras.layers.Dense(10, input_shape=(10,))]); model.compile(loss=tf.losses.mse); model.fit(tf.ones((32, 10)), tf.ones((32, 10)))"

@hmaarrfk
Contributor

Still, the "solution" seems to be:

XLA_FLAGS=--xla_gpu_cuda_data_dir=${CONDA_PREFIX}/targets/x86_64-linux \
python -c "import tensorflow as tf; model = tf.keras.models.Sequential([tf.keras.layers.Dense(10, input_shape=(10,))]); model.compile(loss=tf.losses.mse); model.fit(tf.ones((32, 10)), tf.ones((32, 10)))"
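
Since the directory that works differs between package layouts in this thread ($CONDA_PREFIX/lib after the manual copy, $CONDA_PREFIX/targets/x86_64-linux here), the flag value could be derived by probing for nvvm/libdevice/libdevice.10.bc. A sketch under that assumption; the function names and the candidate list are illustrative only, and the candidates are limited to the two locations mentioned in this issue:

```python
from pathlib import Path

# Candidate data directories relative to the environment prefix.
# Both come from this thread; other layouts may exist.
CANDIDATE_SUBDIRS = ("targets/x86_64-linux", "lib")


def find_cuda_data_dir(prefix):
    """Return the first candidate that holds nvvm/libdevice/libdevice.10.bc, else None."""
    for subdir in CANDIDATE_SUBDIRS:
        candidate = Path(prefix) / subdir
        if (candidate / "nvvm" / "libdevice" / "libdevice.10.bc").is_file():
            return candidate
    return None


def xla_flags_for(prefix):
    """Build the --xla_gpu_cuda_data_dir flag for prefix, or None if nothing was found."""
    data_dir = find_cuda_data_dir(prefix)
    if data_dir is None:
        return None
    return f"--xla_gpu_cuda_data_dir={data_dir}"
```

The returned string is what would go into the XLA_FLAGS environment variable before TensorFlow is imported.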

In fact, I think it is related to cuDNN 8 vs 9...

# Works
mamba create --name cudnn8 tensorflow=2.17.0=cuda120* cudnn=8 --yes
conda run --name cudnn8 python -c "import tensorflow as tf; model = tf.keras.models.Sequential([tf.keras.layers.Dense(10, input_shape=(10,))]); model.compile(loss=tf.losses.mse); model.fit(tf.ones((32, 10)), tf.ones((32, 10)))"
# Fails
mamba create --name cudnn9 tensorflow=2.17.0=cuda120* cudnn=9 --yes
conda run --name cudnn9 python -c "import tensorflow as tf; model = tf.keras.models.Sequential([tf.keras.layers.Dense(10, input_shape=(10,))]); model.compile(loss=tf.losses.mse); model.fit(tf.ones((32, 10)), tf.ones((32, 10)))"

@hmaarrfk
Contributor

I can't help but think that it is the same issue as:
conda-forge/jaxlib-feedstock#283 (comment)

@hmaarrfk
Contributor

hmaarrfk commented Oct 18, 2024

The plot thickens:

# cudnn9 + build 202 works
mamba create --name cudnn9_202 tensorflow=2.17.0=cuda120*_202 cudnn=9 --yes
conda run --name cudnn9_202 python -c "import tensorflow as tf; model = tf.keras.models.Sequential([tf.keras.layers.Dense(10, input_shape=(10,))]); model.compile(loss=tf.losses.mse); model.fit(tf.ones((32, 10)), tf.ones((32, 10)))"
mamba list of cudnn9_202
# packages in environment at /home/mark/miniforge3/envs/cudnn9_202:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
absl-py                   2.1.0              pyhd8ed1ab_0    conda-forge
astunparse                1.6.3              pyhd8ed1ab_2    conda-forge
brotli-python             1.1.0           py312h2ec8cdc_2    conda-forge
bzip2                     1.0.8                h4bc722e_7    conda-forge
c-ares                    1.34.2               heb4867d_0    conda-forge
ca-certificates           2024.8.30            hbcca054_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
certifi                   2024.8.30          pyhd8ed1ab_0    conda-forge
cffi                      1.17.1          py312h06ac9bb_0    conda-forge
charset-normalizer        3.4.0              pyhd8ed1ab_0    conda-forge
cuda-crt-tools            12.6.77              ha770c72_0    conda-forge
cuda-cudart               12.6.77              h5888daf_0    conda-forge
cuda-cudart_linux-64      12.6.77              h3f2d84a_0    conda-forge
cuda-cupti                12.6.80              hbd13f7d_0    conda-forge
cuda-nvcc-tools           12.6.77              he02047a_0    conda-forge
cuda-nvrtc                12.6.77              hbd13f7d_0    conda-forge
cuda-nvtx                 12.6.77              hbd13f7d_0    conda-forge
cuda-nvvm-tools           12.6.77              he02047a_0    conda-forge
cuda-version              12.6                 h7480c83_3    conda-forge
cudnn                     9.3.0.75             h93bb076_0    conda-forge
flatbuffers               24.3.25              h59595ed_0    conda-forge
gast                      0.5.5              pyhd8ed1ab_0    conda-forge
giflib                    5.2.2                hd590300_0    conda-forge
google-pasta              0.2.0              pyhd8ed1ab_1    conda-forge
grpcio                    1.62.2          py312hb06c811_0    conda-forge
h2                        4.1.0              pyhd8ed1ab_0    conda-forge
h5py                      3.12.1          nompi_py312hedeef09_100    conda-forge
hdf5                      1.14.3          nompi_hdf9ad27_105    conda-forge
hpack                     4.0.0              pyh9f0ad1d_0    conda-forge
hyperframe                6.0.1              pyhd8ed1ab_0    conda-forge
icu                       75.1                 he02047a_0    conda-forge
idna                      3.10               pyhd8ed1ab_0    conda-forge
importlib-metadata        8.5.0              pyha770c72_0    conda-forge
keras                     3.6.0              pyhd8ed1ab_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.21.3               h659f571_0    conda-forge
ld_impl_linux-64          2.43                 h712a8e2_1    conda-forge
libabseil                 20240116.2      cxx17_he02047a_1    conda-forge
libaec                    1.1.3                h59595ed_0    conda-forge
libblas                   3.9.0           24_linux64_openblas    conda-forge
libcblas                  3.9.0           24_linux64_openblas    conda-forge
libcublas                 12.6.3.3             hbd13f7d_1    conda-forge
libcufft                  11.3.0.4             hbd13f7d_0    conda-forge
libcurand                 10.3.7.77            hbd13f7d_0    conda-forge
libcurl                   8.10.1               hbbe4b11_0    conda-forge
libcusolver               11.7.1.2             hbd13f7d_0    conda-forge
libcusparse               12.5.4.2             hbd13f7d_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 hd590300_2    conda-forge
libexpat                  2.6.3                h5888daf_0    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc                    14.2.0               h77fa898_1    conda-forge
libgcc-ng                 14.2.0               h69a702a_1    conda-forge
libgfortran               14.2.0               h69a702a_1    conda-forge
libgfortran-ng            14.2.0               h69a702a_1    conda-forge
libgfortran5              14.2.0               hd5240d6_1    conda-forge
libgomp                   14.2.0               h77fa898_1    conda-forge
libgrpc                   1.62.2               h15f2491_0    conda-forge
libjpeg-turbo             3.0.0                hd590300_1    conda-forge
liblapack                 3.9.0           24_linux64_openblas    conda-forge
libnghttp2                1.58.0               h47da74e_1    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libnvjitlink              12.6.77              hbd13f7d_1    conda-forge
libopenblas               0.3.27          pthreads_hac2b453_1    conda-forge
libpng                    1.6.44               hadc24fc_0    conda-forge
libprotobuf               4.25.3               hd5b35b9_1    conda-forge
libre2-11                 2023.09.01           h5a48ba9_2    conda-forge
libsqlite                 3.46.1               hadc24fc_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx                 14.2.0               hc0a3c3a_1    conda-forge
libstdcxx-ng              14.2.0               h4852527_1    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libzlib                   1.3.1                hb9d3cd8_2    conda-forge
markdown                  3.6                pyhd8ed1ab_0    conda-forge
markdown-it-py            3.0.0              pyhd8ed1ab_0    conda-forge
markupsafe                3.0.1           py312h178313f_1    conda-forge
mdurl                     0.1.2              pyhd8ed1ab_0    conda-forge
ml_dtypes                 0.4.0           py312hf9745cd_2    conda-forge
namex                     0.0.8              pyhd8ed1ab_0    conda-forge
nccl                      2.23.4.1             h52f6c39_0    conda-forge
ncurses                   6.5                  he02047a_1    conda-forge
numpy                     1.26.4          py312heda63a1_0    conda-forge
openssl                   3.3.2                hb9d3cd8_0    conda-forge
opt_einsum                3.4.0              pyhd8ed1ab_0    conda-forge
optree                    0.13.0          py312h68727a3_0    conda-forge
packaging                 24.1               pyhd8ed1ab_0    conda-forge
pip                       24.2               pyh8b19718_1    conda-forge
protobuf                  4.25.3          py312h83439f5_1    conda-forge
pycparser                 2.22               pyhd8ed1ab_0    conda-forge
pygments                  2.18.0             pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.12.4          hb3fe705_101_cpython    ramonaoptics
python-flatbuffers        24.3.25            pyh59ac667_0    conda-forge
python_abi                3.12                    5_cp312    conda-forge
re2                       2023.09.01           h7f4b329_2    conda-forge
requests                  2.32.3             pyhd8ed1ab_0    conda-forge
rich                      13.9.2             pyhd8ed1ab_0    conda-forge
setuptools                75.1.0             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
snappy                    1.2.1                ha2e4443_0    conda-forge
tensorboard               2.17.1             pyhd8ed1ab_0    conda-forge
tensorboard-data-server   0.7.0           py312hda17c39_2    conda-forge
tensorflow                2.17.0          cuda120py312h02ad488_202    conda-forge
tensorflow-base           2.17.0          cuda120py312h8a249fc_202    conda-forge
tensorflow-estimator      2.17.0          cuda120py312hfa0f5ef_202    conda-forge
termcolor                 2.5.0              pyhd8ed1ab_0    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
typing-extensions         4.12.2               hd8ed1ab_0    conda-forge
typing_extensions         4.12.2             pyha770c72_0    conda-forge
tzdata                    2024b                hc8b5060_0    conda-forge
urllib3                   2.2.3              pyhd8ed1ab_0    conda-forge
werkzeug                  3.0.4              pyhd8ed1ab_0    conda-forge
wheel                     0.44.0             pyhd8ed1ab_0    conda-forge
wrapt                     1.16.0          py312h66e93f0_1    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
zipp                      3.20.2             pyhd8ed1ab_0    conda-forge
zstandard                 0.23.0          py312hef9b889_1    conda-forge
zstd                      1.5.6                ha6fb4c9_0    conda-forge
# cudnn9 + build 203 fails
mamba create --name cudnn9_203 tensorflow=2.17.0=cuda120*_203 cudnn=9 --yes
conda run --name cudnn9_203 python -c "import tensorflow as tf; model = tf.keras.models.Sequential([tf.keras.layers.Dense(10, input_shape=(10,))]); model.compile(loss=tf.losses.mse); model.fit(tf.ones((32, 10)), tf.ones((32, 10)))"
mamba list of cudnn9_203
# packages in environment at /home/mark/miniforge3/envs/cudnn9_203:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
absl-py                   2.1.0              pyhd8ed1ab_0    conda-forge
astunparse                1.6.3              pyhd8ed1ab_2    conda-forge
brotli-python             1.1.0           py312h2ec8cdc_2    conda-forge
bzip2                     1.0.8                h4bc722e_7    conda-forge
c-ares                    1.34.2               heb4867d_0    conda-forge
ca-certificates           2024.8.30            hbcca054_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
certifi                   2024.8.30          pyhd8ed1ab_0    conda-forge
cffi                      1.17.1          py312h06ac9bb_0    conda-forge
charset-normalizer        3.4.0              pyhd8ed1ab_0    conda-forge
cuda-crt-tools            12.6.77              ha770c72_0    conda-forge
cuda-cudart               12.6.77              h5888daf_0    conda-forge
cuda-cudart_linux-64      12.6.77              h3f2d84a_0    conda-forge
cuda-cupti                12.6.80              hbd13f7d_0    conda-forge
cuda-nvcc-tools           12.6.77              he02047a_0    conda-forge
cuda-nvrtc                12.6.77              hbd13f7d_0    conda-forge
cuda-nvtx                 12.6.77              hbd13f7d_0    conda-forge
cuda-nvvm-tools           12.6.77              he02047a_0    conda-forge
cuda-version              12.6                 h7480c83_3    conda-forge
cudnn                     9.3.0.75             h93bb076_0    conda-forge
flatbuffers               24.3.25              h59595ed_0    conda-forge
gast                      0.5.5              pyhd8ed1ab_0    conda-forge
giflib                    5.2.2                hd590300_0    conda-forge
google-pasta              0.2.0              pyhd8ed1ab_1    conda-forge
grpcio                    1.65.5          py312h374181b_0    conda-forge
h2                        4.1.0              pyhd8ed1ab_0    conda-forge
h5py                      3.12.1          nompi_py312hedeef09_100    conda-forge
hdf5                      1.14.3          nompi_hdf9ad27_105    conda-forge
hpack                     4.0.0              pyh9f0ad1d_0    conda-forge
hyperframe                6.0.1              pyhd8ed1ab_0    conda-forge
icu                       75.1                 he02047a_0    conda-forge
idna                      3.10               pyhd8ed1ab_0    conda-forge
importlib-metadata        8.5.0              pyha770c72_0    conda-forge
keras                     3.6.0              pyhd8ed1ab_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.21.3               h659f571_0    conda-forge
ld_impl_linux-64          2.43                 h712a8e2_1    conda-forge
libabseil                 20240722.0      cxx17_h5888daf_1    conda-forge
libaec                    1.1.3                h59595ed_0    conda-forge
libblas                   3.9.0           24_linux64_openblas    conda-forge
libcblas                  3.9.0           24_linux64_openblas    conda-forge
libcublas                 12.6.3.3             hbd13f7d_1    conda-forge
libcufft                  11.3.0.4             hbd13f7d_0    conda-forge
libcurand                 10.3.7.77            hbd13f7d_0    conda-forge
libcurl                   8.10.1               hbbe4b11_0    conda-forge
libcusolver               11.7.1.2             hbd13f7d_0    conda-forge
libcusparse               12.5.4.2             hbd13f7d_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 hd590300_2    conda-forge
libexpat                  2.6.3                h5888daf_0    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc                    14.2.0               h77fa898_1    conda-forge
libgcc-ng                 14.2.0               h69a702a_1    conda-forge
libgfortran               14.2.0               h69a702a_1    conda-forge
libgfortran-ng            14.2.0               h69a702a_1    conda-forge
libgfortran5              14.2.0               hd5240d6_1    conda-forge
libgomp                   14.2.0               h77fa898_1    conda-forge
libgrpc                   1.65.5               hf5c653b_0    conda-forge
libjpeg-turbo             3.0.0                hd590300_1    conda-forge
liblapack                 3.9.0           24_linux64_openblas    conda-forge
libnghttp2                1.58.0               h47da74e_1    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libnvjitlink              12.6.77              hbd13f7d_1    conda-forge
libopenblas               0.3.27          pthreads_hac2b453_1    conda-forge
libpng                    1.6.44               hadc24fc_0    conda-forge
libprotobuf               5.27.5               h5b01275_2    conda-forge
libre2-11                 2024.07.02           hbbce691_1    conda-forge
libsqlite                 3.46.1               hadc24fc_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx                 14.2.0               hc0a3c3a_1    conda-forge
libstdcxx-ng              14.2.0               h4852527_1    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libzlib                   1.3.1                hb9d3cd8_2    conda-forge
markdown                  3.6                pyhd8ed1ab_0    conda-forge
markdown-it-py            3.0.0              pyhd8ed1ab_0    conda-forge
markupsafe                3.0.1           py312h178313f_1    conda-forge
mdurl                     0.1.2              pyhd8ed1ab_0    conda-forge
ml_dtypes                 0.4.0           py312hf9745cd_2    conda-forge
namex                     0.0.8              pyhd8ed1ab_0    conda-forge
nccl                      2.23.4.1             h52f6c39_0    conda-forge
ncurses                   6.5                  he02047a_1    conda-forge
numpy                     1.26.4          py312heda63a1_0    conda-forge
openssl                   3.3.2                hb9d3cd8_0    conda-forge
opt_einsum                3.4.0              pyhd8ed1ab_0    conda-forge
optree                    0.13.0          py312h68727a3_0    conda-forge
packaging                 24.1               pyhd8ed1ab_0    conda-forge
pip                       24.2               pyh8b19718_1    conda-forge
protobuf                  5.27.5          py312h2ec8cdc_0    conda-forge
pycparser                 2.22               pyhd8ed1ab_0    conda-forge
pygments                  2.18.0             pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.12.4          hb3fe705_101_cpython    ramonaoptics
python-flatbuffers        24.3.25            pyh59ac667_0    conda-forge
python_abi                3.12                    5_cp312    conda-forge
re2                       2024.07.02           h77b4e00_1    conda-forge
requests                  2.32.3             pyhd8ed1ab_0    conda-forge
rich                      13.9.2             pyhd8ed1ab_0    conda-forge
setuptools                75.1.0             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
snappy                    1.2.1                ha2e4443_0    conda-forge
tensorboard               2.17.1             pyhd8ed1ab_0    conda-forge
tensorboard-data-server   0.7.0           py312hda17c39_2    conda-forge
tensorflow                2.17.0          cuda120py312h02ad488_203    conda-forge
tensorflow-base           2.17.0          cuda120py312hbec54f7_203    conda-forge
tensorflow-estimator      2.17.0          cuda120py312hfa0f5ef_203    conda-forge
termcolor                 2.5.0              pyhd8ed1ab_0    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
typing-extensions         4.12.2               hd8ed1ab_0    conda-forge
typing_extensions         4.12.2             pyha770c72_0    conda-forge
tzdata                    2024b                hc8b5060_0    conda-forge
urllib3                   2.2.3              pyhd8ed1ab_0    conda-forge
werkzeug                  3.0.4              pyhd8ed1ab_0    conda-forge
wheel                     0.44.0             pyhd8ed1ab_0    conda-forge
wrapt                     1.16.0          py312h66e93f0_1    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
zipp                      3.20.2             pyhd8ed1ab_0    conda-forge
zstandard                 0.23.0          py312hef9b889_1    conda-forge
zstd                      1.5.6                ha6fb4c9_0    conda-forge

Nothing really jumps out at me though from #403
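
Comparing the two listings by eye is error-prone, so here is a minimal sketch of a helper that diffs two `mamba list` outputs. The function names are hypothetical, and the sample rows are a small excerpt copied from the listings above (in the full listings, at least grpcio/libgrpc, libabseil, libprotobuf/protobuf and re2 differ between build 202 and build 203).

```python
def parse_conda_list(text):
    """Map package name -> (version, build) from `mamba list` / `conda list` output."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header lines that precede the table.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 3:
            packages[fields[0]] = (fields[1], fields[2])
    return packages


def diff_envs(old_text, new_text):
    """Return {name: (old, new)} for packages whose version or build differ."""
    old, new = parse_conda_list(old_text), parse_conda_list(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


# Excerpts copied from the cudnn9_202 and cudnn9_203 listings.
env_202 = """
grpcio                    1.62.2          py312hb06c811_0    conda-forge
libprotobuf               4.25.3               hd5b35b9_1    conda-forge
numpy                     1.26.4          py312heda63a1_0    conda-forge
"""
env_203 = """
grpcio                    1.65.5          py312h374181b_0    conda-forge
libprotobuf               5.27.5               h5b01275_2    conda-forge
numpy                     1.26.4          py312heda63a1_0    conda-forge
"""

for name, (before, after) in diff_envs(env_202, env_203).items():
    print(f"{name}: {before} -> {after}")
```

In practice the two inputs would be the saved outputs of `mamba list --name cudnn9_202` and `mamba list --name cudnn9_203`.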

@hmaarrfk
Contributor

It seems that Uwe has a plan (see the discussion in #405), but I just wanted to report that

XLA_FLAGS=--xla_gpu_cuda_data_dir=${CONDA_PREFIX}/targets/x86_64-linux

seems to "resolve" things for those who need an immediate "fix".
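
For scripts where exporting the variable in the shell is inconvenient, the same workaround can be sketched in Python. This assumes `CONDA_PREFIX` is set (as it is after `conda activate`) and that the flag is in the environment before TensorFlow is imported; the helper name `with_cuda_data_dir` is hypothetical.

```python
import os


def with_cuda_data_dir(existing_flags, cuda_data_dir):
    """Append --xla_gpu_cuda_data_dir to an XLA_FLAGS string unless already set."""
    if "--xla_gpu_cuda_data_dir" in existing_flags:
        return existing_flags
    return f"{existing_flags} --xla_gpu_cuda_data_dir={cuda_data_dir}".strip()


# CONDA_PREFIX is set by `conda activate`; do nothing outside an environment.
prefix = os.environ.get("CONDA_PREFIX", "")
if prefix:
    data_dir = os.path.join(prefix, "targets", "x86_64-linux")
    os.environ["XLA_FLAGS"] = with_cuda_data_dir(
        os.environ.get("XLA_FLAGS", ""), data_dir
    )

# Import TensorFlow only after this point, so the flag is already in the
# environment when XLA reads it.
# import tensorflow as tf
```

The helper keeps any flags that are already present, so it should not clobber an `XLA_FLAGS` value set elsewhere.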

@xhochy
Member

xhochy commented Oct 18, 2024

I don't have a plan on how to fix this issue though :(

@hmaarrfk
Contributor

You don't think it is similar to conda-forge/jaxlib-feedstock#281 (comment)?

@hmaarrfk
Contributor

Actually, I think all you need is:

XLA_FLAGS=--xla_gpu_cuda_data_dir=${CONDA_PREFIX}

so it might just be the same issue where the build_prefix isn't getting replaced.
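
To see which of the two directories suggested in this thread actually holds libdevice in a given environment, a small check along these lines may help. It is only a sketch: it assumes XLA looks for `nvvm/libdevice/libdevice.10.bc` beneath the data directory, and that relative path is an assumption not confirmed in this thread. The helper name `find_libdevice` is hypothetical.

```python
import os


def find_libdevice(roots, relative=os.path.join("nvvm", "libdevice", "libdevice.10.bc")):
    """Return the roots under which the libdevice bitcode file exists."""
    return [root for root in roots if os.path.isfile(os.path.join(root, relative))]


# Candidate data directories taken from the two workarounds in this thread.
prefix = os.environ.get("CONDA_PREFIX", "")
candidates = [prefix, os.path.join(prefix, "targets", "x86_64-linux")] if prefix else []
print("roots containing libdevice:", find_libdevice(candidates))
```

An empty list would suggest that the file lives somewhere else in the environment, or that the assumed relative path is wrong.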

@hmaarrfk
Contributor

hmaarrfk commented Oct 18, 2024

I see that @drasmuss opened an issue upstream as well:
keras-team/tf-keras#62

@drasmuss thank you for the clear reproducer.

In the future, I would specify that you are using conda-forge and not Anaconda. We experiment differently than upstream does, especially with splayed layouts, which can cause this problem.

OK, spelunking through the TensorFlow code base, I found:

https://github.com/tensorflow/tensorflow/blob/d0ec13c1322e2c0d2584654634cc833541339376/third_party/xla/xla/debug_options_flags.cc#L60

Gonna keep looking to see if there is an official flag we can trigger.

Attempting this fix locally:
diff --git a/recipe/build.sh b/recipe/build.sh
index 409215a..a2aa08e 100644
--- a/recipe/build.sh
+++ b/recipe/build.sh
@@ -148,6 +148,11 @@ if [[ "${target_platform}" == "osx-arm64" ]]; then
   export CXXFLAGS="${CXXFLAGS} -D_LIBCPP_DISABLE_AVAILABILITY"
 fi
 export TF_ENABLE_XLA=1
+
+# We need to tell xla to find things in our prefix, not some other location
+# See https://github.com/conda-forge/tensorflow-feedstock/issues/296#issuecomment-2423371916
+sed -i '\|opts\.set_xla_gpu_cuda_data_dir(|s|^\([[:space:]]*\).*|\1opts.set_xla_gpu_cuda_data_dir("'"${PREFIX}"'"),|' third_party/xla/xla/debug_options_flags.cc
+
 export BUILD_TARGET="//tensorflow/tools/pip_package:wheel //tensorflow/tools/lib_package:libtensorflow //tensorflow:libtensorflow_cc${SHLIB_EXT}"
 
 # Python settings
diff --git a/recipe/meta.yaml b/recipe/meta.yaml
index 4001732..f5924c2 100644
--- a/recipe/meta.yaml
+++ b/recipe/meta.yaml
@@ -1,6 +1,6 @@
 {% set version = "2.17.0" %}
 {% set estimator_version = "2.15.0" %}
-{% set build = 3 %}
+{% set build = 4 %}
 
 {% if cuda_compiler_version != "None" %}
 {% set build = build + 200 %}
fingers crossed
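
The sed expression in the diff above is dense, so here is a rough Python equivalent of what it does to a source line. This is only an illustration: `patch_cuda_data_dir` is a hypothetical helper, and the sample input line and the `/opt/conda/envs/demo` prefix are made up rather than copied from `debug_options_flags.cc`. Note that the replacement text, exactly as in the sed expression, ends the rewritten line with a comma; that may be worth checking against the terminator of the original line.

```python
import re


def patch_cuda_data_dir(source, prefix):
    """Rewrite every line that calls opts.set_xla_gpu_cuda_data_dir(...).

    Mirrors the sed command: keep the leading whitespace, replace the rest of
    the line with a call that hard-codes `prefix`, and leave other lines alone.
    """
    patched = []
    for line in source.splitlines(keepends=True):
        if "opts.set_xla_gpu_cuda_data_dir(" in line:
            indent = re.match(r"[ \t]*", line).group(0)
            newline = "\n" if line.endswith("\n") else ""
            # The trailing comma is reproduced verbatim from the sed replacement.
            line = f'{indent}opts.set_xla_gpu_cuda_data_dir("{prefix}"),{newline}'
        patched.append(line)
    return "".join(patched)


# Hypothetical input, for illustration only.
sample = (
    '  opts.set_xla_gpu_cuda_data_dir("./some_default");\n'
    "  opts.set_some_other_option(true);\n"
)
print(patch_cuda_data_dir(sample, "/opt/conda/envs/demo"))
```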

with the sed:

Files containing CONDA_PREFIX
-----------------------------
lib/libtensorflow_cc.so.2.17.0 (binary): Patching
lib/libtensorflow_framework.so.2.17.0 (binary): Patching
include/tensorflow/third_party/pybind11_protobuf/0001-Add-Python-include-path.patch (text): Patching
include/tensorflow/third_party/xla/xla/debug_options_flags.cc (text): Patching

on main:

2024-10-17T05:56:12.9274630Z Files containing CONDA_PREFIX
2024-10-17T05:56:12.9275854Z -----------------------------
2024-10-17T05:56:12.9276923Z lib/libtensorflow_cc.so.2.17.0 (binary): Patching
2024-10-17T05:56:12.9278351Z lib/libtensorflow_framework.so.2.17.0 (binary): Patching
2024-10-17T05:56:12.9281031Z include/tensorflow/third_party/pybind11_protobuf/0001-Add-Python-include-path.patch (text): Patching

Hmm, that causes things to crash. I can "fix" the crash by specifying XLA_FLAGS=--xla_gpu_cuda_data_dir=${CONDA_PREFIX}.

@hmaarrfk
Contributor

Alright, new patching candidate:

On master (today): https://github.com/tensorflow/tensorflow/blob/d0ec13c1322e2c0d2584654634cc833541339376/third_party/xla/third_party/tsl/tsl/platform/default/cuda_root_path.cc#L59

On v2.17.0: https://github.com/tensorflow/tensorflow/blob/v2.17.0/third_party/xla/third_party/tsl/tsl/platform/default/cuda_libdevice_path.cc#L41

The variable TF_CUDA_TOOLKIT_PATH appears in 3 locations:

third_party/gpus/cuda/cuda_config.h.tpl
29:#define TF_CUDA_TOOLKIT_PATH "%{cuda_toolkit_path}"

third_party/xla/third_party/tsl/third_party/gpus/cuda/cuda_config.h.tpl
29:#define TF_CUDA_TOOLKIT_PATH "%{cuda_toolkit_path}"

third_party/xla/third_party/tsl/tsl/platform/default/cuda_libdevice_path.cc
41:  auto roots = std::vector<std::string>{TF_CUDA_TOOLKIT_PATH,
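
The listing above is ordinary grep-style output. For anyone who wants to repeat the search from Python (for example inside a build script), a minimal sketch follows; `find_token` is a hypothetical helper, and the file suffixes it scans are an assumption chosen to cover the three files listed above.

```python
import os


def find_token(root, token, suffixes=(".cc", ".h", ".tpl")):
    """Yield (relative_path, line_number, line) for each line containing token."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith(suffixes):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if token in line:
                        yield os.path.relpath(path, root), number, line.rstrip("\n")


# Run from the root of a TensorFlow checkout; prints nothing if the
# directory does not exist.
for path, number, line in find_token("third_party", "TF_CUDA_TOOLKIT_PATH"):
    print(f"{path}\n{number}:{line}\n")
```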
