
New optimizers fail to load CUDA installed through conda #62

Open
drasmuss opened this issue Jan 13, 2023 · 14 comments
@drasmuss

System information.

  • Have I written custom code (as opposed to using a stock example script provided in Keras): no
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04 (WSL)
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 2.11
  • Python version: 3.9
  • Bazel version (if compiling from source): N/A
  • GPU model and memory: RTX 2080 Ti
  • Exact command to reproduce:
  1. Create a new environment, following the official installation instructions from here https://www.tensorflow.org/install/pip#linux:
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
pip install tensorflow
  2. Run the beginner MNIST tutorial (or any other tutorial that calls fit) from here: https://keras.io/examples/vision/mnist_convnet/

Describe the problem.

An error is raised:

libdevice not found at ./libdevice.10.bc

Note that if you switch to the legacy optimizers, by changing this line

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

to this

model.compile(loss="categorical_crossentropy", optimizer=keras.optimizers.legacy.Adam(), metrics=["accuracy"])

then the example runs successfully.
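
For reference, a self-contained sketch of the same contrast (a hypothetical minimal model standing in for the MNIST tutorial; any model whose fit() call reaches the optimizer update step will do):

import tensorflow as tf
from tensorflow import keras

# Minimal stand-in for the MNIST tutorial: any fit() call that reaches the
# optimizer update step triggers the error on the broken setup.
model = keras.Sequential([keras.layers.Dense(10, input_shape=(10,))])

# Fails with "libdevice not found at ./libdevice.10.bc": the new optimizer
# compiles its update step with XLA (see _update_step_xla in the trace below).
model.compile(loss="mse", optimizer=keras.optimizers.Adam())

# Works: the legacy optimizer skips the XLA-compiled update step.
# model.compile(loss="mse", optimizer=keras.optimizers.legacy.Adam())

model.fit(tf.ones((32, 10)), tf.ones((32, 10)))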

Describe the current behavior.

An error occurs when running the example.

Describe the expected behavior.

The example should run without error, as it does when using the legacy optimizers.

  • Do you want to contribute a PR? (yes/no): no

Standalone code to reproduce the issue.

https://keras.io/examples/vision/mnist_convnet/

Source code / logs.

Full stack trace of the error:

    File ".../tmp.py", line 47, in <module>
      model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/engine/training.py", line 1650, in fit
      tmp_logs = self.train_function(iterator)
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/engine/training.py", line 1249, in train_function
      return step_function(self, iterator)
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/engine/training.py", line 1233, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/engine/training.py", line 1222, in run_step
      outputs = model.train_step(data)
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/engine/training.py", line 1027, in train_step
      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
      self.apply_gradients(grads_and_vars)
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
      return super().apply_gradients(grads_and_vars, name=name)
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients
      iteration = self._internal_apply_gradients(grads_and_vars)
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
      return tf.__internal__.distribute.interim.maybe_merge_call(
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
      distribution.extended.update(
    File "/home/drasmuss/mambaforge/envs/tmp2/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
      return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'StatefulPartitionedCall_4'
libdevice not found at ./libdevice.10.bc
         [[{{node StatefulPartitionedCall_4}}]] [Op:__inference_train_function_1026]

Likely related to:

@tilakrayal
Collaborator

@gowthamkpr,
I tried to execute the mentioned code in two different ways, as below, but could not reproduce the issue. Kindly find the gist of it here.
model.compile(loss=tf.losses.mse, optimizer=tf.keras.optimizers.SGD())

and

model.compile(loss=tf.losses.mse, optimizer=tf.keras.optimizers.legacy.SGD())

@tilakrayal tilakrayal assigned gowthamkpr and unassigned tilakrayal Jan 17, 2023
@drasmuss
Author

It doesn't look like your gist is following step 1 of the reproduction instructions above (i.e., create a new environment and install CUDA through conda).

@kevint0

kevint0 commented Jan 18, 2023

I have been encountering the same issue as @drasmuss with the non-legacy optimisers "adam" and "rmsprop". No errors with the SGD optimiser, though. Below is the error from trying to run my script with the "rmsprop" optimiser.

Node: 'StatefulPartitionedCall_8'
libdevice not found at ./libdevice.10.bc
[[{{node StatefulPartitionedCall_8}}]] [Op:__inference_train_function_1102]
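
A possible explanation for why SGD is unaffected (a guess based on the _update_step_xla frame in the stack trace above): the new optimizers compile their update step with XLA, and the Adam/RMSprop updates use math routines that XLA pulls from libdevice, while a plain SGD update may not need any. If that is right, disabling JIT compilation on the optimizer should also sidestep the error. A sketch, assuming the jit_compile argument accepted by the TF 2.11 optimizers:

import tensorflow as tf

model = tf.keras.models.Sequential([tf.keras.layers.Dense(10, input_shape=(10,))])

# jit_compile=False keeps the optimizer update step out of XLA, so libdevice
# should never be needed. (Assumption: this relies on the jit_compile argument
# of the TF 2.11 optimizers, at the cost of the XLA-compiled update step.)
model.compile(loss="mse", optimizer=tf.keras.optimizers.Adam(jit_compile=False))
model.fit(tf.ones((32, 10)), tf.ones((32, 10)))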

@mhaas

mhaas commented Feb 24, 2023

Hi,

adding a "me too" here - hoping it adds value and not just noise :)

I'm also seeing this issue in the following setup:

  • CUDA 11.7 installed on SLES from RPM packages (via the official Nvidia repo)
  • cuDNN 8.5.0 installed from cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz
  • Tensorflow 2.11 installed via pip

This was not an issue with Tensorflow 2.10. With 2.11, I now get:

libdevice not found at ./libdevice.10.bc

@SuryanarayanaY SuryanarayanaY self-assigned this Apr 26, 2023
@SuryanarayanaY
Contributor

@drasmuss,

I believe this is no longer an issue. I have cross-checked with the legacy optimizer and execution succeeds. Please refer to the attached logs below, and confirm whether this is still an issue for you.

17422_logs.txt

@drasmuss
Author

Just checked, and it produces the same error as before. Here are the reproduction steps (I updated the installation instructions to match the changes for TF 2.12, per https://www.tensorflow.org/install/pip#linux):

# these are the standard TF installation steps, copied here for clarity
conda install -c conda-forge cudatoolkit=11.8.0
python3 -m pip install nvidia-cudnn-cu11==8.6.0.163 tensorflow==2.12.*
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh

# calling model.fit triggers the same error as before
python -c "import tensorflow as tf; model = tf.keras.models.Sequential([tf.keras.layers.Dense(10, input_shape=(10,))]); model.compile(loss=tf.losses.mse); model.fit(tf.ones((32, 10)), tf.ones((32, 10)))"

Here is the full error log:

2023-04-26 12:33:40.887265: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-04-26 12:33:40.912502: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-26 12:33:41.303191: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-04-26 12:33:41.877110: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:982] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-26 12:33:41.892418: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:982] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-26 12:33:41.892768: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:982] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-26 12:33:41.894668: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:982] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-26 12:33:41.894937: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:982] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-26 12:33:41.895171: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:982] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-26 12:33:42.496930: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:982] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-26 12:33:42.497198: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:982] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-26 12:33:42.497225: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1722] Could not identify NUMA node of platform GPU id 0, defaulting to 0.  Your kernel may not have been built with NUMA support.
2023-04-26 12:33:42.497474: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:982] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-04-26 12:33:42.497531: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 8859 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5
2023-04-26 12:33:43.559322: I tensorflow/compiler/xla/service/service.cc:169] XLA service 0x7fad47d31b00 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-04-26 12:33:43.559369: I tensorflow/compiler/xla/service/service.cc:177]   StreamExecutor device (0): NVIDIA GeForce RTX 2080 Ti, Compute Capability 7.5
2023-04-26 12:33:43.562463: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-04-26 12:33:43.668017: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:424] Loaded cuDNN version 8600
2023-04-26 12:33:43.673970: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:530] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
Searched for CUDA in the following directories:
  ./cuda_sdk_lib
  /usr/local/cuda-11.8
  /usr/local/cuda
  .
You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2023-04-26 12:33:43.674114: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:274] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-04-26 12:33:43.674287: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:362 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-04-26 12:33:43.674323: I tensorflow/core/common_runtime/executor.cc:1197] [/job:localhost/replica:0/task:0/device:GPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INTERNAL: libdevice not found at ./libdevice.10.bc
         [[{{node StatefulPartitionedCall_1}}]]
2023-04-26 12:33:43.682794: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:274] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-04-26 12:33:43.682947: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:362 : INTERNAL: libdevice not found at ./libdevice.10.bc
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File ".../lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File ".../lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Graph execution error:

Detected at node 'StatefulPartitionedCall_1' defined at (most recent call last):
    File "<string>", line 1, in <module>
    File ".../lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File ".../lib/python3.9/site-packages/keras/engine/training.py", line 1685, in fit
      tmp_logs = self.train_function(iterator)
    File ".../lib/python3.9/site-packages/keras/engine/training.py", line 1284, in train_function
      return step_function(self, iterator)
    File ".../lib/python3.9/site-packages/keras/engine/training.py", line 1268, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File ".../lib/python3.9/site-packages/keras/engine/training.py", line 1249, in run_step
      outputs = model.train_step(data)
    File ".../lib/python3.9/site-packages/keras/engine/training.py", line 1054, in train_step
      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File ".../lib/python3.9/site-packages/keras/optimizers/optimizer.py", line 543, in minimize
      self.apply_gradients(grads_and_vars)
    File ".../lib/python3.9/site-packages/keras/optimizers/optimizer.py", line 1174, in apply_gradients
      return super().apply_gradients(grads_and_vars, name=name)
    File ".../lib/python3.9/site-packages/keras/optimizers/optimizer.py", line 650, in apply_gradients
      iteration = self._internal_apply_gradients(grads_and_vars)
    File ".../lib/python3.9/site-packages/keras/optimizers/optimizer.py", line 1200, in _internal_apply_gradients
      return tf.__internal__.distribute.interim.maybe_merge_call(
    File ".../lib/python3.9/site-packages/keras/optimizers/optimizer.py", line 1250, in _distributed_apply_gradients_fn
      distribution.extended.update(
    File ".../lib/python3.9/site-packages/keras/optimizers/optimizer.py", line 1245, in apply_grad_to_update_var
      return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'StatefulPartitionedCall_1'
libdevice not found at ./libdevice.10.bc
         [[{{node StatefulPartitionedCall_1}}]] [Op:__inference_train_function_401]

It's possible that you have CUDA installed elsewhere on your system (not through conda), and tensorflow is finding libdevice in that installation.
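
One quick way to check that is to list the places the warning above says XLA searches, alongside the conda environment itself (a diagnostic sketch; the patterns simply mirror the search paths printed in the log):

import glob
import os

# Paths from the "Searched for CUDA in the following directories" warning,
# plus the conda env, where conda-forge's cudatoolkit puts libdevice.10.bc.
patterns = [
    "./cuda_sdk_lib/nvvm/libdevice/libdevice*.bc",
    "/usr/local/cuda*/nvvm/libdevice/libdevice*.bc",
    os.path.join(os.environ.get("CONDA_PREFIX", ""), "lib", "libdevice*.bc"),
]
for pattern in patterns:
    print(pattern, "->", glob.glob(pattern) or "not found")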

@SuryanarayanaY
Contributor

Hi @drasmuss,

Could you please try the following commands and let us know whether they fix the error:

# Install NVCC
conda install -c nvidia cuda-nvcc=11.3.58
# Configure the XLA cuda directory
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
printf 'export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX/lib/\n' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
# Copy libdevice file to the required path
mkdir -p $CONDA_PREFIX/lib/nvvm/libdevice
cp $CONDA_PREFIX/lib/libdevice.10.bc $CONDA_PREFIX/lib/nvvm/libdevice/

Thanks!
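
This should work because XLA resolves libdevice relative to the directory passed in --xla_gpu_cuda_data_dir, expecting it at <dir>/nvvm/libdevice/libdevice.10.bc (per the "Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice" warning in the log above), which is why the copy step is needed. A quick sanity check after running the commands (a sketch, assuming the conda environment has been re-activated):

import os

# After the workaround, XLA should find libdevice at
# $CONDA_PREFIX/lib/nvvm/libdevice/libdevice.10.bc.
prefix = os.environ["CONDA_PREFIX"]
libdevice = os.path.join(prefix, "lib", "nvvm", "libdevice", "libdevice.10.bc")
print(libdevice, "exists:", os.path.exists(libdevice))
print("XLA_FLAGS =", os.environ.get("XLA_FLAGS"))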

@drasmuss
Author

drasmuss commented May 9, 2023

Yes, that makes the problem go away, although I would hesitate to call it a solution as that's quite a cumbersome process to repeat every time we create a new environment, and a definite downgrade in user experience compared to TF <= 2.10.
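
One way to trim the per-environment ceremony is to set the flag from the training script itself, before TensorFlow is imported (a sketch; it still assumes libdevice.10.bc has been copied under $CONDA_PREFIX/lib/nvvm/libdevice as above, and that XLA reads XLA_FLAGS when it first initializes rather than at shell startup):

import os

# Must run before TensorFlow is imported so XLA sees the flag when it
# initializes. Assumes libdevice.10.bc was already copied to
# $CONDA_PREFIX/lib/nvvm/libdevice/ as in the workaround above.
os.environ["XLA_FLAGS"] = (
    "--xla_gpu_cuda_data_dir=" + os.path.join(os.environ["CONDA_PREFIX"], "lib")
)

import tensorflow as tf  # imported after the env var is set, on purpose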

@rchao
Contributor

rchao commented May 18, 2023

@chenmoneygithub do you know if this is a known issue?

@danieljwiest

I ran into this same issue with WSL2, and the proposed fix did not initially work for me; however, it did eventually work once I rebooted my computer after updating the $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh file.

I had similar issues with the published pip install instructions for TensorFlow (https://www.tensorflow.org/install/pip#step-by-step_instructions) and had to reboot my system between several of the steps.

Hopefully this helps anyone running into this issue on WSL2.

@sachinprasadhs sachinprasadhs transferred this issue from keras-team/keras Sep 22, 2023
@joaomamede

I had this problem as well and the fix above worked.

@Datagniel

Datagniel commented Oct 26, 2023

I ran into the same problem on Linux Mint victoria 21.2 x86_64 after creating a new environment with conda and installing tensorflow-gpu version 2.12.1 from the conda-forge channel.
As suggested by @SuryanarayanaY, I used his approach but without specifying the cuda-nvcc version (it installed 12.3.52), and it worked. Thank you very much again for the solution, @SuryanarayanaY!

conda install -c nvidia cuda-nvcc
# Configure the XLA cuda directory
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
printf 'export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX/lib/\n' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
# Copy libdevice file to the required path
mkdir -p $CONDA_PREFIX/lib/nvvm/libdevice
cp $CONDA_PREFIX/lib/libdevice.10.bc $CONDA_PREFIX/lib/nvvm/libdevice/

@LogExE

LogExE commented Nov 12, 2023

Same issue on Fedora 39 with a fresh tensorflow install from conda-forge; @SuryanarayanaY's fix works.

@makra89

makra89 commented Jan 30, 2024

Same issue, any official fix yet?
