New optimizers fail to load CUDA installed through conda #62
It doesn't look like your gist is following step 1 of the reproduction instructions above (i.e., create a new environment and install CUDA through conda).
I have been encountering the same issue as @drasmuss with the non-legacy optimisers "adam" and "rmsprop". There are no errors with the SGD optimizer, though. Below is the error from running my script with the "rmsprop" optimiser:
Node: 'StatefulPartitionedCall_8'
Hi, adding myself here: I'm also seeing this issue with the following setup:
This was not an issue with TensorFlow 2.10. With 2.11, I now get:
I believe this is no longer an issue. I cross-checked with the legacy optimizer and execution succeeded. Please refer to the attached logs below. Could you confirm whether this is still an issue?
Just checked, and it produces the same error as before. Here are the reproduction steps (I updated the installation instructions to match the changes for TF 2.12 here https://www.tensorflow.org/install/pip#linux):

```bash
# these are the standard TF installation steps, copied here for clarity
conda install -c conda-forge cudatoolkit=11.8.0
python3 -m pip install nvidia-cudnn-cu11==8.6.0.163 tensorflow==2.12.*
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
# calling model.fit triggers the same error as before
python -c "import tensorflow as tf; model = tf.keras.models.Sequential([tf.keras.layers.Dense(10, input_shape=(10,))]); model.compile(loss=tf.losses.mse); model.fit(tf.ones((32, 10)), tf.ones((32, 10)))"
```

Here is the full error log:
It's possible that you have CUDA installed elsewhere on your system (not through conda), and TensorFlow is finding …
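A minimal sketch of the search-order concern raised in this comment, using only the Python standard library. It rebuilds `LD_LIBRARY_PATH` the way the `env_vars.sh` line in the reproduction steps does (existing value first, then `$CONDA_PREFIX/lib/` and `$CUDNN_PATH/lib`) and walks the entries in order to find the first directory containing a given library file. This is a simplified model of the dynamic loader under stated assumptions: only `LD_LIBRARY_PATH` is consulted (RPATH/RUNPATH, `ld.so.cache` and default system directories are ignored). The helper names `build_ld_library_path` and `first_provider`, the directory layout, and the `libcudart.so.11.0` file name are illustrative, not taken from the thread.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def build_ld_library_path(existing: str, conda_prefix: str, cudnn_path: str) -> str:
    """Mirror the env_vars.sh line:
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib
    The pre-existing value comes first, so anything already on it wins.
    """
    return f"{existing}:{conda_prefix}/lib/:{cudnn_path}/lib"


def first_provider(ld_library_path: str, libname: str, cwd: str = ".") -> Optional[str]:
    """Return the first path entry that contains `libname`, or None.

    Simplified model of ld.so: only ':' is treated as a separator, and an
    empty entry (e.g. the leading ':' produced when LD_LIBRARY_PATH was
    unset) stands for the current working directory.
    """
    for entry in ld_library_path.split(":"):
        directory = entry or cwd
        if (Path(directory) / libname).is_file():
            return directory
    return None


if __name__ == "__main__":
    # Hypothetical layout: a system CUDA install plus a conda env, both
    # shipping a library with the same (illustrative) file name.
    lib = "libcudart.so.11.0"
    with tempfile.TemporaryDirectory() as root:
        system_cuda = os.path.join(root, "usr", "local", "cuda", "lib64")
        conda_prefix = os.path.join(root, "envs", "tf")
        for d in (system_cuda, os.path.join(conda_prefix, "lib")):
            os.makedirs(d)
            Path(d, lib).touch()
        path = build_ld_library_path(system_cuda, conda_prefix, os.path.join(root, "cudnn"))
        print("resolved from system CUDA:", first_provider(path, lib) == system_cuda)
```

Because the conda directories are appended rather than prepended, a system CUDA directory that is already on `LD_LIBRARY_PATH` would be searched first in this model.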
Hi @drasmuss, could you please try the following commands and let us know whether they fix the error?
Thanks!
Yes, that makes the problem go away, although I would hesitate to call it a solution: it's quite a cumbersome process to repeat every time we create a new environment, and a definite downgrade in user experience compared to TF <= 2.10.
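Since the per-environment setup is the pain point here, one option is to script the part of it that already appears in the installation steps: writing the `activate.d/env_vars.sh` hook. The sketch below is a hypothetical helper (`install_activate_hook` is not part of conda or TensorFlow) that appends the given lines to `$CONDA_PREFIX/etc/conda/activate.d/env_vars.sh` only if they are not already present, so rerunning it is harmless. It assumes the hook lines are known in advance and does not reproduce the commands from the suggested fix.

```python
from pathlib import Path
from typing import Iterable, List

# The two hook lines from the TF 2.12 install steps quoted in this thread.
TF_HOOK_LINES: List[str] = [
    'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))',
    "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib",
]


def install_activate_hook(conda_prefix: str, lines: Iterable[str]) -> Path:
    """Append `lines` to <prefix>/etc/conda/activate.d/env_vars.sh, skipping
    any line that is already present, so the helper can be rerun safely.
    Hypothetical helper; not part of conda or TensorFlow.
    """
    hook_dir = Path(conda_prefix) / "etc" / "conda" / "activate.d"
    hook_dir.mkdir(parents=True, exist_ok=True)  # equivalent of `mkdir -p`
    hook = hook_dir / "env_vars.sh"
    text = hook.read_text() if hook.exists() else ""
    existing = set(text.splitlines())
    missing = [line for line in lines if line not in existing]
    if missing:
        with hook.open("a") as f:
            if text and not text.endswith("\n"):
                f.write("\n")  # don't glue onto an unterminated last line
            f.writelines(line + "\n" for line in missing)
    return hook
```

Inside a freshly created and activated environment it would be called as `install_activate_hook(os.environ["CONDA_PREFIX"], TF_HOOK_LINES)`, followed by re-activating the environment so the variables take effect. Any additional commands from the suggested fix would still have to be added to `lines` by hand.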
@chenmoneygithub do you know if this is a known issue?
I ran into this same issue on WSL2, and the proposed fix did not work for me at first; however, it did eventually work once I rebooted my computer after updating the … I had similar issues with the published pip install instructions for TensorFlow (https://www.tensorflow.org/install/pip#step-by-step_instructions) and had to reboot my system between several of the steps. Hopefully this helps anyone running into this issue on WSL2.
I had this problem as well and the fix above worked. |
I ran into the same problem on Linux Mint 21.2 Victoria x86_64 after creating a new environment with conda and installing tensorflow-gpu version 2.12.1 from the conda-forge channel.
Same issue on Fedora 39 with a fresh TensorFlow install from conda-forge; SuryanarayanaY's fix works.
Same issue. Is there an official fix yet?
System information.
… `fit`) from here https://keras.io/examples/vision/mnist_convnet/
Describe the problem.
An error is raised:
Note that if you switch to the legacy optimizers by changing this line
to this
then the example runs successfully.
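Since the failure appeared with TensorFlow 2.11, the release in which `tf.keras.optimizers` switched to the new optimizer implementation and the old classes moved to `tf.keras.optimizers.legacy`, a script meant to run on both 2.10 and 2.11+ could choose the namespace from the version string. The sketch below only performs that version check, so it runs without TensorFlow installed. The function name `legacy_optimizer_namespace` is hypothetical, and it assumes `tf.keras` is Keras 2 (TensorFlow 2.15 and earlier).

```python
import re


def legacy_optimizer_namespace(tf_version: str) -> str:
    """Return the module path holding the pre-2.11 ("legacy") Keras optimizers.

    Hypothetical helper. Assumes tf.keras is Keras 2 (TensorFlow <= 2.15).
    """
    match = re.match(r"(\d+)\.(\d+)", tf_version)
    if match is None:
        raise ValueError(f"unrecognised TensorFlow version: {tf_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    if (major, minor) >= (2, 11):
        # 2.11+: the old optimizer classes live under the .legacy namespace.
        return "tf.keras.optimizers.legacy"
    # 2.10 and earlier: the default namespace still holds the old implementation.
    return "tf.keras.optimizers"


if __name__ == "__main__":
    for v in ("2.10.1", "2.11.0", "2.12.0rc1"):
        print(v, "->", legacy_optimizer_namespace(v))
```

On 2.11+ the result corresponds to passing, for example, `optimizer=tf.keras.optimizers.legacy.Adam()` to `model.compile`.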
Describe the current behavior.
An error occurs when running the example.
Describe the expected behavior.
The example should run without error, as it does when using the legacy optimizers.
Standalone code to reproduce the issue.
https://keras.io/examples/vision/mnist_convnet/
Source code / logs.
Full stack trace of the error:
Likely related to: