Specify additional steps to utilize GPU for Linux users #2299

sgkouzias · 2024-04-08T11:28:15Z

Specify additional steps to utilize GPU for Linux users

Advice to skip additional step 6 if using CPU.

8bitmp3 · 2024-04-09T15:59:21Z

Added second option to create virtual env via Python's built in venv module for Linux users with CUDA-enabled GPUs

Added virtual envs activation/deactivation commands and changed wording for editing the deactivate block in the activate script of the venv virtual env.

Added instructions to resolve the ptxas issue.

Revised CUDNN_DIR definition

Corrected LD_LIBRARY_PATH definition in conda environment instructions

Rename environment variable to PTXAS_DIR and package manager options.

Added note to use pip instead of conda to install TensorFlow.

sgkouzias

Added steps and instructions to install TensorFlow by running pip install tensorflow[and-cuda] within a virtual environment (option 1: conda, option 2: venv), and to set environment variables so that TensorFlow can locate the compatible NVIDIA libraries installed alongside it and utilize the GPU. The solution has been tested successfully.

Reference: tensorflow/tensorflow#63362

sgkouzias · 2024-05-10T11:06:22Z

@haifeng-jin , @MarkDaoust, @8bitmp3 I await any suggestions or revisions if needed. Do we have any updates?

haifeng-jin · 2024-05-20T18:23:44Z

As I remember, the currently recommended way to install TF is to use pip. I do not have further info on this. @MarkDaoust may comment on this.

sgkouzias · 2024-05-20T18:34:20Z

> As I remember, the currently recommended way to install TF is to use pip. I do not have further info on this. @MarkDaoust may comment on this.

@haifeng-jin it seems practically impossible for anyone with a CUDA-enabled GPU to run deep learning experiments locally with TensorFlow 2.16.1 on that GPU without manually performing extra steps that, as of today, are missing from the official TensorFlow installation instructions for Linux users with GPUs. These steps should be documented, at least as a temporary fix!

It turns out that when you pip install tensorflow[and-cuda], all required NVIDIA libraries are installed as well. You just need to manually configure the environment variables as appropriate in order to use them and run TensorFlow on the GPU.
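To illustrate what configuring the environment variables could involve, here is a minimal Python sketch. It assumes the pip-installed NVIDIA packages place their shared libraries under site-packages/nvidia/<component>/lib. That layout and the helper names nvidia_lib_dirs and ld_library_path are assumptions made for illustration, not something taken from the TensorFlow documentation.

```python
import os
from pathlib import Path


def nvidia_lib_dirs(site_packages):
    """Return every nvidia/<component>/lib directory under site_packages.

    The nvidia/<component>/lib layout is an assumption about how the
    pip-installed NVIDIA wheels are organised.
    """
    root = Path(site_packages) / "nvidia"
    if not root.is_dir():
        return []
    return sorted(str(p) for p in root.glob("*/lib") if p.is_dir())


def ld_library_path(site_packages, current=""):
    """Build an LD_LIBRARY_PATH value with the NVIDIA lib directories first.

    `current` is the existing LD_LIBRARY_PATH value, if any; it is kept at
    the end so that system libraries remain reachable.
    """
    parts = nvidia_lib_dirs(site_packages)
    if current:
        parts.append(current)
    return os.pathsep.join(parts)
```

Inside an activated environment, printing ld_library_path(sysconfig.get_paths()["purelib"], os.environ.get("LD_LIBRARY_PATH", "")) would give a candidate value to export as LD_LIBRARY_PATH before starting Python. Whether that is sufficient for a given TensorFlow version would still need to be checked on the machine in question.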

mihaimaruseac

Please don't use "add file"/"update file"/"fix file"/etc. commit messages. These are hard to reason about when looking at the history of the file/repository. Instead, please write explanatory git commit messages.

The commit message is also the title of the PR if the PR has only one commit. It is thus doubly important to have relevant commit messages, as they make PRs easier to understand and easier to analyze in search results.

For how to write good quality git commit messages, please consult https://cbea.ms/git-commit/

mihaimaruseac · 2024-05-23T21:58:24Z

> It turns out that when you pip install tensorflow[and-cuda], all required NVIDIA libraries are installed as well. You just need to manually configure the environment variables as appropriate in order to use them and run TensorFlow on the GPU.

Can we instead add these to the install guide?

sgkouzias · 2024-05-24T13:11:21Z

> manually configure the environment variables as appropriate

@mihaimaruseac shouldn't we explain/specify how to manually configure the environment variables as appropriate?

mihaimaruseac

I read the update and it seems reasonable to me. Thank you

Removed option to install within conda virtual environment. Recommendation to install in venv environment.

sgkouzias · 2024-06-17T19:37:59Z

@t-kalinowski thank you very much for your valuable advice. I revised the PR accordingly.

t-kalinowski · 2024-06-17T20:20:13Z

@sgkouzias if you also create a symlink at my-venv/bin/ptxas -> my-venv/lib/python.../site-packages/.../bin/ptxas, then you could probably get away without requiring users to modify the default activate and deactivate scripts.

Replaced instructions to modify default activate/deactivate scripts with instructions to create symlinks to NVIDIA shared libraries and ptxas.

sgkouzias · 2024-06-18T16:16:54Z

> @sgkouzias if you also create a symlink at my-venv/bin/ptxas -> my-venv/lib/python.../site-packages/.../bin/ptxas, then you could probably get away without requiring users to modify the default activate and deactivate scripts.

@t-kalinowski thank you so much for your advice. Instructions have been totally revised as per your comments. Modifications to default activate and deactivate scripts are not required from users. Instructions should resemble more or less what you do in the R interface.
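As a rough illustration of the symlink approach for the NVIDIA shared libraries, here is a minimal Python sketch. It assumes the libraries live under nvidia/<component>/lib and match *.so*. The function name link_shared_libs is hypothetical, and the target directory is left as a parameter because this thread does not state which directory the final instructions link into.

```python
from pathlib import Path


def link_shared_libs(nvidia_root, target_dir):
    """Symlink every nvidia/<component>/lib/*.so* file into target_dir.

    An existing file or link with the same name is replaced, similar to
    `ln -sf`. Returns the names of the links that were created.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    created = []
    for lib in sorted(Path(nvidia_root).glob("*/lib/*.so*")):
        link = target / lib.name
        if link.is_symlink() or link.exists():
            link.unlink()  # replace stale links instead of failing
        link.symlink_to(lib.resolve())
        created.append(link.name)
    return created
```

Because the links are created once inside the environment, nothing has to run on activation, which is what makes edits to the activate and deactivate scripts unnecessary.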

sgkouzias · 2024-06-19T16:23:51Z

@8bitmp3, @haifeng-jin, @MarkDaoust even TensorFlow version 2.17.0.rc0 requires additional steps before Linux users can utilize the GPU. The instructions suggested in this pull request offer a tested solution. I await your comments.

learning-to-play · 2024-06-19T17:55:55Z

site/en/install/pip.md

+
+    ```bash
+    source tf/bin/activate
+    deactivate


Can you remove deactivate?

sgkouzias

> Can you remove deactivate?

@learning-to-play removed deactivate as advised. Furthermore, I could remove the instruction to create a symlink to ptxas, since it is needed only for TensorFlow version 2.16.1 and not for TensorFlow version 2.17.0.rc0. Awaiting your comments.

learning-to-play

I want to make sure that I understand the situation correctly. Which of the following two situations is correct?

If the issue doesn't happen for 2.17.0RC0, yes, please remove the instructions.

If the issue happens for both 2.17.0RC0 and 2.16, we can wait for the GPU team to take a look at "TF 2.17.0 RC0 Fails to work with GPUs (and TF 2.16 too)" (tensorflow#63362) and see if they can send a fix for both the 2.16.2 and 2.17.0 releases.

sgkouzias

@learning-to-play the only difference is that on version 2.17.0.rc0 you need to create symlinks to the NVIDIA libs in order to utilize GPUs, while on version 2.16.1 you also need to create a symlink to ptxas. Consequently, the command pip install tensorflow[and-cuda] alone fails to work with GPUs on both versions.

sgkouzias · 2024-07-01T11:32:21Z

@learning-to-play, @SeeForTwo, @8bitmp3, @haifeng-jin, @MarkDaoust, @markmcd

Unfortunately, the latest release, TensorFlow 2.16.2, does not fix the ptxas bug. When running a training script I get the error:

ptxas returned an error during compilation of ptx to sass: 'INTERNAL: ptxas 12.3.103 has a bug that we think can affect XLA. Please use a different version.' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
Aborted (core dumped)

So it seems that TensorFlow 2.16.2 fails to work with GPUs as well!

Notes:

Successful installation was verified by running:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
The solution in this pull request, which is pending review, got rid of the ptxas bug and allowed TensorFlow 2.16.2 to work with my GPU:

ln -sf $(find $(dirname $(dirname $(python -c "import nvidia.cuda_nvcc; print(nvidia.cuda_nvcc.__file__)"))/*/bin/) -name ptxas -print -quit) $VIRTUAL_ENV/bin/ptxas
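For readers who find the nested shell substitutions hard to follow, the same idea can be sketched in Python: search a directory tree for a ptxas binary and link it into the virtual environment's bin directory. The function names find_ptxas and link_ptxas are hypothetical, and the search root is assumed to be the nvidia directory that contains the cuda_nvcc package.

```python
from pathlib import Path


def find_ptxas(search_root):
    """Return the first regular file named ptxas under search_root, or None."""
    for candidate in sorted(Path(search_root).rglob("ptxas")):
        # Skip symlinks so a previously created link is never picked up.
        if candidate.is_file() and not candidate.is_symlink():
            return candidate
    return None


def link_ptxas(search_root, venv_dir):
    """Create <venv_dir>/bin/ptxas as a symlink to the ptxas that was found.

    An existing file or link at that path is replaced, similar to `ln -sf`.
    Returns the link path, or None when no ptxas binary was found.
    """
    ptxas = find_ptxas(search_root)
    if ptxas is None:
        return None
    link = Path(venv_dir) / "bin" / "ptxas"
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(ptxas.resolve())
    return link
```

Called as link_ptxas(nvidia_dir, sys.prefix) from inside the activated environment, where nvidia_dir is the assumed parent directory of cuda_nvcc, this should have roughly the same effect as the ln -sf command above.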

belitskiy · 2024-07-02T13:10:12Z

Thank you for the contribution, @sgkouzias :)
Given that the [and-cuda] installation now does detect pip-installed CUDA components again, please add a disclaimer specifying that symbolic links are only necessary in case the intended way doesn't work, i.e. the components aren't being detected and/or conflict with the existing system CUDA installation (like ptxas for you).

Revised the step with instructions to configure the virtual environment variables for GPU users by adding a disclaimer.

sgkouzias · 2024-07-02T16:48:14Z

> Thank you for the contribution, @sgkouzias :) Given that the [and-cuda] installation now does detect pip-installed CUDA components again, please add a disclaimer specifying that symbolic links are only necessary in case the intended way doesn't work, i.e. the components aren't being detected and/or conflict with the existing system CUDA installation (like ptxas for you).

@belitskiy, @learning-to-play I revised instructions as advised and will be awaiting your feedback. It is my honor to contribute to the TensorFlow community.

Deleted asterisk emoji and placed disclaimer note before respective instructions.

8bitmp3

@MarkDaoust @markmcd PTAL

MarkDaoust · 2024-09-04T21:41:02Z

Thanks for all your work everyone (especially @sgkouzias)!

I just tweaked the order so that this new GPU debugging step is after the step where you test the GPU.

I think this is still right so I'm merging it. But LMK if I misunderstood anything.

sgkouzias · 2024-09-05T16:36:35Z

> Thanks for all your work everyone (especially @sgkouzias)!
>
> I just tweaked the order so that this new GPU debugging step is after the step where you test the GPU.
>
> I think this is still right so I'm merging it. But LMK if I misunderstood anything.

Thank you @MarkDaoust 🙏 it is my honour.
I noticed you mentioned merging, but it seems the pull request still needs a formal review due to branch protection rules. Could you please take a quick look and approve it when you have a chance?
Many thanks again!

MarkDaoust · 2024-09-05T17:19:34Z

Really, it has everything it needs; we're just waiting for the internal merge. It should be through soon.

Update pip.md

c494be5

Specify additional steps to utilize GPU for Linux users

sgkouzias requested review from haifeng-jin, MarkDaoust and 8bitmp3 as code owners April 8, 2024 11:28

Update pip.md

40824e4

Advice to skip additional step 6 if using CPU.

8bitmp3 assigned markmcd Apr 9, 2024

sgkouzias added 5 commits April 9, 2024 21:58

Update pip.md

840fec9

Added second option to create virtual env via Python's built in venv module for Linux users with CUDA-enabled GPUs

Update pip.md

5448363

Added virtual envs activation/deactivation commands and changed wording for editing the deactivate block in the activate script of the venv virtual env.

Update pip.md

c7518a2

Added instructions to resolve the ptxas issue.

Update pip.md

6a40fe4

Revised CUDNN_DIR definition

Update pip.md

82713bf

Corrected LD_LIBRARY_PATH definition in conda environment instructions

8bitmp3 assigned MarkDaoust Apr 11, 2024

sgkouzias added 2 commits April 12, 2024 10:04

Update pip.md

b81e4f2

Rename environment variable to PTXAS_DIR and package manager options.

Update pip.md

aebf305

Added note to use pip instead of conda to install TensorFlow.

sgkouzias commented Apr 13, 2024

sgkouzias marked this pull request as draft May 16, 2024 13:23

sgkouzias marked this pull request as ready for review May 16, 2024 13:28

mihaimaruseac reviewed May 23, 2024

sgkouzias changed the title ~~Update pip.md~~ Specify additional steps to utilize GPU for Linux users May 24, 2024

sgkouzias requested a review from mihaimaruseac May 24, 2024 13:12

mihaimaruseac approved these changes May 24, 2024

Simplify procedure by removing option to install with conda virtual env.

146bbeb

Removed option to install within conda virtual environment. Recommendation to install in venv environment.

Instructions to create symlinks to NVIDIA shared libraries and ptxas.

7cf1c57

Replaced instructions to modify default activate/deactivate scripts with instructions to create symlinks to NVIDIA shared libraries and ptxas.

learning-to-play reviewed Jun 19, 2024

Removed deactivate command.

3046cdc

learning-to-play requested a review from SeeForTwo June 19, 2024 18:45

Instructions to create symlinks in case the intended way doesn't work.

7f5cce6

Revised the step with instructions to configure the virtual environment variables for GPU users by adding a disclaimer.

sgkouzias requested a review from learning-to-play July 2, 2024 16:48

Reformat disclaimer on environment configuration step.

64e7c50

Deleted asterisk emoji and placed disclaimer note before respective instructions.

8bitmp3 previously approved these changes Sep 4, 2024

8bitmp3 self-assigned this Sep 4, 2024

8bitmp3 added the awaiting-technical-review label Sep 4, 2024

MarkDaoust reviewed Sep 4, 2024

Move the GPU debugging step to after the "Test the installation" step.

06798ba

MarkDaoust dismissed 8bitmp3’s stale review via 06798ba September 4, 2024 21:39

MarkDaoust added the ready to pull label Sep 4, 2024

mihaimaruseac approved these changes Sep 5, 2024

sgkouzias requested a review from MarkDaoust September 5, 2024 16:44

MarkDaoust approved these changes Sep 5, 2024

copybara-service bot merged commit 27ba8a4 into tensorflow:master Sep 5, 2024
5 checks passed

t-kalinowski mentioned this pull request Sep 17, 2024

Automatically Set Up Python Environments for R Packages rstudio/reticulate#1671

Open
