Specify additional steps to utilize GPU for Linux users #2299
Specify additional steps to utilize GPU for Linux users
Advice to skip additional step 6 if using CPU.
Added second option to create virtual env via Python's built in venv module for Linux users with CUDA-enabled GPUs
Added virtual envs activation/deactivation commands and changed wording for editing the deactivate block in the activate script of the venv virtual env.
Added instructions to resolve the ptxas issue.
Revised CUDNN_DIR definition
Corrected LD_LIBRARY_PATH definition in conda environment instructions
Rename environment variable to PTXAS_DIR and package manager options.
Added note to use pip instead of conda to install TensorFlow.
Added steps and respective instructions to install TensorFlow by running the pip install tensorflow[and-cuda] command within a virtual environment (option 1: conda, option 2: venv) and set environment variables to find/locate compatible NVIDIA libs installed with TensorFlow to effectively utilize GPUs. The solution has been successfully tested.
Reference: tensorflow/tensorflow#63362
@haifeng-jin , @MarkDaoust, @8bitmp3 I await any suggestions or revisions if needed. Do we have any updates? |
As I remember, the current recommended way to install TF is to use |
@haifeng-jin it seems practically impossible for someone who owns a PC with a CUDA-enabled GPU to run deep learning experiments with TensorFlow version 2.16.1 and use that GPU locally without manually performing extra steps that are not (as of today) included in the official TensorFlow installation guide for Linux users with GPUs, at least as a temporary fix! It turns out that when you |
Please don't use "add file"/"update file"/"fix file"/etc. commit messages. These are hard to reason about when looking at the history of the file/repository. Instead, please write explanatory git commit messages.
The commit message is also the title of the PR if the PR has only one commit. It is thus twice important to have commit messages that are relevant, as PRs would be easier to understand and easier to analyze in search results.
For how to write good quality git commit messages, please consult https://cbea.ms/git-commit/
Can we instead add these to the install guide? |
@mihaimaruseac shouldn't we explain/specify how to configure manually the environment variables as appropriate? |
I read the update and it seems reasonable to me. Thank you
Removed option to install within conda virtual environment. Recommendation to install in venv environment.
@t-kalinowski thank you very much for your valuable advice. I revised the PR accordingly. |
@sgkouzias if you also create a symlink at |
Replaced instructions to modify default activate/deactivate scripts with instructions to create symlinks to NVIDIA shared libraries and ptxas.
@t-kalinowski thank you so much for your advice. Instructions have been totally revised as per your comments. Modifications to default |
@8bitmp3 , @haifeng-jin , @MarkDaoust even TensorFlow version |
site/en/install/pip.md (outdated)

```bash
source tf/bin/activate
deactivate
```
Can you remove `deactivate`?
@learning-to-play removed `deactivate` as advised. Furthermore, I could remove the instruction to create a symlink to ptxas, since it is not needed for TensorFlow version `2.17.0.rc0` but only for TensorFlow version `2.16.1`. Awaiting your comments.
I want to make sure that I understand the situation correctly. Which of the following two situations is correct?
- If the issue doesn't happen for 2.17.0RC0, yes please remove the instructions.
- If the issue happens for both 2.17.0RC0 and 2.16, we can wait for the GPU team to take a look at "TF 2.17.0 RC0 Fails to work with GPUs (and TF 2.16 too)" (tensorflow#63362) and see if they can send a fix for both the 2.16.2 and 2.17.0 releases.
@learning-to-play the only difference is that on version `2.17.0.rc0` you need to create the symlinks to the NVIDIA libs in order to utilize GPUs, while on version `2.16.1` you must, in addition to creating symlinks to the NVIDIA libs, create a symlink to ptxas as well. Consequently, the command `pip install tensorflow[and-cuda]` alone fails to work with GPUs on both versions.
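The NVIDIA library symlink step discussed in this thread can be sketched as a small shell function. This is only an illustration under stated assumptions: it assumes the pip wheels unpack the NVIDIA shared libraries under `<site-packages>/nvidia/<package>/lib/`, next to the `tensorflow` package directory, and the helper name `link_nvidia_libs` is hypothetical rather than part of TensorFlow.

```bash
# Sketch: symlink the NVIDIA shared libraries shipped by the pip wheels
# into the TensorFlow package directory so they can be found at load time.
# ASSUMPTION: the libraries live under <site-packages>/nvidia/*/lib/.
# link_nvidia_libs is a hypothetical helper name, not a TensorFlow tool.
link_nvidia_libs() {
    local tf_dir site_packages lib count=0
    # Resolve the TensorFlow package directory to an absolute path.
    tf_dir="$(cd "$1" && pwd)" || return 1
    site_packages="$(dirname "$tf_dir")"
    for lib in "$site_packages"/nvidia/*/lib/*.so*; do
        [ -e "$lib" ] || continue          # the glob matched nothing
        ln -sf "$lib" "$tf_dir/$(basename "$lib")"
        count=$((count + 1))
    done
    echo "linked $count libraries into $tf_dir"
}

# Possible usage inside an activated virtual environment:
#   link_nvidia_libs "$(dirname "$(python -c 'import tensorflow; print(tensorflow.__file__)')")"
```

Taking the package directory as an argument keeps the function independent of any particular environment, so it can be tried against a scratch directory before touching a real install.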
@learning-to-play, @SeeForTwo, @8bitmp3, @haifeng-jin, @MarkDaoust, @markmcd Unfortunately the latest release namely TensorFlow
So it seems as TensorFlow Notes:
ln -sf $(find $(dirname $(dirname $(python -c "import nvidia.cuda_nvcc; print(nvidia.cuda_nvcc.__file__)"))/*/bin/) -name ptxas -print -quit) $VIRTUAL_ENV/bin/ptxas |
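For readability, the one-liner above can be unpacked into a function with an explicit failure path. This is a sketch, not the wording merged into the guide: the name `link_ptxas` and its two arguments are hypothetical, and it assumes `ptxas` sits in a `bin/` directory one level below the `nvidia` package directory, as the `find` expression in the original command implies.

```bash
# Sketch: a more readable form of the ptxas symlink one-liner.
# ASSUMPTION: ptxas is located at <nvidia_dir>/<package>/bin/ptxas.
# link_ptxas is a hypothetical helper name; pass absolute paths.
link_ptxas() {
    local nvidia_dir="$1"    # e.g. <site-packages>/nvidia
    local bin_dir="$2"       # e.g. "$VIRTUAL_ENV/bin"
    local ptxas
    # Take the first match; errors from an unmatched glob are silenced.
    ptxas="$(find "$nvidia_dir"/*/bin -name ptxas 2>/dev/null | head -n 1)"
    if [ -z "$ptxas" ]; then
        echo "ptxas not found under $nvidia_dir" >&2
        return 1
    fi
    ln -sf "$ptxas" "$bin_dir/ptxas"
    echo "$bin_dir/ptxas -> $ptxas"
}
```

According to the discussion above, this step was only needed for TensorFlow 2.16.1, so the non-zero return when `ptxas` is missing lets a caller treat it as optional.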
Thank you for the contribution, @sgkouzias :) |
Revised the step with instructions to configure the virtual environment variables for GPU users by adding a disclaimer.
@belitskiy, @learning-to-play I revised instructions as advised and will be awaiting your feedback. It is my honor to contribute to the TensorFlow community. |
Deleted asterisk emoji and placed disclaimer note before respective instructions.
@MarkDaoust @markmcd PTAL
Thanks for all your work everyone (especially @sgkouzias)! I just tweaked the order so that this new GPU debugging step is after the step where you test the GPU. I think this is still right so I'm merging it. But LMK if I misunderstood anything. |
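The GPU test that the debugging step now follows might look like the sketch below. It relies on `tf.config.list_physical_devices("GPU")`, which is an existing TensorFlow API; the wrapper name `check_tf_gpu`, its optional interpreter argument, and its exit codes are hypothetical and chosen only for illustration.

```bash
# Sketch: report how many GPUs TensorFlow can see.
# check_tf_gpu is a hypothetical helper name. Exit codes (illustrative):
#   0 = at least one GPU, 1 = TensorFlow missing,
#   2 = interpreter missing, 3 = no GPU visible.
check_tf_gpu() {
    local py="${1:-python3}"
    if ! command -v "$py" >/dev/null 2>&1; then
        echo "interpreter not found: $py"
        return 2
    fi
    "$py" - <<'EOF'
import sys
try:
    import tensorflow as tf
except ImportError:
    print("tensorflow is not installed in this environment")
    sys.exit(1)
gpus = tf.config.list_physical_devices("GPU")
print(f"GPUs visible to TensorFlow: {len(gpus)}")
sys.exit(0 if gpus else 3)
EOF
}
```

If this reports zero GPUs after `pip install tensorflow[and-cuda]`, that is the situation in which the symlink workaround from this PR is meant to apply.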
Thank you @MarkDaoust 🙏 it is my honour. |
Really it has everything it needs we're just waiting for the internal merge, it should be through soon. |