Which version can I use with CUDA 12.1 and cuDNN 8.9 #22198

Closed
JohnHerry opened this issue Sep 24, 2024 · 5 comments
Labels
stale issues that have not been addressed in a while; categorized by a bot

Comments

@JohnHerry

JohnHerry commented Sep 24, 2024

Describe the issue

In my environment the default CUDA is 12.1 and cuDNN is 8.9. When I install torch, a matching nvidia-cudnn-cu12==8.9.26 package is installed into the environment as well.
With onnxruntime-gpu==1.19.2, creating an InferenceSession fails with an error that libcudnn.so.9 is missing, which requires cuDNN >= 9.
With onnxruntime-gpu==1.17.1 installed via --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/, initialization fails in cuda_execution_provider.cc with file not found "libcublas.so.11", which seems to belong to CUDA 11.x.
So which version of the package can I install for my environment?
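To see which of these shared libraries the dynamic loader can actually resolve in a given environment, a small probe like the one below may help. It is a sketch using only the standard library's `ctypes`; the list of sonames is an assumption based on the error messages above (`libcudnn.so.8` is the cuDNN 8.x soname, `libcudnn.so.9` the cuDNN 9.x one, and `libcublas.so.11` / `libcublas.so.12` correspond to CUDA 11.x / 12.x). Run it with the same `LD_LIBRARY_PATH` as the failing script.

```python
import ctypes

# Assumed candidates: the sonames from the errors above plus their counterparts.
CANDIDATES = ["libcudnn.so.8", "libcudnn.so.9", "libcublas.so.11", "libcublas.so.12"]


def probe_libs(names):
    """Return {soname: bool}, True when the dynamic loader can open that library."""
    found = {}
    for name in names:
        try:
            # dlopen() by bare soname: searches LD_LIBRARY_PATH, ld.so.cache, default dirs
            ctypes.CDLL(name)
            found[name] = True
        except OSError:
            found[name] = False
    return found


if __name__ == "__main__":
    for name, ok in probe_libs(CANDIDATES).items():
        print(f"{name}: {'found' if ok else 'missing'}")
```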

To reproduce

As described above.

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.19.2 OR 1.17.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 12.1

@JohnHerry
Author

18.0

Thank you for the help. I am using onnxruntime-gpu for online training: the input tokens for the model I am training are extracted on the fly by a pretrained module loaded with onnxruntime-gpu. Both the feature extraction and the training of my target model run on the GPUs.
I also tried onnxruntime-gpu==1.18.0, which crashes the same way as 1.17.1.

@JohnHerry
Author

When we install PyTorch, it automatically installs all the nvidia-cuda*, nvidia-cudnn*, nvidia-cublas*, nvidia-nccl*, ... packages. Why can't installing onnxruntime check the versions of the NVIDIA packages already present and select a matching build?
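The check described in this question can at least be done by hand. The sketch below reads the installed versions of a few NVIDIA wheels with the standard library's `importlib.metadata`; the package list is an illustrative assumption (wheel names PyTorch commonly pulls in for CUDA 12), not an exhaustive inventory.

```python
from importlib import metadata

# Assumed, non-exhaustive list of NVIDIA wheels installed alongside CUDA 12 PyTorch builds.
NVIDIA_PACKAGES = ["nvidia-cudnn-cu12", "nvidia-cublas-cu12", "nvidia-cuda-runtime-cu12"]


def installed_versions(packages):
    """Map each distribution name to its installed version string, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(NVIDIA_PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

The cuDNN major version reported here (8 or 9) is what decides which onnxruntime-gpu build fits.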

@tianleiwu
Contributor

tianleiwu commented Sep 26, 2024

For the required CUDA and cuDNN versions, see:
https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements

Then install following the instructions here: https://onnxruntime.ai/docs/install/#install-onnx-runtime-ort-1

Summary of recommended combinations:

| CUDA | cuDNN | PyTorch | onnxruntime-gpu | Install command |
|------|-------|---------|-----------------|-----------------|
| 12.x | 9.x | 2.4.1 | 1.19.2 | `pip install onnxruntime-gpu==1.19.2` |
| 12.x | 8.9 | 2.3.1 | 1.18.0 | `pip install onnxruntime-gpu==1.18.0 --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/` |
| 11.8 | 8.9 | 2.3.1 | 1.19.2 | `pip install onnxruntime-gpu==1.19.2 --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-11/pypi/simple/` |
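The recommended combinations above can be encoded as a small lookup, as a sketch. The rows and URLs are copied from the summary; the version normalization (any CUDA 12.x maps to "12.x", any cuDNN 9.x to "9.x", everything else keeps major.minor) and the function name `install_command` are assumptions made for illustration.

```python
CUDA12_INDEX = "https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/"
CUDA11_INDEX = "https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-11/pypi/simple/"

# (CUDA, cuDNN) -> (PyTorch, onnxruntime-gpu, extra index URL or None), from the summary rows
COMBINATIONS = {
    ("12.x", "9.x"): ("2.4.1", "1.19.2", None),
    ("12.x", "8.9"): ("2.3.1", "1.18.0", CUDA12_INDEX),
    ("11.8", "8.9"): ("2.3.1", "1.19.2", CUDA11_INDEX),
}


def _normalize(version, wildcard_major):
    """'12.1' -> '12.x' when the major matches wildcard_major, else keep major.minor."""
    parts = version.split(".")
    if parts[0] == wildcard_major:
        return f"{parts[0]}.x"
    return ".".join(parts[:2])


def install_command(cuda, cudnn):
    """Return the pip command for the recommended onnxruntime-gpu build."""
    key = (_normalize(cuda, "12"), _normalize(cudnn, "9"))
    if key not in COMBINATIONS:
        raise ValueError(f"no recommended combination for CUDA {cuda} / cuDNN {cudnn}")
    _torch, ort_version, index = COMBINATIONS[key]
    cmd = f"pip install onnxruntime-gpu=={ort_version}"
    if index:
        cmd += f" --extra-index-url {index}"
    return cmd
```

For the environment in this issue, `install_command("12.1", "8.9")` yields the 1.18.0 command, which includes the CUDA 12 extra index URL.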

onnxruntime-gpu can also use the cuDNN shared libraries installed by the nvidia-cudnn-cu1* pip packages; you only need to add their lib directory to LD_LIBRARY_PATH.
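A Python sketch of building that value is below. It uses the same `sysconfig.get_path('purelib')` call as the shell commands in this comment; `/usr/local/cuda` is the assumed CUDA location, and the helper name `cudnn_ld_library_path` is hypothetical. Because the dynamic loader reads `LD_LIBRARY_PATH` when a process starts, the result is printed as an `export` line for the shell rather than assigned to `os.environ` at runtime.

```python
import os
import sysconfig


def cudnn_ld_library_path(purelib=None, cuda_home="/usr/local/cuda", current=None):
    """Build an LD_LIBRARY_PATH that puts the pip cuDNN libs first, then CUDA's lib64."""
    if purelib is None:
        purelib = sysconfig.get_path("purelib")  # site-packages of this interpreter
    if current is None:
        current = os.environ.get("LD_LIBRARY_PATH", "")
    parts = [
        os.path.join(purelib, "nvidia", "cudnn", "lib"),
        os.path.join(cuda_home, "lib64"),
    ]
    if current:  # avoid a trailing ':' (an empty entry means the current directory)
        parts.append(current)
    return ":".join(parts)


if __name__ == "__main__":
    print(f'export LD_LIBRARY_PATH="{cudnn_ld_library_path()}"')
```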

Since you have CUDA 12.1 installed on your machine (assuming it is in /usr/local/cuda/), here is an example that installs the first option (CUDA 12.x, cuDNN 9.x, torch 2.4.1, onnxruntime-gpu 1.19.2) in a new conda environment:

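```shell
# Fresh conda environment
conda create -n py311 python=3.11
conda activate py311
# PyTorch wheels built against CUDA 12.1
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 --force-reinstall
# onnxruntime-gpu plus the cuDNN 9 libraries from the nvidia-cudnn-cu12 wheel
pip install onnxruntime-gpu nvidia-cudnn-cu12
# Point the dynamic loader at the pip cuDNN libs and the system CUDA toolkit
export site_packages="$(python3 -c "import sysconfig; print(sysconfig.get_path('purelib'))")"
export LD_LIBRARY_PATH="$site_packages/nvidia/cudnn/lib:/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
python my_training.py
```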
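After these steps, a quick check from Python that the CUDA provider is usable might look like the sketch below. `get_available_providers()` only reports what the installed build was compiled with; creating a session and inspecting `get_providers()` shows whether CUDA is actually active for that session. The model path `model.onnx` is hypothetical.

```python
def cuda_provider_available():
    """True if the installed onnxruntime build includes the CUDA execution provider."""
    try:
        import onnxruntime as ort
    except ImportError:
        return False
    return "CUDAExecutionProvider" in ort.get_available_providers()


def session_uses_cuda(model_path):
    """Create a session and report whether CUDA is among its active providers."""
    import onnxruntime as ort

    sess = ort.InferenceSession(
        model_path, providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
    )
    return "CUDAExecutionProvider" in sess.get_providers()


if __name__ == "__main__":
    print("CUDA EP compiled in:", cuda_provider_available())
    # print("CUDA EP active:", session_uses_cuda("model.onnx"))  # hypothetical model path
```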

Contributor

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@github-actions github-actions bot added the stale issues that have not been addressed in a while; categorized by a bot label Oct 26, 2024
3 participants
@JohnHerry @tianleiwu and others