Linking of cudart #579

Open
gfardell opened this issue Sep 9, 2024 · 6 comments

@gfardell
Contributor

gfardell commented Sep 9, 2024

As you know, for use with CIL we build TIGRE with conda and host it under the ccpi channel.

The environment we build into contains the right version of the cudart redistributable shared library, and this should be automatically found and linked when running within the virtual environment.

However, we have an issue where users need to have the CUDA SDK installed in order to run TIGRE.

Commenting out these lines:

if hasattr(os, "add_dll_directory"):
    # Add all the DLL directories manually
    # see:
    # https://docs.python.org/3.8/whatsnew/3.8.html#bpo-36085-whatsnew
    # https://stackoverflow.com/a/60803169/19344391
    dll_directory = os.path.dirname(__file__)
    os.add_dll_directory(dll_directory)
    # The user must install the CUDA Toolkit
    cuda_bin = os.path.join(os.environ["CUDA_PATH"], "bin")
    os.add_dll_directory(cuda_bin)

The correct version of cudart is found automatically. With conda it's somewhere like C:\Users\[USER]\miniforge3\envs\[ENV]\Library\bin

With these lines in place, it is forced to use the system installation. If you're building and running on the same system that isn't an issue, but our aim is easy redistribution, so CUDA_PATH is often not set at all. Even when it is set, it may not point to the right CUDA version.

I've read through the issues linked in the code, and I still struggle to see why it would be necessary if PATH were set correctly to something like C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin, which contains cudart.
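
As a side note, one quick way to check which cudart a process actually resolves on Windows is something like the sketch below; the DLL name cudart64_110.dll is an assumption for CUDA 11.x builds and will differ for other CUDA versions:

import ctypes
from ctypes import wintypes

# Load cudart through the normal DLL search path; raises OSError if it cannot be found.
cudart = ctypes.CDLL("cudart64_110.dll")  # assumed CUDA 11.x name
buf = ctypes.create_unicode_buffer(1024)
# Ask Windows where the loaded module actually came from.
ctypes.windll.kernel32.GetModuleFileNameW(wintypes.HMODULE(cudart._handle), buf, 1024)
print("cudart resolved to:", buf.value)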

Specifications

  • MATLAB/python version: Python 3.10/3.11/3.12
  • OS: Windows 11 (also confirmed on Windows 10)
  • CUDA version: Building with 11.8, but previously had 10.2
@AnderBiguri
Member

Heya,

Admittedly, I don't know why this was changed. It always worked for me. I wonder if it is for the cases where users install CUDA but don't add it to PATH?

Is there maybe an alternative we can add that combines the two? Something like if built_via_conda: ... elif hasattr(os, "add_dll_directory"): ..., such that everyone is happy? Admittedly, none of this is really my strength.
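
Roughly along those lines, a sketch of one possible built_via_conda check; the conda-meta test is an assumption about how conda lays out environments, not something TIGRE does today:

import os
import sys

def built_via_conda():
    # conda environments ship a conda-meta directory at the environment root,
    # so its presence under sys.prefix is used here as a marker (assumption).
    return os.path.isdir(os.path.join(sys.prefix, "conda-meta"))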

@WYVERN2742

Hi, I'm running into this issue with distributable builds of WebCT: the embedded cudatoolkit isn't being picked up correctly by TIGRE, resulting in a crash on startup on systems that don't already have the CUDA libraries installed.

2024-11-21 11:16:33,520 [INFO] root: Welcome to WebCT 0.1.3
Traceback (most recent call last):
  File "app.py", line 24, in <module>
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "PyInstaller\loader\pyimod02_importers.py", line 419, in exec_module
  File "webct\__init__.py", line 223, in <module>
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "PyInstaller\loader\pyimod02_importers.py", line 419, in exec_module
  File "webct\blueprints\app\__init__.py", line 7, in <module>
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "PyInstaller\loader\pyimod02_importers.py", line 419, in exec_module
  File "webct\blueprints\app\routes.py", line 9, in <module>
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "PyInstaller\loader\pyimod02_importers.py", line 419, in exec_module
  File "tigre\__init__.py", line 18, in <module>
  File "os.py", line 680, in __getitem__
KeyError: 'CUDA_PATH'

This is an issue because, although cudart does exist in the distributed package, users are unable to use TIGRE unless they have also installed the CUDA SDK.
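
One way to avoid that KeyError would be to consult CUDA_PATH only when it is actually set, so a cudart bundled next to the package can still be found. A sketch of the idea, not the current TIGRE code:

import os

if hasattr(os, "add_dll_directory"):
    # DLLs shipped next to the package (e.g. a bundled cudart)
    os.add_dll_directory(os.path.dirname(os.path.abspath(__file__)))
    cuda_path = os.environ.get("CUDA_PATH")  # returns None instead of raising KeyError
    if cuda_path:
        os.add_dll_directory(os.path.join(cuda_path, "bin"))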

@AnderBiguri
Member

@gfardell @WYVERN2742 I removed the code as suggested; have a test to see if this fixes the issue. I'll reopen it if it doesn't.

@WYVERN2742

Thanks for the fast response! I'll test when I get more time later 👍

@WYVERN2742

Finally got around to testing; I can confirm this works and I'm able to run TIGRE from a conda environment with cudatoolkit, without requiring the host to have CUDA installed! 🎉

AnderBiguri reopened this Jan 28, 2025
@AnderBiguri
Member

Reopening this. Since I fixed it a month and a half ago, I've had 4 different people report that DLLs are not found, so this solution fixes it for people who have CUDA via conda, but the majority of my users don't. I won't revert the changes yet, but I do need to find a way to cater to both cases.

I have CUDA installed on the host (because I need it for GPU code shenanigans), so it's a bit hard for me to test, but maybe @gfardell you can give me a hand? Is there any OS/conda parameter that we can read in the code you mentioned above, so I can add it to the if condition?
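
One possible direction, sketched under the assumption that the CONDA_PREFIX environment variable is an acceptable marker for an active conda environment (this is not something TIGRE checks today), would be to add every candidate DLL directory that actually exists rather than picking a single one:

import os

def _candidate_dll_dirs():
    # DLLs shipped alongside the package itself
    yield os.path.dirname(os.path.abspath(__file__))
    # cudart installed into a conda environment lives under Library\bin on Windows
    conda_prefix = os.environ.get("CONDA_PREFIX")
    if conda_prefix:
        yield os.path.join(conda_prefix, "Library", "bin")
    # a system-wide CUDA Toolkit, if the user has one and CUDA_PATH is set
    cuda_path = os.environ.get("CUDA_PATH")
    if cuda_path:
        yield os.path.join(cuda_path, "bin")

if hasattr(os, "add_dll_directory"):
    for directory in _candidate_dll_dirs():
        if os.path.isdir(directory):
            os.add_dll_directory(directory)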
