[REVIEW] Add GPU and CUDA validations #4692
Co-Authored-By: Jake Hemstad <[email protected]>
Since we are actually using cupy for the driver and GPU check, if cupy doesn't see a GPU it errors out like this:
>>> import cudf
Traceback (most recent call last):
  File "/conda/envs/cudf/lib/python3.7/site-packages/cupy/__init__.py", line 21, in <module>
    from cupy import core  # NOQA
  File "/conda/envs/cudf/lib/python3.7/site-packages/cupy/core/__init__.py", line 1, in <module>
    from cupy.core import core  # NOQA
  File "cupy/core/core.pyx", line 1, in init cupy.core.core
  File "/conda/envs/cudf/lib/python3.7/site-packages/cupy/cuda/__init__.py", line 5, in <module>
    from cupy.cuda import compiler  # NOQA
  File "/conda/envs/cudf/lib/python3.7/site-packages/cupy/cuda/compiler.py", line 14, in <module>
    from cupy.cuda import function
  File "cupy/cuda/texture.pxd", line 6, in init cupy.cuda.function
  File "cupy/cuda/texture.pyx", line 1, in init cupy.cuda.texture
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/conda/envs/cudf/lib/python3.7/site-packages/cudf/__init__.py", line 6, in <module>
    import cupy
  File "/conda/envs/cudf/lib/python3.7/site-packages/cupy/__init__.py", line 42, in <module>
    six.reraise(ImportError, ImportError(msg), exc_info[2])
  File "/conda/envs/cudf/lib/python3.7/site-packages/six.py", line 702, in reraise
    raise value.with_traceback(tb)
  File "/conda/envs/cudf/lib/python3.7/site-packages/cupy/__init__.py", line 21, in <module>
    from cupy import core  # NOQA
  File "/conda/envs/cudf/lib/python3.7/site-packages/cupy/core/__init__.py", line 1, in <module>
    from cupy.core import core  # NOQA
  File "cupy/core/core.pyx", line 1, in init cupy.core.core
  File "/conda/envs/cudf/lib/python3.7/site-packages/cupy/cuda/__init__.py", line 5, in <module>
    from cupy.cuda import compiler  # NOQA
  File "/conda/envs/cudf/lib/python3.7/site-packages/cupy/cuda/compiler.py", line 14, in <module>
    from cupy.cuda import function
  File "cupy/cuda/texture.pxd", line 6, in init cupy.cuda.function
  File "cupy/cuda/texture.pyx", line 1, in init cupy.cuda.texture
ImportError: CuPy is not correctly installed.

If you are using wheel distribution (cupy-cudaXX), make sure that the version of CuPy you installed matches with the version of CUDA on your host.
Also, confirm that only one CuPy package is installed:
  $ pip freeze

If you are building CuPy from source, please check your environment, uninstall CuPy and reinstall it with:
  $ pip install cupy --no-cache-dir -vvvv

Check the Installation Guide for details:
  https://docs-cupy.chainer.org/en/latest/install.html

original error: libcuda.so.1: cannot open shared object file: No such file or directory

@kkraus14 @jrhemstad
Codecov Report
@@ Coverage Diff @@
## branch-0.14 #4692 +/- ##
===============================================
- Coverage 88.3% 88.16% -0.15%
===============================================
Files 49 51 +2
Lines 9707 9741 +34
===============================================
+ Hits 8572 8588 +16
- Misses 1135 1153 +18
That error occurs if
No, that isn't okay because
We should also error much more gracefully if we can't find the proper libraries.
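The kind of graceful failure asked for here could be sketched as follows. This is only an illustration, not the code that landed in the PR: the helper name `load_cuda_driver` and its error message are hypothetical, and it assumes a Linux-style library name (`libcuda.so.1`, the one that appears in the traceback above). It uses the standard-library `ctypes` module to probe for the driver library before anything heavier is imported.

```python
import ctypes


def load_cuda_driver(names=("libcuda.so.1", "libcuda.so")):
    """Return a ctypes handle to the CUDA driver library.

    Hypothetical helper: tries each candidate name in order and raises a
    single ImportError with a short, actionable message if none can be
    loaded, instead of surfacing a long chained traceback.
    """
    errors = []
    for name in names:
        try:
            return ctypes.CDLL(name)
        except OSError as exc:
            # Remember why this candidate failed so the final message is useful.
            errors.append(f"{name}: {exc}")
    raise ImportError(
        "Could not load the CUDA driver library. "
        "Is an NVIDIA driver installed and visible to this process?\n"
        "Tried:\n  " + "\n  ".join(errors)
    )
```

A caller could run this check at import time and re-raise or downgrade the error to a warning, depending on whether a GPU-less import should be allowed.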
Co-Authored-By: Keith Kraus <[email protected]>
Co-Authored-By: Keith Kraus <[email protected]>
@jrhemstad would you mind reviewing again?
# Copyright (c) 2020, NVIDIA CORPORATION.

cdef extern from "cuda.h" nogil:
    cdef enum cudaDeviceAttr:
Sure would be nice if someone were to make a standard set of Python wrappers for the CUDA runtime APIs!
There's a reason this is in cudf/_cuda as opposed to cudf/_libxx ;)
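For readers wondering what a thin Python wrapper over the CUDA runtime might look like without Cython, here is a minimal `ctypes` sketch. It is not the approach taken in this PR (which declares the runtime API in Cython under `cudf/_cuda`). The function names `load_cudart`, `compute_capability`, and `split_cuda_version` are made up for illustration. The C function `cudaDeviceGetAttribute` and the two `cudaDeviceAttr` enum values are from the CUDA runtime API, and `ctypes.util.find_library` may not locate `libcudart` in every environment (for example, some conda setups), so treat the lookup as an assumption.

```python
import ctypes
import ctypes.util

# Values of the cudaDeviceAttr enum in the CUDA runtime headers.
CUDA_DEV_ATTR_COMPUTE_CAPABILITY_MAJOR = 75
CUDA_DEV_ATTR_COMPUTE_CAPABILITY_MINOR = 76


def split_cuda_version(version):
    """Decode CUDA's integer version encoding (1000 * major + 10 * minor)."""
    return version // 1000, (version % 1000) // 10


def load_cudart():
    """Return a ctypes handle to libcudart, or None if it cannot be loaded."""
    path = ctypes.util.find_library("cudart")
    if path is None:
        return None
    try:
        return ctypes.CDLL(path)
    except OSError:
        return None


def compute_capability(cudart, device=0):
    """Return (major, minor) compute capability of `device`.

    `cudart` is a handle exposing cudaDeviceGetAttribute, e.g. the result
    of load_cudart(). Raises RuntimeError on a non-zero cudaError_t.
    """
    values = []
    for attr in (
        CUDA_DEV_ATTR_COMPUTE_CAPABILITY_MAJOR,
        CUDA_DEV_ATTR_COMPUTE_CAPABILITY_MINOR,
    ):
        out = ctypes.c_int(0)
        status = cudart.cudaDeviceGetAttribute(ctypes.pointer(out), attr, device)
        if status != 0:
            raise RuntimeError(
                f"cudaDeviceGetAttribute failed with error code {status}"
            )
        values.append(out.value)
    return tuple(values)
```

Usage would be along the lines of `lib = load_cudart()` followed by `compute_capability(lib)` when `lib` is not `None`. The version decoder applies to the integers returned by `cudaRuntimeGetVersion` and `cudaDriverGetVersion`.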
@galipremsagar as a follow up to this, we likely need to modify
Sure, here is a link to the FEA: #4785
This PR resolves #10076. It improves `gpu_utils.py` by removing code for handling CUDA < 11.0, which we no longer support. This is marked as "breaking" because of minor Python API changes. I changed the name of an error class from `UnSupportedCUDAError` to `UnsupportedCUDAError` and removed an unused error class named `UnSupportedGPUError`. It appears the `UnSupportedGPUError` class was introduced in #4692 but has never been used in the code.

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- Ashwin Srinath (https://github.com/shwina)
- GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #10113
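To make the effect of dropping CUDA < 11.0 concrete, a version check of this kind might look like the sketch below. It is an assumption-laden illustration, not the contents of `gpu_utils.py`: the local `UnsupportedCUDAError` class merely stands in for the renamed cudf class, and the function name `validate_cuda_runtime`, the constant `MIN_CUDA_RUNTIME`, and the message text are hypothetical. The only fact relied on is that CUDA reports versions as `1000 * major + 10 * minor`.

```python
MIN_CUDA_RUNTIME = 11000  # CUDA 11.0, encoded as 1000 * major + 10 * minor


class UnsupportedCUDAError(Exception):
    """Local stand-in for the renamed error class (hypothetical definition)."""


def validate_cuda_runtime(runtime_version):
    """Raise UnsupportedCUDAError if the runtime is older than CUDA 11.0.

    `runtime_version` is the integer reported by the CUDA runtime,
    for example 11020 for CUDA 11.2.
    """
    if runtime_version < MIN_CUDA_RUNTIME:
        major = runtime_version // 1000
        minor = (runtime_version % 1000) // 10
        raise UnsupportedCUDAError(
            f"Detected CUDA Runtime {major}.{minor}; "
            "CUDA 11.0 or newer is required."
        )
```

With a single supported floor, the check reduces to one comparison, which is the kind of simplification the PR description refers to.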
Closes #4620, #870, #499