[REVIEW] Add GPU and CUDA validations #4692

Merged
kkraus14 merged 31 commits into rapidsai:branch-0.14 on Apr 2, 2020

Conversation

galipremsagar
Contributor

@galipremsagar galipremsagar commented Mar 25, 2020

Closes #4620, #870, #499

  • Add GPU driver and runtime validations
  • Test on a GPU machine
  • Test on non-GPU machine with libcuda & toolkit installed.
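
As a rough illustration of what a driver/runtime validation compares, here is a minimal sketch; it is not the code from this PR. It assumes the integer encoding returned by `cudaDriverGetVersion` and `cudaRuntimeGetVersion` (1000 * major + 10 * minor). The helper names `decode_cuda_version` and `driver_supports_runtime` are hypothetical, and the plain comparison ignores CUDA compatibility packages.

```python
def decode_cuda_version(version):
    """Split a CUDA version integer (1000 * major + 10 * minor) into a tuple."""
    major, remainder = divmod(version, 1000)
    return major, remainder // 10


def driver_supports_runtime(driver_version, runtime_version):
    """Return True if the driver's supported CUDA version covers the runtime.

    Assumption for this sketch: a driver can run any runtime whose encoded
    version is not newer than the version the driver reports.
    """
    return driver_version >= runtime_version
```

A caller could use `decode_cuda_version` to format a readable message (for example "driver supports CUDA 10.2") when `driver_supports_runtime` returns False.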

@galipremsagar
Contributor Author

Since we are using cupy for the driver and GPU check, if cupy doesn't see a GPU it errors out like this:

>>> import cudf
Traceback (most recent call last):
  File "/conda/envs/cudf/lib/python3.7/site-packages/cupy/__init__.py", line 21, in <module>
    from cupy import core  # NOQA
  File "/conda/envs/cudf/lib/python3.7/site-packages/cupy/core/__init__.py", line 1, in <module>
    from cupy.core import core  # NOQA
  File "cupy/core/core.pyx", line 1, in init cupy.core.core
  File "/conda/envs/cudf/lib/python3.7/site-packages/cupy/cuda/__init__.py", line 5, in <module>
    from cupy.cuda import compiler  # NOQA
  File "/conda/envs/cudf/lib/python3.7/site-packages/cupy/cuda/compiler.py", line 14, in <module>
    from cupy.cuda import function
  File "cupy/cuda/texture.pxd", line 6, in init cupy.cuda.function
  File "cupy/cuda/texture.pyx", line 1, in init cupy.cuda.texture
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/conda/envs/cudf/lib/python3.7/site-packages/cudf/__init__.py", line 6, in <module>
    import cupy
  File "/conda/envs/cudf/lib/python3.7/site-packages/cupy/__init__.py", line 42, in <module>
    six.reraise(ImportError, ImportError(msg), exc_info[2])
  File "/conda/envs/cudf/lib/python3.7/site-packages/six.py", line 702, in reraise
    raise value.with_traceback(tb)
  File "/conda/envs/cudf/lib/python3.7/site-packages/cupy/__init__.py", line 21, in <module>
    from cupy import core  # NOQA
  File "/conda/envs/cudf/lib/python3.7/site-packages/cupy/core/__init__.py", line 1, in <module>
    from cupy.core import core  # NOQA
  File "cupy/core/core.pyx", line 1, in init cupy.core.core
  File "/conda/envs/cudf/lib/python3.7/site-packages/cupy/cuda/__init__.py", line 5, in <module>
    from cupy.cuda import compiler  # NOQA
  File "/conda/envs/cudf/lib/python3.7/site-packages/cupy/cuda/compiler.py", line 14, in <module>
    from cupy.cuda import function
  File "cupy/cuda/texture.pxd", line 6, in init cupy.cuda.function
  File "cupy/cuda/texture.pyx", line 1, in init cupy.cuda.texture
ImportError: CuPy is not correctly installed.

If you are using wheel distribution (cupy-cudaXX), make sure that the version of CuPy you installed matches with the version of CUDA on your host.
Also, confirm that only one CuPy package is installed:
  $ pip freeze

If you are building CuPy from source, please check your environment, uninstall CuPy and reinstall it with:
  $ pip install cupy --no-cache-dir -vvvv

Check the Installation Guide for details:
  https://docs-cupy.chainer.org/en/latest/install.html

original error: libcuda.so.1: cannot open shared object file: No such file or directory

@kkraus14 @jrhemstad
Will this be okay for us?

@codecov

codecov bot commented Mar 25, 2020

Codecov Report

Merging #4692 into branch-0.14 will decrease coverage by 0.14%.
The diff coverage is 47.05%.

@@               Coverage Diff               @@
##           branch-0.14    #4692      +/-   ##
===============================================
- Coverage         88.3%   88.16%   -0.15%     
===============================================
  Files               49       51       +2     
  Lines             9707     9741      +34     
===============================================
+ Hits              8572     8588      +16     
- Misses            1135     1153      +18
| Impacted Files | Coverage Δ |
|---|---|
| python/cudf/cudf/errors.py | 0% <0%> (ø) |
| python/cudf/cudf/utils/gpu_utils.py | 53.33% <53.33%> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e9d9d79...2e23210. Read the comment docs.

@jrhemstad
Contributor

That error occurs if libcuda isn't available, probably if the toolkit isn't installed.

@kkraus14
Collaborator

No, that isn't okay, because libcuda and the toolkit could exist on a machine without GPUs, for example.

We should also error much more gracefully if we can't find the proper libraries.
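
One way to fail gracefully in both situations is to query the driver library directly rather than relying on the cupy import. The sketch below is only an illustration and not what this PR merged: `GPUValidationError` and `count_cuda_devices` are hypothetical names, and the default library name `libcuda.so.1` assumes Linux. `cuInit` and `cuDeviceGetCount` are CUDA driver API calls; the loader is a parameter so the logic can be exercised without a GPU.

```python
import ctypes


class GPUValidationError(RuntimeError):
    """Hypothetical error type used only in this sketch."""


def count_cuda_devices(load_library=ctypes.CDLL, name="libcuda.so.1"):
    """Return the number of visible GPUs or raise GPUValidationError."""
    try:
        libcuda = load_library(name)
    except OSError as exc:
        # Missing driver library: report it in one readable line.
        raise GPUValidationError(
            f"Could not load {name}; is the NVIDIA driver installed?"
        ) from exc

    # cuInit must succeed before any other driver API call.
    status = libcuda.cuInit(0)
    if status != 0:
        raise GPUValidationError(f"cuInit failed with CUDA error code {status}")

    count = ctypes.c_int(0)
    status = libcuda.cuDeviceGetCount(ctypes.pointer(count))
    if status != 0:
        raise GPUValidationError(
            f"cuDeviceGetCount failed with CUDA error code {status}"
        )
    if count.value == 0:
        # Library present but no device, the case raised in the comment above.
        raise GPUValidationError("CUDA driver is installed but no GPU was found")
    return count.value
```

Separating "library not found" from "library found, no device" gives each case its own short message instead of a chained import traceback.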

@galipremsagar galipremsagar changed the title from "[WIP] Add GPU and CUDA validations" to "[REVIEW] Add GPU and CUDA validations" Mar 26, 2020
@galipremsagar galipremsagar marked this pull request as ready for review March 26, 2020 20:50
@galipremsagar galipremsagar requested review from a team as code owners March 26, 2020 20:50
@galipremsagar galipremsagar requested a review from aschaffer March 26, 2020 20:50
@kkraus14 kkraus14 added labels "2 - In Progress" (Currently a work in progress) and "Python" (Affects Python cuDF API.) Mar 26, 2020
@galipremsagar galipremsagar added labels "3 - Ready for Review" (Ready for review by team) and "4 - Needs cuDF (Python) Reviewer", and removed label "2 - In Progress" (Currently a work in progress) Apr 1, 2020
@kkraus14
Collaborator

kkraus14 commented Apr 2, 2020

@jrhemstad would you mind reviewing again?

# Copyright (c) 2020, NVIDIA CORPORATION.

cdef extern from "cuda.h" nogil:
    cdef enum cudaDeviceAttr:
Contributor

Sure would be nice if someone were to make a standard set of Python wrappers for the CUDA runtime APIs!

Collaborator

There's a reason this is in cudf/_cuda as opposed to cudf/_libxx ;)
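
For comparison with the Cython declarations under review, roughly the same runtime query can be sketched with `ctypes`. This is an illustration under stated assumptions, not the PR's implementation: the function name `compute_capability` and the default library name `libcudart.so` are assumptions, while `cudaDeviceGetAttribute` and the two `cudaDeviceAttr` values are taken from the CUDA runtime API.

```python
import ctypes

# Values of cudaDevAttrComputeCapabilityMajor / Minor in the CUDA runtime enum.
CUDA_DEV_ATTR_COMPUTE_CAPABILITY_MAJOR = 75
CUDA_DEV_ATTR_COMPUTE_CAPABILITY_MINOR = 76


def compute_capability(device=0, load_library=ctypes.CDLL, name="libcudart.so"):
    """Return (major, minor) compute capability for the given device index."""
    cudart = load_library(name)
    values = []
    for attr in (
        CUDA_DEV_ATTR_COMPUTE_CAPABILITY_MAJOR,
        CUDA_DEV_ATTR_COMPUTE_CAPABILITY_MINOR,
    ):
        value = ctypes.c_int(0)
        # Signature: cudaDeviceGetAttribute(int* value, cudaDeviceAttr attr, int device)
        status = cudart.cudaDeviceGetAttribute(ctypes.pointer(value), attr, device)
        if status != 0:
            raise RuntimeError(
                f"cudaDeviceGetAttribute failed with CUDA error code {status}"
            )
        values.append(value.value)
    return tuple(values)
```

A validation could then compare the returned tuple against whatever minimum the library supports.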

@kkraus14
Collaborator

kkraus14 commented Apr 2, 2020

@galipremsagar as a follow up to this, we likely need to modify setup.py to ship the .pxd files of _cuda and subdirectories in our package. Can you raise an issue and tackle that in a follow up PR?

@kkraus14 kkraus14 added label "5 - Ready to Merge" (Testing and reviews complete, ready to merge) and removed labels "3 - Ready for Review" (Ready for review by team) and "4 - Needs cuDF (Python) Reviewer" Apr 2, 2020
@kkraus14 kkraus14 merged commit 6202910 into rapidsai:branch-0.14 Apr 2, 2020
@galipremsagar
Contributor Author

> @galipremsagar as a follow up to this, we likely need to modify setup.py to ship the .pxd files of _cuda and subdirectories in our package. Can you raise an issue and tackle that in a follow up PR?

Sure, here is a link to the FEA: #4785

rapids-bot bot pushed a commit that referenced this pull request Jan 26, 2022
This PR resolves #10076. It improves `gpu_utils.py` by removing code for handling CUDA < 11.0, which we no longer support.

This is marked as "breaking" because of minor Python API changes. I changed the name of an error class from `UnSupportedCUDAError` to `UnsupportedCUDAError` and removed an unused error class named `UnSupportedGPUError`. It appears the `UnSupportedGPUError` class was introduced in #4692 but has never been used in the code.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #10113
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge Python Affects Python cuDF API.
Successfully merging this pull request may close these issues.

[DOC] Replace OutOfMemory exception with UnsupportedGPU exception
5 participants