[REVIEW] Add GPU and CUDA validations #4692

Merged
merged 31 commits into from
Apr 2, 2020
Changes from 8 commits
Commits
5961872
add GPU support, runtime & driver checks
galipremsagar Mar 25, 2020
e92c6b5
Update python/cudf/cudf/__init__.py
galipremsagar Mar 25, 2020
91baa1c
Merge remote-tracking branch 'upstream/branch-0.14' into init_check
galipremsagar Mar 25, 2020
9617fb9
change error message to provide driver versions and url to compatibil…
galipremsagar Mar 25, 2020
0709532
modify error text
galipremsagar Mar 25, 2020
a61ab8e
Update python/cudf/cudf/__init__.py
galipremsagar Mar 25, 2020
d1f1717
add cpp apis and cython/python bridge
galipremsagar Mar 26, 2020
44de100
Merge branch 'init_check' of https://github.com/galipremsagar/cudf in…
galipremsagar Mar 26, 2020
bc55a75
Update CHANGELOG.md
galipremsagar Mar 26, 2020
2be6ea3
Merge branch 'branch-0.14' into init_check
galipremsagar Mar 26, 2020
8d1482b
Update python/cudf/cudf/utils/gpu_utils.py
galipremsagar Mar 26, 2020
2e05d96
create a new module _cuda to keep all cuda related apis
galipremsagar Mar 27, 2020
e4c3288
remove cpp file
galipremsagar Mar 27, 2020
fe75c23
Merge remote-tracking branch 'upstream/branch-0.14' into init_check
galipremsagar Mar 31, 2020
7272afc
Apply suggestions from code review
galipremsagar Mar 31, 2020
04826a1
Merge branch 'init_check' of https://github.com/galipremsagar/cudf in…
galipremsagar Mar 31, 2020
0d9c6bb
Merge branch 'branch-0.14' into init_check
galipremsagar Mar 31, 2020
e6add1e
Merge branch 'init_check' of https://github.com/galipremsagar/cudf in…
galipremsagar Mar 31, 2020
7c8cb5b
remove except + for c apis
galipremsagar Mar 31, 2020
6818152
add param types in docs
galipremsagar Apr 1, 2020
6ac3a93
add getDeviceProperties api
galipremsagar Apr 1, 2020
5171b0e
do inline skip of isort
galipremsagar Apr 1, 2020
7d6dcc8
Merge remote-tracking branch 'upstream/branch-0.14' into init_check
galipremsagar Apr 1, 2020
79854f4
Merge remote-tracking branch 'upstream/branch-0.14' into init_check
galipremsagar Apr 1, 2020
0643e71
add error handling
galipremsagar Apr 1, 2020
3a7ab8c
add docs
galipremsagar Apr 1, 2020
012a6de
print the detected cuda runtime version
galipremsagar Apr 1, 2020
6032e64
Apply suggestions from code review
galipremsagar Apr 2, 2020
b0bb113
fetching only the properties required instead of queries all props of…
galipremsagar Apr 2, 2020
8f8fd5c
Merge branch 'init_check' of https://github.com/galipremsagar/cudf in…
galipremsagar Apr 2, 2020
2e23210
Update python/cudf/cudf/_cuda/gpu.pxd
galipremsagar Apr 2, 2020
1 change: 1 addition & 0 deletions cpp/CMakeLists.txt
@@ -546,6 +546,7 @@ add_library(cudf
src/utilities/legacy/error_utils.cpp
src/utilities/nvtx/nvtx_utils.cpp
src/utilities/nvtx/legacy/nvtx_utils.cpp
src/utilities/device.cu
src/copying/copy.cpp
src/copying/scatter.cu
src/copying/shift.cu
60 changes: 60 additions & 0 deletions cpp/include/cudf/utilities/device.hpp
@@ -0,0 +1,60 @@
/*
* Copyright (c) 2020, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#pragma once


namespace cudf {
namespace experimental {

/**
* @brief Returns the version number of the current CUDA Runtime instance.
* The version is returned as (1000 * major + 10 * minor). For example,
* CUDA 9.2 would be represented by 9020.
*
* This function returns -1 if the runtime version cannot be queried.
*
* @return Integer containing the version of the current CUDA Runtime.
*/
int get_cuda_runtime_version();


/**
* @brief Returns the number of devices with compute capability greater than
* or equal to 2.0 that are available for execution.
*
* This function returns -1 if the device count cannot be queried.
*
* @return Integer containing the number of compute-capable devices.
*/
int get_gpu_device_count();


/**
* @brief Returns the latest version of CUDA supported by the driver.
* The version is returned as (1000 * major + 10 * minor). For example,
* CUDA 9.2 would be represented by 9020. If no driver is installed,
* then 0 is returned as the driver version.
*
* This function returns -1 if the driver version cannot be queried.
*
* @return Integer containing the latest version of CUDA supported by the driver.
*/

int get_cuda_latest_supported_driver_version();

} // namespace experimental
} // namespace cudf
87 changes: 87 additions & 0 deletions cpp/src/utilities/device.cu
@@ -0,0 +1,87 @@
/*
* Copyright (c) 2020, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#include <cuda_runtime_api.h>
#include <cuda.h>

namespace cudf {
namespace experimental {

/**
* @brief Returns the version number of the current CUDA Runtime instance.
* The version is returned as (1000 * major + 10 * minor). For example,
* CUDA 9.2 would be represented by 9020.
*
* This function returns -1 if the runtime version cannot be queried.
*
* @return Integer containing the version of the current CUDA Runtime.
*/
int get_cuda_runtime_version() {
  int runtimeVersion;
  cudaError_t status;
  status = cudaRuntimeGetVersion(&runtimeVersion);
  if (status != cudaSuccess) {
    // No GPU is present, or the runtime hit a problem such as
    // a failed driver initialization or an insufficient driver.
    return -1;
  }
  return runtimeVersion;
}

/**
* @brief Returns the number of devices with compute capability greater than
* or equal to 2.0 that are available for execution.
*
* This function returns -1 if the device count cannot be queried.
*
* @return Integer containing the number of compute-capable devices.
*/
int get_gpu_device_count() {
  int deviceCount;
  cudaError_t status;
  status = cudaGetDeviceCount(&deviceCount);
  if (status != cudaSuccess) {
    // No GPU is present, or the runtime hit a problem such as
    // a failed driver initialization or an insufficient driver.
    return -1;
  }
  return deviceCount;
}

/**
* @brief Returns the latest version of CUDA supported by the driver.
* The version is returned as (1000 * major + 10 * minor). For example,
* CUDA 9.2 would be represented by 9020. If no driver is installed,
* then 0 is returned as the driver version.
*
* This function returns -1 if the driver version cannot be queried.
*
* @return Integer containing the latest version of CUDA supported by the driver.
*/
int get_cuda_latest_supported_driver_version() {
  int driverVersion;
  cudaError_t status;
  status = cudaDriverGetVersion(&driverVersion);
  if (status != cudaSuccess) {
    // No GPU is present, or the runtime hit a problem such as
    // a failed driver initialization or an insufficient driver.
    return -1;
  }
  return driverVersion;
}

} // namespace experimental
} // namespace cudf
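The wrappers above signal a failed CUDA call with -1, and the driver query additionally documents 0 as "no driver installed". A caller on the Python side might separate those cases along the following lines. This is only a minimal sketch: `QueryStatus` and `classify_driver_query` are hypothetical names introduced for illustration and are not part of this PR.

```python
from enum import Enum


class QueryStatus(Enum):
    """Hypothetical labels for the three outcomes of the driver query."""

    FAILED = "CUDA call failed"
    NO_DRIVER = "no driver installed"
    OK = "version available"


def classify_driver_query(code: int) -> QueryStatus:
    """Classify the integer returned by the driver-version query.

    Assumes the convention documented above: -1 on failure,
    0 when no driver is installed, an encoded version otherwise.
    """
    if code < 0:
        # The C++ wrapper returns -1 when the CUDA call does not succeed.
        return QueryStatus.FAILED
    if code == 0:
        # Documented meaning of 0 for the driver query.
        return QueryStatus.NO_DRIVER
    return QueryStatus.OK
```

Keeping the two sentinels distinct lets the caller report a missing driver differently from a failed query.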
10 changes: 8 additions & 2 deletions python/cudf/cudf/__init__.py
@@ -1,9 +1,15 @@
# Copyright (c) 2018-2019, NVIDIA CORPORATION.
""" __init__.py

import cupy
isort:skip_file
"""

import rmm
from cudf.utils.gpu_utils import validate_setup

validate_setup()

import cupy
import rmm
from cudf import core, datasets
from cudf._version import get_versions
from cudf.core import DataFrame, Index, MultiIndex, Series, from_pandas, merge
9 changes: 9 additions & 0 deletions python/cudf/cudf/errors.py
@@ -0,0 +1,9 @@
# Copyright (c) 2020, NVIDIA CORPORATION.


class UnSupportedGPUError(Exception):
    pass


class UnSupportedCUDAError(Exception):
    pass
9 changes: 9 additions & 0 deletions python/cudf/cudf/utils/gpu.pxd
@@ -0,0 +1,9 @@
# Copyright (c) 2020, NVIDIA CORPORATION.


cdef extern from "cudf/utilities/device.hpp" namespace \
        "cudf::experimental" nogil:

    cdef int get_cuda_runtime_version() except +
    cdef int get_gpu_device_count() except +
    cdef int get_cuda_latest_supported_driver_version() except +
29 changes: 29 additions & 0 deletions python/cudf/cudf/utils/gpu.pyx
@@ -0,0 +1,29 @@
# Copyright (c) 2020, NVIDIA CORPORATION.

from cudf.utils.gpu cimport (
    get_cuda_runtime_version as cpp_get_cuda_runtime_version,
    get_gpu_device_count as cpp_get_gpu_device_count,
    get_cuda_latest_supported_driver_version as
    cpp_get_cuda_latest_supported_driver_version
)


def get_cuda_runtime_version():
    cdef int c_result
    with nogil:
        c_result = cpp_get_cuda_runtime_version()
    return c_result


def get_gpu_device_count():
    cdef int c_result
    with nogil:
        c_result = cpp_get_gpu_device_count()
    return c_result


def get_cuda_latest_supported_driver_version():
    cdef int c_result
    with nogil:
        c_result = cpp_get_cuda_latest_supported_driver_version()
    return c_result
74 changes: 74 additions & 0 deletions python/cudf/cudf/utils/gpu_utils.py
@@ -0,0 +1,74 @@
def validate_setup():
    from .gpu import get_gpu_device_count

    gpus_count = get_gpu_device_count()

    if gpus_count > 0:
        # CuPy raises an exception when asked for the GPU count,
        # hence the count is obtained through the in-house C++ API above
        import cupy

        # 75 - Indicates to get "cudaDevAttrComputeCapabilityMajor" attribute
        # 0 - Get GPU 0
        major_version = cupy.cuda.runtime.deviceGetAttribute(75, 0)

        if major_version >= 6:
            # You have a GPU with NVIDIA Pascal™ architecture or better
            # Hardware Generation    Compute Capability
            #    Turing              7.5
            #    Volta               7.x
            #    Pascal              6.x
            #    Maxwell             5.x
            #    Kepler              3.x
            #    Fermi               2.x
            pass
        else:
            from cudf.errors import UnSupportedGPUError

            raise UnSupportedGPUError(
                "You will need a GPU with NVIDIA Pascal™ architecture or "
                "better"
            )

        cuda_runtime_version = cupy.cuda.runtime.runtimeGetVersion()

        if cuda_runtime_version >= 10000:
            # CUDA Runtime Version Check: Runtime version is 10.0 (10000)
            # or above
            pass
        else:
            from cudf.errors import UnSupportedCUDAError

            raise UnSupportedCUDAError(
                "Please update your CUDA Runtime to 10.0 or above"
Collaborator

Warn about which CUDA version was detected, as opposed to just saying to update to 10.0 or above.

Contributor Author

Do we also want to print cuda_runtime_version in the error message? Or parse cuda_runtime_version and print the CUDA version in the error message?
I'm asking because cuda_runtime_version needs an int-to-str conversion, like:
CUDA 10.0 - cuda_runtime_version: 10000
CUDA 10.1 - cuda_runtime_version: 10010
CUDA 10.2 - cuda_runtime_version: 10020

I couldn't find anything in the docs saying the minor version is always going to be a double-digit number; otherwise it would be a straightforward string split.

Collaborator

We should parse it from these numbers and present it cleanly.

Contributor Author

Done, now the detected CUDA version will be printed.
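
The parsing discussed in this thread can be done arithmetically, which sidesteps the question of how many digits the minor version occupies. The sketch below assumes only the documented encoding (1000 * major + 10 * minor); `format_cuda_version` is a hypothetical helper name and is not necessarily what the PR ended up using.

```python
def format_cuda_version(version: int) -> str:
    """Turn an encoded CUDA version such as 10020 into "10.2".

    Assumes the encoding 1000 * major + 10 * minor.
    """
    # Everything above the thousands place is the major version.
    major, remainder = divmod(version, 1000)
    # The minor version is stored in units of 10.
    minor = remainder // 10
    return f"{major}.{minor}"


if __name__ == "__main__":
    for encoded in (9020, 10000, 10010, 10020):
        print(encoded, "->", format_cuda_version(encoded))
```

Because the split uses integer division rather than string slicing, it gives the same result whether the minor version takes one digit or two.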

            )

        cuda_driver_version = cupy.cuda.runtime.driverGetVersion()

        if cuda_driver_version == 0:
            from cudf.errors import UnSupportedCUDAError

            raise UnSupportedCUDAError("Please install CUDA Driver")
        elif cuda_driver_version >= cuda_runtime_version:
            # CUDA Driver Version Check:
            # Driver Runtime version is >= Runtime version
            pass
        else:
            from cudf.errors import UnSupportedCUDAError

            raise UnSupportedCUDAError(
                "Please update your NVIDIA GPU Driver to support CUDA "
                "Runtime.\n"
                "Detected CUDA Runtime version : "
                + str(cuda_runtime_version)
                + "\n"
                "Latest version of CUDA "
                "supported by current NVIDIA GPU Driver : "
                + str(cuda_driver_version)
            )

    else:
        import warnings

        warnings.warn(
            "You do not have an NVIDIA GPU, please install one and try again"
        )
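
The order of checks in `validate_setup` (GPU count, compute capability, runtime version, driver version) can be exercised without a GPU if the queried values are passed in as plain integers. The following is a minimal sketch under that assumption: `check_setup` and `UnsupportedSetupError` are hypothetical names standing in for the function and the two error classes in this PR, and the runtime check treats 10.0 itself as acceptable, following the wording of the error message.

```python
from typing import Optional


class UnsupportedSetupError(Exception):
    """Hypothetical stand-in for the two cuDF error classes in this PR."""


def check_setup(
    gpu_count: int,
    cc_major: int,
    runtime_version: int,
    driver_version: int,
) -> Optional[str]:
    """Apply the checks in the same order as validate_setup.

    Returns a warning string when no GPU is found, None when every
    check passes, and raises UnsupportedSetupError otherwise.
    """
    if gpu_count <= 0:
        # Covers both "no devices" (0) and "query failed" (-1).
        return "no NVIDIA GPU detected"
    if cc_major < 6:
        raise UnsupportedSetupError("GPU is older than Pascal (6.x)")
    if runtime_version < 10000:
        raise UnsupportedSetupError("CUDA runtime is older than 10.0")
    if driver_version == 0:
        raise UnsupportedSetupError("no CUDA driver installed")
    if driver_version < runtime_version:
        raise UnsupportedSetupError("driver is too old for the runtime")
    return None
```

Separating the decision logic from the CUDA queries in this way makes each branch testable on a machine without a GPU.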