CUDA backend for the DNN module #14827

Merged (129 commits) on Oct 21, 2019.

Commits
64716ab
stub cuda4dnn design
YashasSamaga May 31, 2019
20f4f2b
minor fixes for tests and doxygen
YashasSamaga May 31, 2019
d8f49fd
add csl public api directory to module headers
YashasSamaga Jun 16, 2019
b9edc00
add low-level CSL components
YashasSamaga Jun 18, 2019
2f9afc8
add high-level CSL components
YashasSamaga Jun 20, 2019
adad256
integrate csl::Tensor into backbone code
YashasSamaga Jun 21, 2019
8635b5e
switch to CPU iff unsupported; otherwise, fail on error
YashasSamaga Jun 21, 2019
6615b7c
add fully connected layer
YashasSamaga Jun 22, 2019
cd0234f
add softmax layer
YashasSamaga Jun 22, 2019
e3e5cc4
add activation layers
YashasSamaga Jun 23, 2019
eb69bf7
support arbitrary rank TensorDescriptor
YashasSamaga Jun 24, 2019
f24ad2c
pass input wrappers to `initCUDA()`
YashasSamaga Jun 24, 2019
a5ae407
add 1d/2d/3d-convolution
YashasSamaga Jun 24, 2019
bb984df
add pooling layer
YashasSamaga Jun 25, 2019
16db28b
reorganize and refactor code
YashasSamaga Jun 25, 2019
883968e
fixes for gcc, clang and doxygen; remove cxx14/17 code
YashasSamaga Jun 25, 2019
99fe393
add blank_layer
YashasSamaga Jun 25, 2019
35a1d8f
add LRN layer
YashasSamaga Jun 25, 2019
84067f0
add rounding modes for pooling layer
YashasSamaga Jun 26, 2019
e203703
split tensor.hpp into tensor.hpp and tensor_ops.hpp
YashasSamaga Jun 26, 2019
4c8d23b
add concat layer
YashasSamaga Jun 26, 2019
2ab9bdd
add scale layer
YashasSamaga Jun 26, 2019
b12e4fc
add batch normalization layer
YashasSamaga Jun 26, 2019
4ae2d35
split math.cu into activations.cu and math.hpp
YashasSamaga Jun 26, 2019
cf34c65
add eltwise layer
YashasSamaga Jun 26, 2019
ed87d45
add flatten layer
YashasSamaga Jun 26, 2019
9261242
add tensor transform api
YashasSamaga Jun 28, 2019
7db9e6e
add asymmetric padding support for convolution layer
YashasSamaga Jun 28, 2019
205c191
fix rebase issues
YashasSamaga Jun 28, 2019
e04e463
add reshape layer
YashasSamaga Jun 28, 2019
f120bd0
add permute layer
YashasSamaga Jun 29, 2019
bf114d7
add padding support for concat layer
YashasSamaga Jun 29, 2019
0ab06a9
refactor and reorganize code
YashasSamaga Jul 1, 2019
5d2d336
add normalize layer
YashasSamaga Jul 2, 2019
1619f0b
optimize bias addition in scale layer
YashasSamaga Jul 2, 2019
ed16c7e
add prior box layer
YashasSamaga Jul 2, 2019
76eaf7b
fix and optimize normalize layer
YashasSamaga Jul 4, 2019
ebf5cfb
add asymmetric padding support for pooling layer
YashasSamaga Jul 4, 2019
6fc4ce0
add event API
YashasSamaga Jul 4, 2019
699867e
improve pooling performance for some padding scenarios
YashasSamaga Jul 4, 2019
4791852
avoid over-allocation of compute resources to kernels
YashasSamaga Jul 4, 2019
170fc3e
improve prior box performance
YashasSamaga Jul 4, 2019
8f664f6
enable layer fusion
YashasSamaga Jul 5, 2019
00557bd
add const layer
YashasSamaga Jul 5, 2019
0f21706
add resize layer
YashasSamaga Jul 5, 2019
c850cb5
add slice layer
YashasSamaga Jul 6, 2019
085e632
add padding layer
YashasSamaga Jul 8, 2019
1dfc409
add deconvolution layer
YashasSamaga Jul 8, 2019
39cc3a7
fix channelwise ReLU initialization
YashasSamaga Jul 8, 2019
fd1acaf
add vector traits
YashasSamaga Jul 8, 2019
ad0e4c6
add vectorized versions of relu, clipped_relu, power
YashasSamaga Jul 8, 2019
9414e0b
add vectorized concat kernels
YashasSamaga Jul 9, 2019
1357a9f
improve concat_with_offsets performance
YashasSamaga Jul 9, 2019
c8eee86
vectorize scale and bias kernels
YashasSamaga Jul 9, 2019
3e78b21
add support for multi-billion element tensors
YashasSamaga Jul 10, 2019
ca95f5c
vectorize prior box kernels
YashasSamaga Jul 10, 2019
b678bff
fix address alignment check
YashasSamaga Jul 10, 2019
986b466
improve bias addition performance of conv/deconv/fc layers
YashasSamaga Jul 11, 2019
b0799b1
restructure code for supporting multiple targets
YashasSamaga Jul 16, 2019
ab0b196
add DNN_TARGET_CUDA_FP64
YashasSamaga Jul 16, 2019
6df05bf
add DNN_TARGET_FP16
YashasSamaga Jul 17, 2019
3957460
improve vectorization
YashasSamaga Jul 18, 2019
052b8e7
add region layer
YashasSamaga Jul 20, 2019
1abec63
improve tensor API, add dynamic ranks
YashasSamaga Jul 22, 2019
977cac2
fix parametric relu activation
YashasSamaga Jul 22, 2019
35a8e89
add squeeze/unsqueeze tensor API
YashasSamaga Jul 22, 2019
b30ab12
add reorg layer
YashasSamaga Jul 22, 2019
bbfb5c3
optimize permute and enable 2d permute
YashasSamaga Jul 23, 2019
b6da715
enable 1d and 2d slice
YashasSamaga Jul 23, 2019
9d52163
add split layer
YashasSamaga Jul 23, 2019
00f55dc
add shuffle channel layer
YashasSamaga Jul 23, 2019
800d2a9
allow tensors of different ranks in reshape primitive
YashasSamaga Jul 23, 2019
badd916
patch SliceOp to allow Crop Layer
YashasSamaga Jul 24, 2019
00a4242
allow extra shape inputs in reshape layer
YashasSamaga Jul 24, 2019
085fd05
use `std::move_backward` instead of `std::move` for insert in resizab…
YashasSamaga Jul 24, 2019
93ca2bc
improve workspace management
YashasSamaga Jul 25, 2019
399c83c
add spatial LRN
YashasSamaga Jul 25, 2019
3ff54f1
add nms (cpu) to region layer
YashasSamaga Jul 26, 2019
052e25f
add max pooling with argmax (and a fix to limits.hpp)
YashasSamaga Jul 27, 2019
2803b91
add max unpooling layer
YashasSamaga Jul 27, 2019
db733fc
refactoring, fixes and many optimizations
YashasSamaga Aug 4, 2019
7ee7025
drop DNN_TARGET_CUDA_FP64
YashasSamaga Aug 4, 2019
f8249ee
rename DNN_TARGET_CUDA_FP32 to DNN_TARGET_CUDA
YashasSamaga Aug 4, 2019
f501821
update supportBackend to be more rigorous
YashasSamaga Aug 4, 2019
1c52f4e
remove stray include from preventing non-cuda build
YashasSamaga Aug 4, 2019
757245b
include op_cuda.hpp outside condition #if
YashasSamaga Aug 4, 2019
ba49b27
fix gcc errors
YashasSamaga Aug 4, 2019
52d7740
increase max. tensor rank limit to six
YashasSamaga Aug 4, 2019
bde84d9
add Interp layer
YashasSamaga Aug 5, 2019
b02811e
drop custom layers; use BackendNode
YashasSamaga Aug 6, 2019
1ee54e8
vectorize activation kernels
YashasSamaga Aug 6, 2019
3f3d0af
fixes for gcc
YashasSamaga Aug 10, 2019
14a79c5
remove wrong assertion
YashasSamaga Aug 10, 2019
37c7026
fix broken assertion in unpooling primitive
YashasSamaga Aug 11, 2019
afa7c2b
fix build errors in non-CUDA build
YashasSamaga Aug 11, 2019
c44aefd
completely remove workspace from public API
YashasSamaga Aug 11, 2019
db3f4f7
fix permute layer
YashasSamaga Aug 11, 2019
9725413
enable accuracy and perf. tests for DNN_TARGET_CUDA
YashasSamaga Aug 11, 2019
47bbd14
add asynchronous forward
YashasSamaga Aug 12, 2019
780eeaf
vectorize eltwise ops
YashasSamaga Aug 12, 2019
0ffc1fa
vectorize fill kernel
YashasSamaga Aug 14, 2019
f93435f
fixes for gcc
YashasSamaga Aug 14, 2019
6a23810
remove CSL headers from public API
YashasSamaga Aug 18, 2019
d66f72b
remove csl header source group from cmake
YashasSamaga Aug 18, 2019
4ed600c
update min. cudnn version in cmake
YashasSamaga Aug 18, 2019
91da82f
add numerically stable FP32 log1pexp
YashasSamaga Aug 18, 2019
027c0d6
refactor code
YashasSamaga Aug 18, 2019
ec5342d
add FP16 specialization to cudnn based tensor addition
YashasSamaga Aug 18, 2019
66de1f2
vectorize scale1 and bias1 + minor refactoring
YashasSamaga Aug 18, 2019
bd8a84e
fix doxygen build
YashasSamaga Aug 18, 2019
94f7bad
fix invalid alignment assertion
YashasSamaga Aug 18, 2019
c681da6
clear backend wrappers before allocateLayers
YashasSamaga Aug 19, 2019
cdb53f6
ignore memory lock failures
YashasSamaga Aug 19, 2019
8732d4f
do not allocate internal blobs
YashasSamaga Aug 20, 2019
ca81308
integrate NVTX
YashasSamaga Aug 21, 2019
c22d1e8
add numerically stable half precision log1pexp
YashasSamaga Sep 1, 2019
0d780d7
fix indentation, following coding style, improve docs
YashasSamaga Sep 1, 2019
9578fc2
remove accidental modification of IE code
YashasSamaga Sep 1, 2019
8b8f780
Revert "add asynchronous forward"
YashasSamaga Sep 7, 2019
9c75b0b
[cmake] throw error for unsupported CC versions
YashasSamaga Sep 10, 2019
4752d7b
fix rebase issues
YashasSamaga Sep 17, 2019
71829dc
add more docs, refactor code, fix bugs
YashasSamaga Sep 22, 2019
7cf6874
minor refactoring and fixes
YashasSamaga Sep 23, 2019
7fc76a4
resolve warnings/errors from clang
YashasSamaga Sep 25, 2019
2818f1c
remove haveCUDA() checks from supportBackend()
YashasSamaga Sep 28, 2019
a97c6c5
remove NVTX integration
YashasSamaga Oct 4, 2019
886b01c
changes based on review comments
YashasSamaga Oct 20, 2019
4536219
avoid exception when no CUDA device is present
YashasSamaga Oct 20, 2019
5eb7fa5
add color code for CUDA in Net::dump
YashasSamaga Oct 21, 2019
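
Two commits above (91da82f and c22d1e8) add numerically stable log1pexp implementations. As background, here is a minimal C++ sketch of the standard stable formulation; it illustrates the technique only and is not the PR's kernel code, and the cutoff constants are typical float32 choices rather than the ones used in the PR:

```cpp
#include <cmath>

// Naively, log1pexp(x) = log(1 + exp(x)) overflows once exp(x) exceeds the
// float range, even though the true result is close to x there. The standard
// fix evaluates an equivalent form per region:
float log1pexp_stable(float x)
{
    if (x <= -20.0f)
        return std::exp(x);               // log(1 + t) ~ t for tiny t = e^x
    if (x <= 20.0f)
        return std::log1p(std::exp(x));   // exp(x) is safely representable here
    return x;                             // log(1 + e^x) = x + log(1 + e^-x) ~ x
}
```

The FP16 variant from commit c22d1e8 presumably needs far tighter cutoffs, since half precision saturates near 65504 and exp(x) already overflows it around x ≈ 11.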
Files changed

cmake/OpenCVMinDepVersions.cmake — 2 changes: 1 addition & 1 deletion

@@ -2,7 +2,7 @@ if(NOT DEFINED MIN_VER_CMAKE)
   set(MIN_VER_CMAKE 3.5.1)
 endif()
 set(MIN_VER_CUDA 6.5)
-set(MIN_VER_CUDNN 6)
+set(MIN_VER_CUDNN 7.5)
 set(MIN_VER_PYTHON2 2.7)
 set(MIN_VER_PYTHON3 3.2)
 set(MIN_VER_ZLIB 1.2.3)
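
This raises the minimum cuDNN version from 6 to 7.5, matching commit 4ed600c. The PR enforces the floor at configure time; purely as an illustration (not something the PR adds), cudnn.h exposes version macros that would support the same guard at compile time:

```cpp
#include <cudnn.h>

// Mirror the PR's CMake requirement (cuDNN >= 7.5) at compile time.
static_assert(CUDNN_MAJOR > 7 || (CUDNN_MAJOR == 7 && CUDNN_MINOR >= 5),
              "the CUDA backend for the DNN module requires cuDNN 7.5 or later");
```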
modules/dnn/CMakeLists.txt — 8 changes: 8 additions & 0 deletions

@@ -90,6 +90,14 @@ endif()

 if(OPENCV_DNN_CUDA AND HAVE_CUDA AND HAVE_CUBLAS AND HAVE_CUDNN)
   list(APPEND include_dirs ${CUDA_TOOLKIT_INCLUDE} ${CUDNN_INCLUDE_DIRS})
+  set(CC_LIST ${CUDA_ARCH_BIN})
+  separate_arguments(CC_LIST)
+  foreach(cc ${CC_LIST})
+    if(cc VERSION_LESS 5.3)
+      message(FATAL_ERROR "CUDA backend for DNN module requires CC 5.3 or higher. Please remove unsupported architectures from CUDA_ARCH_BIN option.")
+    endif()
+  endforeach()
+  unset(CC_LIST)
 else()
   set(sources_options ${sources_options} EXCLUDE_CUDA)
 endif()
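
The new check rejects any compute capability in CUDA_ARCH_BIN below 5.3. The floor exists because the backend's half-precision kernels rely on native FP16 arithmetic, which NVIDIA GPUs provide only from compute capability 5.3 onward. For illustration only (the PR enforces this at build time, not at runtime), the equivalent runtime query against the CUDA runtime API would be:

```cpp
#include <cuda_runtime.h>

// Returns true if `device` has compute capability >= 5.3, the floor
// the build system enforces for the CUDA DNN backend.
bool deviceSupportsFp16(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        return false;
    return prop.major * 10 + prop.minor >= 53;
}
```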
modules/dnn/include/opencv2/dnn/dnn.hpp — 37 changes: 28 additions & 9 deletions

@@ -71,7 +71,8 @@ CV__DNN_INLINE_NS_BEGIN
     DNN_BACKEND_HALIDE,
     DNN_BACKEND_INFERENCE_ENGINE, //!< Intel's Inference Engine computational backend.
     DNN_BACKEND_OPENCV,
-    DNN_BACKEND_VKCOM
+    DNN_BACKEND_VKCOM,
+    DNN_BACKEND_CUDA
 };

@@ -85,7 +86,9 @@ CV__DNN_INLINE_NS_BEGIN
     DNN_TARGET_OPENCL_FP16,
     DNN_TARGET_MYRIAD,
     DNN_TARGET_VULKAN,
-    DNN_TARGET_FPGA //!< FPGA device with CPU fallbacks using Inference Engine's Heterogeneous plugin.
+    DNN_TARGET_FPGA, //!< FPGA device with CPU fallbacks using Inference Engine's Heterogeneous plugin.
+    DNN_TARGET_CUDA,
+    DNN_TARGET_CUDA_FP16
 };

 CV_EXPORTS std::vector< std::pair<Backend, Target> > getAvailableBackends();

@@ -274,6 +277,20 @@ CV__DNN_INLINE_NS_BEGIN
     virtual Ptr<BackendNode> initInfEngine(const std::vector<Ptr<BackendWrapper> > &inputs);

     virtual Ptr<BackendNode> initVkCom(const std::vector<Ptr<BackendWrapper> > &inputs);
+
+    /**
+     * @brief Returns a CUDA backend node
+     *
+     * @param context void pointer to CSLContext object
+     * @param inputs layer inputs
+     * @param outputs layer outputs
+     */
+    virtual Ptr<BackendNode> initCUDA(
+        void *context,
+        const std::vector<Ptr<BackendWrapper>>& inputs,
+        const std::vector<Ptr<BackendWrapper>>& outputs
+    );
+
     /**
      * @brief Automatic Halide scheduling based on layer hyper-parameters.
      * @param[in] node Backend node with Halide functions.

@@ -515,13 +532,15 @@ CV__DNN_INLINE_NS_BEGIN
      * @see Target
      *
      * List of supported combinations backend / target:
-     * |                        | DNN_BACKEND_OPENCV | DNN_BACKEND_INFERENCE_ENGINE | DNN_BACKEND_HALIDE |
-     * |------------------------|--------------------|------------------------------|--------------------|
-     * | DNN_TARGET_CPU         | +                  | +                            | +                  |
-     * | DNN_TARGET_OPENCL      | +                  | +                            | +                  |
-     * | DNN_TARGET_OPENCL_FP16 | +                  | +                            |                    |
-     * | DNN_TARGET_MYRIAD      |                    | +                            |                    |
-     * | DNN_TARGET_FPGA        |                    | +                            |                    |
+     * |                        | DNN_BACKEND_OPENCV | DNN_BACKEND_INFERENCE_ENGINE | DNN_BACKEND_HALIDE | DNN_BACKEND_CUDA |
+     * |------------------------|--------------------|------------------------------|--------------------|------------------|
+     * | DNN_TARGET_CPU         | +                  | +                            | +                  |                  |
+     * | DNN_TARGET_OPENCL      | +                  | +                            | +                  |                  |
+     * | DNN_TARGET_OPENCL_FP16 | +                  | +                            |                    |                  |
+     * | DNN_TARGET_MYRIAD      |                    | +                            |                    |                  |
+     * | DNN_TARGET_FPGA        |                    | +                            |                    |                  |
+     * | DNN_TARGET_CUDA        |                    |                              |                    | +                |
+     * | DNN_TARGET_CUDA_FP16   |                    |                              |                    | +                |
      */
     CV_WRAP void setPreferableTarget(int targetId);
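
With the new enum values in place, networks opt into the backend through the existing preference API. A minimal usage sketch (the model file names are placeholders, not files from this PR):

```cpp
#include <opencv2/dnn.hpp>
#include <iostream>

int main()
{
    using namespace cv::dnn;

    // List the backend/target pairs available in this build.
    for (const auto& bt : getAvailableBackends())
        std::cout << "backend=" << bt.first << " target=" << bt.second << '\n';

    // Route inference through the CUDA backend.
    Net net = readNetFromCaffe("deploy.prototxt", "weights.caffemodel");
    net.setPreferableBackend(DNN_BACKEND_CUDA);
    net.setPreferableTarget(DNN_TARGET_CUDA);  // or DNN_TARGET_CUDA_FP16

    // net.setInput(blob); cv::Mat out = net.forward();
    return 0;
}
```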
modules/dnn/perf/perf_convolution3d.cpp — 4 changes: 2 additions & 2 deletions

@@ -111,8 +111,8 @@ PERF_TEST_P_(Conv3D, conv3d)
     Backend backendId = get<0>(get<1>(GetParam()));
     Target targetId = get<1>(get<1>(GetParam()));

-    if (targetId != DNN_TARGET_CPU)
-        throw SkipTestException("Only CPU is supported");
+    if (targetId != DNN_TARGET_CPU && backendId != DNN_BACKEND_CUDA)
+        throw SkipTestException("Only CPU and CUDA is supported");

     int inChannels = inputShape[1];