TensorRT support will be deprecated in the future. We recommend migrating to the unified model deployment toolbox MMDeploy: https://github.com/open-mmlab/mmdeploy
NVIDIA TensorRT is a software development kit (SDK) for high-performance inference of deep learning models. It includes a deep learning inference optimizer and a runtime that delivers low latency and high throughput for deep learning inference applications. Please check its developer's website for more information.
To ease the deployment of trained models with custom operators from mmcv.ops
using TensorRT, a series of TensorRT plugins are included in MMCV.
| ONNX Operator             | TensorRT Plugin           | MMCV Releases |
| :------------------------ | :------------------------ | :------------ |
| MMCVRoiAlign              | MMCVRoiAlign              | 1.2.6         |
| ScatterND                 | ScatterND                 | 1.2.6         |
| NonMaxSuppression         | NonMaxSuppression         | 1.3.0         |
| MMCVDeformConv2d          | MMCVDeformConv2d          | 1.3.0         |
| grid_sampler              | grid_sampler              | 1.3.1         |
| cummax                    | cummax                    | 1.3.5         |
| cummin                    | cummin                    | 1.3.5         |
| MMCVInstanceNormalization | MMCVInstanceNormalization | 1.3.5         |
| MMCVModulatedDeformConv2d | MMCVModulatedDeformConv2d | 1.3.8         |
Notes
- All plugins listed above are developed on TensorRT-7.2.1.6.Ubuntu-16.04.x86_64-gnu.cuda-10.2.cudnn8.0
- Clone repository

  ```bash
  git clone https://github.com/open-mmlab/mmcv.git
  ```
- Install TensorRT

  Download the corresponding TensorRT build from the NVIDIA Developer Zone. For example, for Ubuntu 16.04 on x86-64 with CUDA 10.2, the downloaded file is TensorRT-7.2.1.6.Ubuntu-16.04.x86_64-gnu.cuda-10.2.cudnn8.0.tar.gz. Then, install as below:

  ```bash
  cd ~/Downloads
  tar -xvzf TensorRT-7.2.1.6.Ubuntu-16.04.x86_64-gnu.cuda-10.2.cudnn8.0.tar.gz
  export TENSORRT_DIR=`pwd`/TensorRT-7.2.1.6
  export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$TENSORRT_DIR/lib
  ```
  Install the Python packages: tensorrt, graphsurgeon, onnx-graphsurgeon

  ```bash
  pip install $TENSORRT_DIR/python/tensorrt-7.2.1.6-cp37-none-linux_x86_64.whl
  pip install $TENSORRT_DIR/onnx_graphsurgeon/onnx_graphsurgeon-0.2.6-py2.py3-none-any.whl
  pip install $TENSORRT_DIR/graphsurgeon/graphsurgeon-0.4.5-py2.py3-none-any.whl
  ```
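  To verify the installation, you can check the version of the installed wheel (a quick sanity check):

  ```python
  import tensorrt

  print(tensorrt.__version__)  # e.g. 7.2.1.6 for the build used above
  ```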
  For more detailed information about installing TensorRT using the tar file, please refer to NVIDIA's website.
- Install cuDNN

  Install cuDNN 8 following NVIDIA's website.
- Build MMCV with TensorRT plugins

  ```bash
  cd mmcv  # to MMCV root directory
  MMCV_WITH_OPS=1 MMCV_WITH_TRT=1 pip install -e .
  ```
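  After the build finishes, you can confirm that the plugin library was compiled and can be found (`is_tensorrt_plugin_loaded` is part of `mmcv.tensorrt`, as used in the example below):

  ```python
  from mmcv.tensorrt import is_tensorrt_plugin_loaded

  # True only if MMCV was built with MMCV_WITH_TRT=1
  # and the plugin library can be loaded.
  assert is_tensorrt_plugin_loaded()
  ```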
Here is an example of creating a TensorRT engine from an ONNX model and running inference with it.
```python
import torch
import onnx

from mmcv.tensorrt import (TRTWrapper, onnx2trt, save_trt_engine,
                           is_tensorrt_plugin_loaded)

assert is_tensorrt_plugin_loaded(), 'Requires to compile TensorRT plugins in mmcv'

onnx_file = 'sample.onnx'
trt_file = 'sample.trt'
onnx_model = onnx.load(onnx_file)

# Model input
inputs = torch.rand(1, 3, 224, 224).cuda()
# Model input shape info
opt_shape_dict = {
    'input': [list(inputs.shape),
              list(inputs.shape),
              list(inputs.shape)]
}

# Create TensorRT engine
max_workspace_size = 1 << 30
trt_engine = onnx2trt(
    onnx_model,
    opt_shape_dict,
    max_workspace_size=max_workspace_size)

# Save TensorRT engine
save_trt_engine(trt_engine, trt_file)

# Run inference with TensorRT
trt_model = TRTWrapper(trt_file, ['input'], ['output'])

with torch.no_grad():
    trt_outputs = trt_model({'input': inputs})
    output = trt_outputs['output']
```
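As a sanity check, the engine output can be compared against the original PyTorch model (a minimal sketch; `model` here stands for the hypothetical network that sample.onnx was exported from):

```python
import numpy as np

# `model` is assumed to be the PyTorch module exported to sample.onnx.
model = model.cuda().eval()
with torch.no_grad():
    pytorch_output = model(inputs)

# TensorRT and PyTorch should agree within floating-point tolerance.
np.testing.assert_allclose(
    output.cpu().numpy(),
    pytorch_output.cpu().numpy(),
    rtol=1e-3, atol=1e-5)
```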
Below are the main steps to add a TensorRT plugin for a custom op in MMCV:
- Add C++ header file
- Add C++ source file
- Add CUDA kernel file
- Register plugin in `trt_plugin.cpp`
- Add unit test in `tests/test_ops/test_tensorrt.py`
Take the RoIAlign plugin `roi_align` for example.
- Add header `trt_roi_align.hpp` to TensorRT include directory `mmcv/ops/csrc/tensorrt/`
- Add source `trt_roi_align.cpp` to TensorRT source directory `mmcv/ops/csrc/tensorrt/plugins/`
- Add CUDA kernel `trt_roi_align_kernel.cu` to TensorRT source directory `mmcv/ops/csrc/tensorrt/plugins/`
- Register `roi_align` plugin in `trt_plugin.cpp`

  ```c++
  #include "trt_plugin.hpp"

  #include "trt_roi_align.hpp"

  REGISTER_TENSORRT_PLUGIN(RoIAlignPluginDynamicCreator);

  extern "C" {
  bool initLibMMCVInferPlugins() { return true; }
  }  // extern "C"
  ```
- Add unit test into `tests/test_ops/test_tensorrt.py`; a condensed sketch of such a test is shown after this list. Check here for examples.
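A condensed sketch of what such a unit test for `roi_align` could look like, reusing the Python API from the example above (shapes, file names, and tolerances here are illustrative, and the export step assumes `register_extra_symbolics` from `mmcv.onnx` maps the op to its ONNX symbolic):

```python
import numpy as np
import onnx
import torch

from mmcv.onnx import register_extra_symbolics
from mmcv.ops import RoIAlign
from mmcv.tensorrt import (TRTWrapper, is_tensorrt_plugin_loaded, onnx2trt,
                           save_trt_engine)


def test_roi_align():
    assert is_tensorrt_plugin_loaded(), 'Requires to compile TensorRT plugins in mmcv'
    register_extra_symbolics(11)  # map mmcv ops to their ONNX symbolics

    model = RoIAlign((2, 2), spatial_scale=1.0, sampling_ratio=2).cuda().eval()
    feats = torch.rand(1, 1, 4, 4).cuda()
    rois = torch.tensor([[0., 0., 0., 3., 3.]]).cuda()

    with torch.no_grad():
        pytorch_results = model(feats, rois)

    # Export to ONNX, then build a static-shape engine (min = opt = max).
    onnx_file = 'tmp_roi_align.onnx'
    trt_file = 'tmp_roi_align.trt'
    torch.onnx.export(
        model, (feats, rois), onnx_file,
        input_names=['feats', 'rois'],
        output_names=['roi_feat'],
        opset_version=11)
    opt_shape_dict = {
        'feats': [list(feats.shape)] * 3,
        'rois': [list(rois.shape)] * 3,
    }
    trt_engine = onnx2trt(
        onnx.load(onnx_file), opt_shape_dict, max_workspace_size=1 << 30)
    save_trt_engine(trt_engine, trt_file)

    # Run the engine and compare with the PyTorch result.
    trt_model = TRTWrapper(trt_file, ['feats', 'rois'], ['roi_feat'])
    with torch.no_grad():
        trt_results = trt_model({'feats': feats, 'rois': rois})['roi_feat']

    np.testing.assert_allclose(
        trt_results.cpu().numpy(), pytorch_results.cpu().numpy(),
        rtol=1e-3, atol=1e-5)
```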
- Please note that this feature is experimental and may change in the future. We strongly suggest that users always try with the latest master branch.
- Some of the custom ops in mmcv have their own CUDA implementations, which can be referred to when writing the plugin kernels.