TensorRT Deployment


TensorRT support will be deprecated in the future. Please use the unified model deployment toolbox MMDeploy instead.


NVIDIA TensorRT is a software development kit (SDK) for high-performance inference of deep learning models. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. Please check its developer website for more information. To ease the deployment of trained models with custom operators from mmcv.ops using TensorRT, a series of TensorRT plugins are included in MMCV.

List of TensorRT plugins supported in MMCV

ONNX Operator                TensorRT Plugin              MMCV Releases
MMCVRoiAlign                 MMCVRoiAlign                 1.2.6
ScatterND                    ScatterND                    1.2.6
NonMaxSuppression            NonMaxSuppression            1.3.0
MMCVDeformConv2d             MMCVDeformConv2d             1.3.0
grid_sampler                 grid_sampler                 1.3.1
cummax                       cummax                       1.3.5
cummin                       cummin                       1.3.5
MMCVInstanceNormalization    MMCVInstanceNormalization    1.3.5
MMCVModulatedDeformConv2d    MMCVModulatedDeformConv2d    1.3.8
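
The table can also be consulted programmatically. The following is a minimal sketch, assuming the "MMCV Releases" column gives the first release that ships each plugin. The names `PLUGIN_MIN_VERSION`, `parse_version` and `available_plugins` are hypothetical helpers, not part of MMCV, and the parser only handles plain `X.Y.Z` version strings.

```python
# Release listed for each plugin, copied from the table above.
# Assumption: this is the first MMCV release that includes the plugin.
PLUGIN_MIN_VERSION = {
    "MMCVRoiAlign": "1.2.6",
    "ScatterND": "1.2.6",
    "NonMaxSuppression": "1.3.0",
    "MMCVDeformConv2d": "1.3.0",
    "grid_sampler": "1.3.1",
    "cummax": "1.3.5",
    "cummin": "1.3.5",
    "MMCVInstanceNormalization": "1.3.5",
    "MMCVModulatedDeformConv2d": "1.3.8",
}


def parse_version(version):
    """Turn '1.3.5' into (1, 3, 5) so versions compare numerically."""
    return tuple(int(part) for part in version.split(".")[:3])


def available_plugins(mmcv_version):
    """Return the plugins whose listed release is <= mmcv_version."""
    current = parse_version(mmcv_version)
    return sorted(
        name for name, minimum in PLUGIN_MIN_VERSION.items()
        if parse_version(minimum) <= current
    )


if __name__ == "__main__":
    print(available_plugins("1.3.1"))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "1.10.0" sorting before "1.3.0".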


  • All plugins listed above are developed on TensorRT-

How to build TensorRT plugins in MMCV


  • Clone repository
git clone
  • Install TensorRT

Download the corresponding TensorRT build from NVIDIA Developer Zone.

For example, for Ubuntu 16.04 on x86-64 with cuda-10.2, the downloaded file is TensorRT-

Then, install as below:

cd ~/Downloads
tar -xvzf TensorRT-
export TENSORRT_DIR=`pwd`/TensorRT-

Install python packages: tensorrt, graphsurgeon, onnx-graphsurgeon

pip install $TENSORRT_DIR/python/tensorrt-
pip install $TENSORRT_DIR/onnx_graphsurgeon/onnx_graphsurgeon-0.2.6-py2.py3-none-any.whl
pip install $TENSORRT_DIR/graphsurgeon/graphsurgeon-0.4.5-py2.py3-none-any.whl
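
To confirm that the three packages landed in the active Python environment, a quick check along these lines may help. It is a sketch that uses only the standard library; `REQUIRED_MODULES` and `check_modules` are hypothetical names, and the check only locates each module without importing it.

```python
import importlib.util

# Import names of the three packages installed above. The package
# onnx-graphsurgeon is imported as onnx_graphsurgeon.
REQUIRED_MODULES = ("tensorrt", "graphsurgeon", "onnx_graphsurgeon")


def check_modules(names=REQUIRED_MODULES):
    """Map each module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```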

For more details on installing TensorRT from the tar file, please refer to NVIDIA's website.

  • Install cuDNN

Install cuDNN 8 following the instructions on NVIDIA's website.

Build on Linux

cd mmcv  # to MMCV root directory
MMCV_WITH_OPS=1 MMCV_WITH_TRT=1 pip install -e .
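
After the build finishes, it can be worth checking that the plugins were actually compiled and loaded. The sketch below wraps `is_tensorrt_plugin_loaded` from `mmcv.tensorrt`, the same function used in the inference example in this document. `trt_plugins_available` is a hypothetical helper name, and treating an import failure as "not available" is an assumption of this sketch.

```python
def trt_plugins_available():
    """Return True only if mmcv imports and reports its TensorRT plugins loaded."""
    try:
        from mmcv.tensorrt import is_tensorrt_plugin_loaded
    except ImportError:
        # mmcv (or the tensorrt package it imports) is not available.
        return False
    return bool(is_tensorrt_plugin_loaded())


if __name__ == "__main__":
    if trt_plugins_available():
        print("TensorRT plugins are loaded")
    else:
        print("TensorRT plugins not found; rebuild with MMCV_WITH_TRT=1")
```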

Create TensorRT engine and run inference in Python

Here is an example.

import torch
import onnx

from mmcv.tensorrt import (TRTWrapper, onnx2trt, save_trt_engine,
                           is_tensorrt_plugin_loaded)

assert is_tensorrt_plugin_loaded(), 'Requires compiling the TensorRT plugins in mmcv'

onnx_file = 'sample.onnx'
trt_file = 'sample.trt'
onnx_model = onnx.load(onnx_file)

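# Model input
inputs = torch.rand(1, 3, 224, 224).cuda()
# Model input shape info: [min, opt, max] shapes (identical here, since the input shape is fixed)
opt_shape_dict = {
    'input': [list(inputs.shape),
              list(inputs.shape),
              list(inputs.shape)]
}

# Create TensorRT engine
max_workspace_size = 1 << 30
trt_engine = onnx2trt(
    onnx_model,
    opt_shape_dict,
    max_workspace_size=max_workspace_size)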

# Save TensorRT engine
save_trt_engine(trt_engine, trt_file)

# Run inference with TensorRT
trt_model = TRTWrapper(trt_file, ['input'], ['output'])

with torch.no_grad():
    trt_outputs = trt_model({'input': inputs})
    output = trt_outputs['output']
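
The example above passes a list of shapes for each input in `opt_shape_dict`. The sketch below assumes that list holds the minimum, optimal and maximum input shapes, as in a TensorRT optimization profile, and shows how a dynamic batch size might be described. `make_shape_range` is a hypothetical helper, not an MMCV function, and a dynamic range is only useful if the ONNX model was exported with the matching dynamic axes.

```python
def make_shape_range(min_shape, opt_shape, max_shape):
    """Return [min, opt, max] after checking rank and per-dimension ordering."""
    shapes = [list(min_shape), list(opt_shape), list(max_shape)]
    if len({len(s) for s in shapes}) != 1:
        raise ValueError("min, opt and max shapes must have the same rank")
    for lo, opt, hi in zip(*shapes):
        if not lo <= opt <= hi:
            raise ValueError(f"expected min <= opt <= max, got {lo}, {opt}, {hi}")
    return shapes


# Batch size may vary from 1 to 8; the middle entry is the shape to optimize for.
opt_shape_dict = {
    "input": make_shape_range((1, 3, 224, 224), (4, 3, 224, 224), (8, 3, 224, 224)),
}

# 1 << 30 bytes is 1 GiB, the workspace limit used in the example above.
max_workspace_size = 1 << 30
```

Validating the ordering up front gives a readable error in Python instead of a failure during engine building.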

How to add a TensorRT plugin for custom op in MMCV

Main procedures

Below are the main steps:

  1. Add C++ header file
  2. Add C++ source file
  3. Add CUDA kernel file
  4. Register plugin in trt_plugin.cpp
  5. Add unit test in tests/test_ops/

Take the RoIAlign plugin roi_align as an example.

  1. Add header trt_roi_align.hpp to TensorRT include directory mmcv/ops/csrc/tensorrt/

  2. Add source trt_roi_align.cpp to TensorRT source directory mmcv/ops/csrc/tensorrt/plugins/

  3. Add CUDA kernel to TensorRT source directory mmcv/ops/csrc/tensorrt/plugins/

  4. Register roi_align plugin in trt_plugin.cpp

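    #include "trt_plugin.hpp"
    #include "trt_roi_align.hpp"

    // Assumes trt_roi_align.hpp declares a plugin creator class with this name.
    REGISTER_TENSORRT_PLUGIN(RoIAlignPluginDynamicCreator);

    extern "C" {
    bool initLibMMCVInferPlugins() { return true; }
    }  // extern "C"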
  5. Add a unit test to tests/test_ops/. See the existing tests there for examples.
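
The file layout described in the steps above can be sanity-checked with a small script. This is only a sketch: the helper names are hypothetical, the kernel file name `trt_<op>_kernel.cu` is a guess because the steps do not name it, and looking for the header include in `trt_plugin.cpp` is used as a rough proxy for registration.

```python
from pathlib import Path

# TensorRT directory named in the steps above, relative to the MMCV root.
TRT_DIR = Path("mmcv/ops/csrc/tensorrt")


def expected_plugin_files(op_name):
    """List the files the steps above add for a plugin named `op_name`."""
    return [
        TRT_DIR / f"trt_{op_name}.hpp",               # step 1: header
        TRT_DIR / "plugins" / f"trt_{op_name}.cpp",   # step 2: source
        # Step 3: the kernel file name is a guess; adjust it to the real name.
        TRT_DIR / "plugins" / f"trt_{op_name}_kernel.cu",
    ]


def missing_plugin_files(mmcv_root, op_name):
    """Return the expected files that do not exist under `mmcv_root`."""
    root = Path(mmcv_root)
    return [p for p in expected_plugin_files(op_name) if not (root / p).is_file()]


def is_registered(mmcv_root, op_name):
    """Check that a trt_plugin.cpp under the TensorRT tree includes the header."""
    include = f'#include "trt_{op_name}.hpp"'
    return any(
        include in path.read_text(encoding="utf-8")
        for path in (Path(mmcv_root) / TRT_DIR).rglob("trt_plugin.cpp")
    )


if __name__ == "__main__":
    for path in missing_plugin_files(".", "roi_align"):
        print(f"missing: {path}")
    print("registered:", is_registered(".", "roi_align"))
```

Run from the MMCV root directory, it prints any expected file that is absent and whether the header is included in `trt_plugin.cpp`.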


  • Please note that this feature is experimental and may change in the future. We strongly suggest always using the latest master branch.

  • Some custom ops in mmcv already have CUDA implementations, which can be used as references.

Known Issues

  • None
