This C++ application runs computer vision tasks (e.g., object detection, classification, optical flow) on the NVIDIA Triton Inference Server. Triton manages multiple framework backends for streamlined model deployment.
- Supported Models
- Build Client Libraries
- Dependencies
- Build and Compile
- Tasks
- Notes
- Deploying Models
- Running Inference
- Docker Support
- Demo
- References
- Feedback
To build the client libraries, refer to the official Triton Inference Server client libraries.
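If you prefer to build them from source, the sketch below shows one possible flow, loosely adapted from the upstream instructions; the branch, install prefix, and CMake options are assumptions, so check them against the client repository for your Triton release.
# Build the C++ client libraries from source (branch, prefix, and options are illustrative)
git clone -b r24.12 https://github.com/triton-inference-server/client.git
cd client && mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=/opt/tritonclient \
      -DTRITON_ENABLE_CC_HTTP=ON \
      -DTRITON_ENABLE_CC_GRPC=ON ..
make cc-clients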
Ensure the following dependencies are installed:
- NVIDIA Triton Inference Server:
docker pull nvcr.io/nvidia/tritonserver:24.12-py3
- Triton client libraries: Tested on Release r24.12
- Protobuf and gRPC++: Versions compatible with Triton
- RapidJSON:
apt install rapidjson-dev
- libcurl:
apt install libcurl4-openssl-dev
- OpenCV 4: Tested version: 4.7.0
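On Ubuntu/Debian, the apt-based pieces above can be installed in one shot; the package names are assumptions, and the distro's OpenCV may differ from the tested 4.7.0, in which case build OpenCV from source.
# Install the apt dependencies listed above (Ubuntu/Debian; versions may differ from the tested ones)
sudo apt update
sudo apt install -y rapidjson-dev libcurl4-openssl-dev libopencv-dev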
- Set the environment variable TritonClientBuild_DIR or update the CMakeLists.txt with the path to your installed Triton client libraries.
- Create a build directory:
mkdir build
- Navigate to the build directory:
cd build
- Run CMake to configure the build:
cmake -DCMAKE_BUILD_TYPE=Release ..
Optional flags:
- -DSHOW_FRAME: Enable to display processed frames after inference
- -DWRITE_FRAME: Enable to write processed frames to disk
- Build the application:
cmake --build .
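Putting the steps together, an end-to-end configure and build might look like the sketch below; the client-library install path is illustrative, and the optional flags are assumed to behave as ordinary CMake booleans.
# Example end-to-end build (install path and flag values are assumptions)
export TritonClientBuild_DIR=/opt/tritonclient
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DSHOW_FRAME=ON ..
cmake --build . -j"$(nproc)"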
Other tasks are on the TODO list.
Ensure the model export versions match those supported by your Triton release. Check Triton releases here.
To deploy models, set up a model repository following the Triton Model Repository schema. The config.pbtxt file is optional unless you're using the OpenVINO backend, implementing an Ensemble pipeline, or passing custom inference parameters.
<model_repository>/
<model_name>/
config.pbtxt
<model_version>/
<model_binary>
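For example, a repository for a YOLO model exported to ONNX could be laid out as below; the model and file names are illustrative (the ONNX Runtime backend expects the binary to be named model.onnx by default).
# Hypothetical repository for a YOLO11 ONNX export
mkdir -p model_repository/yolo11s/1
cp yolo11s.onnx model_repository/yolo11s/1/model.onnx
# config.pbtxt is only needed for OpenVINO, ensembles, or custom inference parameters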
To start Triton Server:
docker run --gpus=1 --rm \
-p 8000:8000 -p 8001:8001 -p 8002:8002 \
-v /full/path/to/model_repository:/models \
nvcr.io/nvidia/tritonserver:<xx.yy>-py3 tritonserver \
--model-repository=/models
Omit the --gpus flag if using the CPU version.
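Before running the client, you can optionally confirm the server is up; the check below assumes the default HTTP port 8000 on localhost and uses Triton's standard KServe v2 health endpoint.
# Returns HTTP 200 once the server is ready to serve inference requests
curl -fsS http://localhost:8000/v2/health/ready && echo "Triton is ready"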
./computer-vision-triton-cpp-client \
--source=/path/to/source.format \
--model_type=<model_type> \
--model=<model_name_folder_on_triton> \
--labelsFile=/path/to/labels/coco.names \
--protocol=<http or grpc> \
--serverAddress=<triton-ip> \
--port=<8000 for http, 8001 for grpc>
For dynamic input sizes:
--input_sizes="c,h,w"
Check .vscode/launch.json for additional configuration examples.
- /path/to/source.format: Path to the input video or image file; for optical flow, pass two images as a comma-separated list
- <model_type>: Model type (e.g., yolov5, yolov8, yolo11, yoloseg, torchvision-classifier, tensorflow-classifier; see the Model Type Parameters table below)
- <model_name_folder_on_triton>: Name of the model folder on the Triton server
- /path/to/labels/coco.names: Path to the label file (e.g., COCO labels)
- <http or grpc>: Communication protocol (http or grpc)
- <triton-ip>: IP address of your Triton server
- <8000 for http, 8001 for grpc>: Port number
- <batch or b>: Batch size; currently only 1 is supported
- <input_sizes or -is>: Input sizes for dynamic axes, given as a semicolon-separated list of CHW entries (e.g., '3,224,224' for a single input, '3,224,224;3,224,224' for two inputs, or '3,640,640;2' for rtdetr/dfine models)
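As a concrete, hypothetical example, a YOLO11 detection run over HTTP against a local server might look like this; the source path, model folder name, and labels path are placeholders for your own setup.
# Hypothetical detection run; adjust paths and the model folder name to your deployment
./computer-vision-triton-cpp-client \
--source=data/traffic.mp4 \
--model_type=yolo11 \
--model=yolo11s \
--labelsFile=labels/coco.names \
--protocol=http \
--serverAddress=localhost \
--port=8000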
To view all available parameters, run:
./computer-vision-triton-cpp-client --help
| Model | Model Type Parameter |
|---|---|
| YOLOv5 | yolov5 |
| YOLOv6 | yolov6 |
| YOLOv7 | yolov7 |
| YOLOv8 | yolov8 |
| YOLOv9 | yolov9 |
| YOLOv10 | yolov10 |
| YOLO11 | yolo11 |
| RT-DETR | rtdetr |
| RT-DETR Ultralytics | rtdetrul |
| D-FINE | dfine |
| Torchvision Classifier | torchvision-classifier |
| Tensorflow Classifier | tensorflow-classifier |
| YOLOv5 Segmentation | yoloseg |
| YOLOv8 Segmentation | yoloseg |
| YOLO11 Segmentation | yoloseg |
| RAFT Optical Flow | raft |
For detailed instructions on installing Docker and the NVIDIA Container Toolkit, refer to the Docker Setup Document.
docker build --rm -t computer-vision-triton-cpp-client .
docker run --rm --network host \
-v /path/to/host/data:/app/data \
computer-vision-triton-cpp-client \
--source=<path_to_source_on_container> \
--model_type=<model_type> \
--model=<model_name_folder_on_triton> \
--labelsFile=<path_to_labels_on_container> \
--protocol=<http or grpc> \
--serverAddress=<triton-ip> \
--port=<8000 for http, 8001 for grpc>
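A filled-in version of the template above might look like the sketch below; the mounted directory, model folder name, and server address are illustrative, and --network host is used so the container can reach a Triton server running on the host.
# Illustrative containerized run against a Triton server on the host
docker run --rm --network host \
-v $(pwd)/data:/app/data \
computer-vision-triton-cpp-client \
--source=/app/data/traffic.mp4 \
--model_type=yolo11 \
--model=yolo11s \
--labelsFile=/app/data/coco.names \
--protocol=http \
--serverAddress=localhost \
--port=8000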
Real-time inference test (GPU RTX 3060):
- YOLOv7-tiny exported to ONNX: Demo Video
- YOLO11s exported to ONNX: Demo Video
- RAFT Optical Flow Large (exported to traced TorchScript): Demo Video
- Triton Inference Server Client Example
- Triton User Guide
- Triton Tutorials
- ONNX Models
- Torchvision Models
- Tensorflow Model Garden
Any feedback is greatly appreciated. If you have any suggestions, bug reports, or questions, don't hesitate to open an issue.