C++ framework for real-time object detection, supporting multiple deep learning backends and input sources. Run state-of-the-art object detection models (YOLOv4-11, RT-DETR, D-FINE) on video streams, video files, or images with configurable hardware acceleration.
- Multiple model support (YOLO series from YOLOv4 to YOLO11, RT-DETR, D-FINE)
- Switchable inference backends (OpenCV DNN, ONNX Runtime, TensorRT, Libtorch, OpenVINO, Libtensorflow)
- Real-time video processing with GStreamer integration
- GPU acceleration support
- Docker deployment ready
- Benchmarking tools included
- CMake (≥ 3.15)
- C++17 compiler (GCC ≥ 8.0)
- OpenCV (≥ 4.6)
apt install libopencv-dev
- Google Logging (glog)
apt install libgoogle-glog-dev
The project automatically fetches and builds the following dependencies using CMake's FetchContent:
VideoCapture Library (Only for the App module, not the library)
FetchContent_Declare(
VideoCapture
GIT_REPOSITORY https://github.com/olibartfast/videocapture
GIT_TAG main
)
- Handles video input processing
- Provides unified interface for various video sources
- Optional GStreamer integration
FetchContent_Declare(
InferenceEngines
GIT_REPOSITORY https://github.com/olibartfast/inference-engines
GIT_TAG main
)
- Provides abstraction layer for multiple inference backends
- Supported backends:
  - OpenCV DNN Module
  - ONNX Runtime (default)
  - LibTorch
  - TensorRT (10.0.7.23)
  - OpenVINO
  - LibTensorflow (2.13)
Fetched dependencies are cloned into the build/_deps folder after the CMake configuration step.
mkdir build && cd build
cmake -DDEFAULT_BACKEND=<backend> -DBUILD_ONLY_LIB=OFF -DCMAKE_BUILD_TYPE=Release ..
cmake --build .
cmake -DDEFAULT_BACKEND=<backend> -DBUILD_ONLY_LIB=OFF -DUSE_GSTREAMER=ON -DCMAKE_BUILD_TYPE=Release ..
cmake --build .
mkdir build && cd build
cmake -DBUILD_ONLY_LIB=ON -DDEFAULT_BACKEND=<backend> -DCMAKE_BUILD_TYPE=Release ..
cmake --build .
Replace <backend> with one of the following options:
OPENCV_DNN
ONNX_RUNTIME
LIBTORCH
TENSORRT
OPENVINO
LIBTENSORFLOW
- Custom Backend Paths
  If the required backend package is not installed system-wide, you can manually specify its path:
  - For Libtorch, modify LibTorch.cmake or pass the Torch_DIR argument.
  - For ONNX Runtime, modify ONNXRuntime.cmake or pass the ONNX_RUNTIME_DIR and ORT_VERSION arguments.
  - For TensorRT, modify TensorRT.cmake or pass the TENSORRT_DIR and TRT_VERSION arguments.

  ⚠️ Note: The CMake files above belong to the InferenceEngines project and are cloned into the build/_deps folder after the configuration step.
  - Check that your backend version is set correctly in cmake/AddCompileDefinitions.cmake.
- Cleaning the Build Folder
  When switching between backends or modifying configuration options, always clean the build directory before reconfiguring and compiling:
  rm -rf build && mkdir build
# App tests
cmake -DENABLE_APP_TESTS=ON ..
# Library tests
cmake -DENABLE_DETECTORS_TESTS=ON ..
./object-detection-inference \
[--help | -h] \
--type=<model_type> \
--source=<input_source> \
--labels=<labels_file> \
--weights=<model_weights> \
[--min_confidence=<threshold>] \
[--batch|-b=<batch_size>] \
[--input_sizes|-is='<input_sizes>'] \
[--use-gpu] \
[--warmup] \
[--benchmark] \
[--iterations=<number>]
- --type=<model_type>: Specifies the type of object detection model to use. Possible values include yolov4, yolov5, yolov6, yolov7, yolov8, yolov9, yolov10, yolo11, rtdetr, rtdetrul, dfine.
- --source=<input_source>: Defines the input source for the object detection. It can be:
  - A live feed URL, e.g., rtsp://cameraip:port/stream
  - A path to a video file, e.g., path/to/video.format
  - A path to an image file, e.g., path/to/image.format
- --labels=<path/to/labels/file>: Specifies the path to the file containing the class labels. This file should list the labels used by the model, with each label on a new line.
- --weights=<path/to/model/weights>: Defines the path to the file containing the model weights.
- [--min_confidence=<confidence_value>]: Sets the minimum confidence threshold for detections. Detections with a confidence score below this value will be discarded. The default value is 0.25.
- [--batch | -b=<batch_size>]: Specifies the batch size for inference. The default value is 1; inference with a batch size larger than 1 is not currently supported.
- [--input_sizes | -is=<input_sizes>]: Input sizes for each model input when models have dynamic axes or the backend can't retrieve input layer information (like the OpenCV DNN module). Format: CHW;CHW;... For example:
  - '3,224,224' for a single input
  - '3,224,224;3,224,224' for two inputs
  - '3,640,640;2' for RT-DETR/D-FINE models
- [--use-gpu]: Activates GPU support for inference. This can significantly speed up the inference process if a compatible GPU is available. Default is false.
- [--warmup]: Enables GPU warmup. Warming up the GPU before performing actual inference can help achieve more consistent and optimized performance. This parameter is relevant only if the inference is being performed on an image source. Default is false.
- [--benchmark]: Enables benchmarking mode. In this mode, the application will run multiple iterations of inference to measure and report the average inference time. This is useful for evaluating the performance of the model and the inference setup. This parameter is relevant only if the inference is being performed on an image source. Default is false.
- [--iterations=<number>]: Specifies the number of iterations for benchmarking. The default value is 10.
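To make the `--input_sizes` format concrete, here is a minimal sketch of how a string such as `'3,640,640;2'` could be split into one shape per model input. The function name `parseInputSizes` is hypothetical and is not taken from the project's code; the actual parser may behave differently (for example in how it treats whitespace or invalid values).

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not the project's API): turns "3,640,640;2" into
// {{3, 640, 640}, {2}}, i.e. one vector of dimensions per model input.
std::vector<std::vector<std::int64_t>> parseInputSizes(const std::string& spec) {
    std::vector<std::vector<std::int64_t>> shapes;
    std::stringstream inputs(spec);
    std::string input;
    while (std::getline(inputs, input, ';')) {      // ';' separates model inputs
        if (input.empty()) continue;                 // tolerate a stray ';'
        std::vector<std::int64_t> dims;
        std::stringstream fields(input);
        std::string field;
        while (std::getline(fields, field, ',')) {   // ',' separates dimensions
            std::size_t used = 0;
            const long long value = std::stoll(field, &used);  // throws on non-numeric text
            if (used != field.size() || value <= 0) {
                throw std::invalid_argument("invalid dimension: " + field);
            }
            dims.push_back(value);
        }
        shapes.push_back(dims);
    }
    return shapes;
}
```

Calling `parseInputSizes("3,640,640;2")` yields two shapes, the first with three dimensions and the second with one, which matches the RT-DETR/D-FINE example above.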
./object-detection-inference --help
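The `--warmup`, `--benchmark`, and `--iterations` options describe a common timing pattern: run the model a few times untimed, then average the wall-clock time over a fixed number of runs. The sketch below illustrates that pattern only; `averageInferenceMs` and its parameters are hypothetical names, and the application's own benchmarking code may measure things differently.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper (not the project's API): calls `infer` `warmupRuns`
// times without timing, then returns the mean wall-clock time in
// milliseconds over `iterations` timed calls.
double averageInferenceMs(const std::function<void()>& infer,
                          int iterations = 10,   // mirrors the documented default of 10
                          int warmupRuns = 1) {  // assumed warm-up count
    if (iterations <= 0) return 0.0;
    for (int i = 0; i < warmupRuns; ++i) infer();  // untimed warm-up runs

    using Clock = std::chrono::steady_clock;       // monotonic, suited to intervals
    const auto start = Clock::now();
    for (int i = 0; i < iterations; ++i) infer();
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / iterations;
}
```

A caller would wrap a single inference call in a lambda and pass it as `infer`. Using `steady_clock` avoids jumps from system clock adjustments during the measurement.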
# YOLOv8 Onnx Runtime image processing
./object-detection-inference \
--type=yolov8 \
--source=image.png \
--weights=models/yolov8s.onnx \
--labels=data/coco.names
# YOLOv8 TensorRT video processing
./object-detection-inference \
--type=yolov8 \
--source=video.mp4 \
--weights=models/yolov8s.engine \
--labels=data/coco.names \
--min_confidence=0.4
# RTSP stream processing using RT-DETR Ultralytics implementation
./object-detection-inference \
--type=rtdetrul \
--source="rtsp://camera:554/stream" \
--weights=models/rtdetr-l.onnx \
--labels=data/coco.names \
--use-gpu
Check the .vscode folder for other examples.
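The examples above pass `--min_confidence` to discard low-scoring detections. As a rough illustration of that filtering step, the sketch below keeps only detections whose score is at or above the threshold. The `Detection` struct and `filterByConfidence` function are hypothetical; the library's own detection type and post-processing may differ.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical detection record; the project's own type may carry more fields
// (for example a bounding box).
struct Detection {
    int classId;
    float confidence;
};

// Returns the detections whose confidence is not below `minConfidence`.
// The default of 0.25 mirrors the documented default of --min_confidence.
std::vector<Detection> filterByConfidence(std::vector<Detection> detections,
                                          float minConfidence = 0.25f) {
    detections.erase(
        std::remove_if(detections.begin(), detections.end(),
                       [minConfidence](const Detection& d) {
                           return d.confidence < minConfidence;  // discard low scores
                       }),
        detections.end());
    return detections;
}
```

With `--min_confidence=0.4`, a detection scored 0.3 would be dropped, while the default threshold of 0.25 would keep it.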
Inside the project, the docker folder contains a Dockerfile for each inference backend (currently onnxruntime, libtorch, tensorrt, openvino).
# Build for specific backend
docker build --rm -t object-detection-inference:<backend_tag> \
-f docker/Dockerfile.backend .
Replace the wildcards with your desired options and paths:
docker run --rm \
-v<path_host_data_folder>:/app/data \
-v<path_host_weights_folder>:/weights \
-v<path_host_labels_folder>:/labels \
object-detection-inference:<backend_tag> \
--type=<model_type> \
--weights=<weight_according_your_backend> \
--source=/app/data/<image_or_video> \
--labels=/labels/<labels_file>
For GPU support, add --gpus all to the docker run command.
.
├── app/ # Main application
├── detectors/ # Detection library
├── cmake/ # CMake modules
├── docker/ # Dockerfiles
└── build/_deps/ # Fetched dependencies after CMake configuration
- Supported Models
- Model Export Guide
- Backend-specific export documentation:
- Windows builds are not currently supported
- Some model/backend combinations may require specific export configurations
- Open an issue for bug reports or feature requests
- Check existing issues for solutions to common problems