This repository contains code for performing optimized TensorRT inference with a pre-trained
pallet detection model that was trained on synthetic data generated with NVIDIA Omniverse Replicator.
The model takes a monocular RGB image as input and outputs pallet box estimates. The box estimates
are defined per pallet side face, so a single pallet may have multiple box
estimates.
If you have any questions, please feel free to reach out by opening an issue!
Assumes you've already set up your system with OpenCV, PyTorch, and numpy.

Install einops, which is used for some utility functions:

```bash
pip3 install einops
```
Install torch2trt. This is used for the `TRTModule` class, which simplifies engine inference:

```bash
git clone https://github.com/NVIDIA-AI-IOT/torch2trt
cd torch2trt
python3 setup.py develop
```
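For context, the sketch below shows how a serialized TensorRT engine is typically wrapped in torch2trt's `TRTModule`. The engine path and the binding names `"input"` and `"output"` are illustrative assumptions, not the actual bindings of the pallet model; this requires a CUDA-capable device.

```python
import torch
import tensorrt as trt
from torch2trt import TRTModule

# Deserialize a previously built engine (path is an assumption).
with open("pallet_model.engine", "rb") as f:
    engine_bytes = f.read()
runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))
engine = runtime.deserialize_cuda_engine(engine_bytes)

# Wrap the engine so it can be called like a regular PyTorch module.
# The binding names here are hypothetical placeholders.
model = TRTModule(engine, input_names=["input"], output_names=["output"])

x = torch.zeros(1, 3, 256, 256).cuda()
y = model(x)  # runs TensorRT inference
```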
Download the pallet model ONNX file.

| Model | Notes | Links |
|---|---|---|
| pallet_model_v1_all | Trained for wood and other pallets (metal, plastic). | onnx |
| pallet_model_v1_wood | Trained only for wood pallets. | onnx |
To build the FP16 engine, call the following:

```bash
./build_trt_fp16.sh <onnx_path> <engine_output_path>
```
The INT8 model instructions do not yet include calibration. Please use this model only for throughput profiling; its accuracy is likely to differ from the FP32/FP16 models. However, once calibration is included, INT8 may become the recommended option given its improved throughput.
To build the INT8 engine, call the following:

```bash
./build_trt_int8.sh <onnx_path> <engine_output_path>
```
We hope to provide instructions for using the Deep Learning Accelerator (DLA) on Jetson AGX Orin, and INT8 calibration soon.
To profile the engine with the `trtexec` tool, call the following:

```bash
./profile_engine.sh <engine_path>
```
Here are the results for model inference at 256x256 resolution, profiled on Jetson AGX Orin.

| Precision | Throughput (FPS) |
|---|---|
| FP16 | 465 |
| INT8 | 710 |
Notes:

- Called `jetson_clocks` before running
- Used MAXN power mode by calling `sudo nvpmodel -m0`
- Batch size 1
- `--useSpinWait` flag enabled to stabilize timings
- `--useCudaGraph` flag enabled to use CUDA graph optimizations. CUDA graphs aren't yet used in the predict function.
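For a sense of scale, the throughput figures above imply the following per-frame latencies (simple arithmetic, batch size 1):

```python
# Per-frame latency implied by the profiled throughput numbers.
for precision, fps in {"FP16": 465, "INT8": 710}.items():
    print(f"{precision}: {1000 / fps:.2f} ms/frame")
# FP16: 2.15 ms/frame
# INT8: 1.41 ms/frame
```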
To run inference on an image, call the following:

```bash
python3 predict.py <engine_path> <image_path> --output=<output_path>
```

For more options:

```bash
python3 predict.py --help
```
Try modifying the predict.py code to visualize inference on a live camera feed.
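One possible shape for that modification is sketched below. The helpers `load_engine` and `predict` are hypothetical stand-ins for whatever predict.py actually exposes, and the box format is assumed to be pixel corner coordinates:

```python
import cv2

def run_camera(engine_path, camera_index=0):
    model = load_engine(engine_path)      # hypothetical helper
    cap = cv2.VideoCapture(camera_index)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Assumed to return boxes as (x1, y1, x2, y2) in pixels.
        boxes = predict(model, frame)     # hypothetical helper
        for x1, y1, x2, y2 in boxes:
            cv2.rectangle(frame, (int(x1), int(y1)),
                          (int(x2), int(y2)), (0, 255, 0), 2)
        cv2.imshow("pallets", frame)
        if cv2.waitKey(1) == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()
```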