
Failed to utilise CUDA with TRT Engine when running on Jetson AGX Orin (ONNX->TRT, Transformer) #2997

Open · niqbal996 opened this issue May 23, 2023 · 5 comments
Labels: triaged (Issue has been triaged by maintainers)

Comments

@niqbal996

Description

I am trying to convert a DINO object detection Transformer, trained on a custom dataset, to a TensorRT engine at any precision, using trtexec. The engine file is generated, but trtexec reports the following error:

[05/23/2023-10:55:09] [E] Error[1]: [executionContext.cpp::handleTrainStationRunnerPhase1::146] Error Code 1: Cuda Runtime (operation not permitted when stream is capturing)
[05/23/2023-10:55:09] [W] The CUDA graph capture on the stream has failed.
[05/23/2023-10:55:09] [W] The built TensorRT engine contains operations that are not permitted under CUDA graph capture mode.
[05/23/2023-10:55:09] [W] The specified --useCudaGraph flag has been ignored. The inference will be launched without using CUDA graph launch.

I can perform inference with a Python script, but I suspect it is not using the GPU, since it only runs at about 3 FPS on the Jetson Orin; I would expect at least roughly 10 FPS.
Any ideas what might be causing this issue and how I can solve it?
I used the following command for conversion (a timing sketch follows it):

trtexec --onnx=dino_simp.onnx --int8 --useCudaGraph --verbose --saveEngine=dino_last.trt --workspace=20000
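
(For reference, a minimal timing sketch like the one below, assuming the engine file name above, the TensorRT 8.x bindings API, static input shapes, and pycuda installed, can confirm whether the engine actually runs on the GPU:)

import time
import numpy as np
import tensorrt as trt
import pycuda.autoinit  # noqa: F401 -- initializes a CUDA context on import
import pycuda.driver as cuda

logger = trt.Logger(trt.Logger.WARNING)
with open("dino_last.trt", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate a device buffer for every binding (assumes static shapes).
bindings = []
for i in range(engine.num_bindings):
    shape = tuple(context.get_binding_shape(i))
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = np.zeros(shape, dtype=dtype)
    device = cuda.mem_alloc(host.nbytes)
    if engine.binding_is_input(i):
        cuda.memcpy_htod(device, host)  # a dummy input is enough for timing
    bindings.append(int(device))

# execute_v2 is synchronous, so wall-clock time reflects GPU execution.
n_iters = 50
start = time.perf_counter()
for _ in range(n_iters):
    context.execute_v2(bindings)
print(f"~{n_iters / (time.perf_counter() - start):.1f} FPS")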

Verbose output logs: see the link under Relevant Files below.

Environment

TensorRT Version: 8.5.2

NVIDIA GPU: Jetson AGX Orin

NVIDIA Driver Version: L4T 35.3.1

CUDA Version: 11.4.315

CUDNN Version: 8.6.0.166

Operating System: Ubuntu 20.04 LTS

Python Version (if applicable): 3.8

PyTorch Version (if applicable):

Container (if so, version): nvcr.io/nvidia/l4t-tensorrt:r8.5.2.2-devel

Relevant Files

Model link:
The model ONNX file and the full verbose log output can be downloaded at the following link: drive

Steps To Reproduce

Commands or scripts:
trtexec --onnx=dino_simp.onnx --int8 --useCudaGraph --verbose --saveEngine=dino_last.trt --workspace=20000

Have you tried the latest release?: I can only try the latest TensorRT release on my laptop; the corresponding JetPack release is not yet available for the Jetson Orin. #2949

Can this model run on other frameworks? I can run inference on the model with ONNX Runtime. I also converted the same model on my laptop, where it works without any issues and runs at about 14 FPS, which is expected.
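
(A quick way to confirm which device ONNX Runtime actually uses, a sketch assuming the onnxruntime-gpu package and the model file name above, is to print the session's active providers:)

import onnxruntime as ort

print(ort.get_available_providers())
sess = ort.InferenceSession(
    "dino_simp.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
# If only CPUExecutionProvider shows up here, the ~3 FPS is CPU-bound.
print(sess.get_providers())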

Thank you for looking into this.

@zerollzeng
Collaborator

Any ideas what might be causing this issue and how I can solve it?

It's because your model contains operators that cannot be captured by CUDA graphs, such as loop or if-conditional operators. See https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#cuda-graphs

To get better performance, I have a few suggestions: 1. use the latest TRT, which has better optimizations and new features; 2. use --best, which also enables FP16.
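
For example, the command from the description might become (a sketch; --useCudaGraph is dropped since the capture fails anyway):

trtexec --onnx=dino_simp.onnx --best --verbose --saveEngine=dino_best.trt --workspace=20000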

@j0987834204

Hello @niqbal996, how did you convert the DINO .pth checkpoint to ONNX format?
Which framework did you use: mmdetection, d2, etc.?

Thanks.

@niqbal996
Author

Hey @j0987834204,
I use the detrex repo for my model training and added the ONNX conversion script from detectron2 here: Detrex fork. The script is based on the conversion script from detectron2.
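
(The core of such a script is a torch.onnx.export call. The sketch below is only a generic illustration with a stand-in torchvision model, not the linked detrex script; the model, input size, opset, and tensor names are placeholders:)

import torch
import torchvision

# Stand-in model; the real script exports the trained DINO detector instead.
model = torchvision.models.resnet18(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)  # placeholder input resolution
torch.onnx.export(
    model,
    dummy,
    "model.onnx",
    opset_version=16,
    input_names=["images"],
    output_names=["logits"],
)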

@IamShubhamGupto

@niqbal996
Hey, thanks for the script and the guide. Could you share some stats for the DINO model you're running on the Orin? I would like to know which dataset it was trained on, the FPS, and the mAP / accuracy.

@Coastchb

Any ideas what might be causing this issue and how I can solve it?

It's because your model contains operators that cannot be captured by CUDA graphs, such as loop or if-conditional operators. See https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#cuda-graphs

To get better performance, I have a few suggestions: 1. use the latest TRT, which has better optimizations and new features; 2. use --best, which also enables FP16.

@niqbal996 @zerollzeng
I also hit this problem. Do you have any ideas for converting the model so that all the operations can be captured?
