
onnx runtime error 6 when using ORT 1.16.1 with TRT and CUDA EP #18065

Closed
Tabrizian opened this issue Oct 23, 2023 · 4 comments
Labels
ep:CUDA issues related to the CUDA execution provider ep:TensorRT issues related to TensorRT execution provider

Comments


Tabrizian commented Oct 23, 2023

Describe the issue

When running inferences with Triton's ORT backend built against ORT 1.16.1 with the CUDA and TensorRT EPs, we ran into the issue below:

inference failed: [StatusCode.INTERNAL] onnx runtime error 6: /workspace/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /workspace/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUBLAS failure 1: invalid argument ; GPU=0 ; hostname=fb56312c2595 ; file=/workspace/onnxruntime/onnxruntime/core/providers/cuda/cuda_stream_handle.cc ; line=57 ; expr=cublasSetStream(cublas_handle_, stream);

Note that the ORT backend works fine when using ORT 1.16.0.

To reproduce

  1. Compile the ORT backend with ORT 1.16.1.
  2. Download a ResNet model and enable the TensorRT execution provider by adding the line below to the model configuration (a fuller config sketch follows this list):
optimization { execution_accelerators { gpu_execution_accelerator : [ { name : "tensorrt"} ] } }
  3. Run an inference using the image_client and the vulture.jpg image.
  4. Observe the error.
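
For reference, here is a minimal sketch of where that optimization block sits in a complete Triton config.pbtxt. Only the optimization block comes from the steps above; the model name, platform, and tensor names/shapes are illustrative placeholders, not the actual ResNet configuration:

```
# Hypothetical config.pbtxt, for illustration only.
# Only the optimization block is taken from the reproduction steps;
# every other field is a placeholder.
name: "resnet_onnx"
platform: "onnxruntime_onnx"
max_batch_size: 8

input [
  {
    name: "input"            # placeholder input tensor name
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]

output [
  {
    name: "output"           # placeholder output tensor name
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]

# Enable the TensorRT execution provider inside the ORT backend (line from the report).
optimization { execution_accelerators { gpu_execution_accelerator : [ { name : "tensorrt" } ] } }
```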

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.16.1

ONNX Runtime API

C++

Architecture

X64

Execution Provider

CUDA, TensorRT

Execution Provider Library Version

CUDA 12.2

github-actions bot added the ep:CUDA and ep:TensorRT labels Oct 23, 2023
jywu-msft (Member) commented Oct 23, 2023

This is a known issue and will be fixed in 1.16.2, which will be released in the next couple of weeks.

JulienTheron commented

I just went to create a new issue about this.
When using ORT directly, the problem occurs when using a user-created compute stream with TensorRT. It does not happen with 1.16.0.
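
For reference, a minimal sketch of that scenario against the ORT C++ API: a session created with a user-created CUDA stream handed to the TensorRT and CUDA EPs. The model path and option values are placeholders for illustration, not taken from this report:

```cpp
// Sketch only: reproduces the "user-created compute stream" scenario described above.
// "model.onnx" and the option values are placeholders.
#include <onnxruntime_cxx_api.h>
#include <cuda_runtime.h>

int main() {
  cudaStream_t stream = nullptr;
  cudaStreamCreate(&stream);  // user-created compute stream

  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "trt_user_stream_repro");
  Ort::SessionOptions session_options;

  // TensorRT EP, told to run on the user-provided stream.
  OrtTensorRTProviderOptions trt_options{};
  trt_options.device_id = 0;
  trt_options.has_user_compute_stream = 1;
  trt_options.user_compute_stream = stream;
  session_options.AppendExecutionProvider_TensorRT(trt_options);

  // CUDA EP as the fallback, sharing the same stream.
  OrtCUDAProviderOptions cuda_options{};
  cuda_options.device_id = 0;
  cuda_options.has_user_compute_stream = 1;
  cuda_options.user_compute_stream = stream;
  session_options.AppendExecutionProvider_CUDA(cuda_options);

  // With 1.16.1, session creation / inference is where the reported
  // "CUBLAS failure 1: invalid argument" at cublasSetStream shows up.
  Ort::Session session(env, "model.onnx", session_options);

  cudaStreamDestroy(stream);
  return 0;
}
```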

chilo-ms (Contributor) commented Oct 23, 2023

Yes, it's a regression in 1.16.1 and only happens when using a user-provided CUDA stream.
Here is the fix: 28c1944. It will be included in 1.16.2.

Tabrizian (Author) commented

Thanks for the quick response. I'll be closing this issue.
