
onnx runtime error 6 when using ORT 1.16.1 with TRT and CUDA EP #18065

Closed
Tabrizian opened this issue Oct 23, 2023 · 4 comments
Labels
ep:CUDA issues related to the CUDA execution provider ep:TensorRT issues related to TensorRT execution provider

Comments


Tabrizian commented Oct 23, 2023

Describe the issue

When running inferences with Triton's ORT backend built against ORT 1.16.1 with the CUDA and TensorRT EPs, we ran into the issue below:

inference failed: [StatusCode.INTERNAL] onnx runtime error 6: /workspace/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /workspace/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUBLAS failure 1: invalid argument ; GPU=0 ; hostname=fb56312c2595 ; file=/workspace/onnxruntime/onnxruntime/core/providers/cuda/cuda_stream_handle.cc ; line=57 ; expr=cublasSetStream(cublas_handle_, stream);

Note that the ORT backend works fine when using ORT 1.16.0.

To reproduce

  1. Compile the ORT backend with ORT 1.16.1.
  2. Download a ResNet model and enable the TensorRT execution provider by adding the line below to the model configuration (a fuller config sketch follows this list):
optimization { execution_accelerators { gpu_execution_accelerator : [ { name : "tensorrt"} ] } }
  3. Run an inference using the image_client and the vulture.jpg image.
  4. Observe the error.
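
For reference, here is a minimal sketch of where that optimization block sits in a complete Triton config.pbtxt. Only the optimization block comes from the steps above; the model name, platform, and tensor names/shapes are illustrative placeholders, not the actual ResNet configuration:

```
# Hypothetical config.pbtxt, for illustration only.
# Only the optimization block is taken from the reproduction steps;
# every other field is a placeholder.
name: "resnet_onnx"
platform: "onnxruntime_onnx"
max_batch_size: 8

input [
  {
    name: "input"            # placeholder input tensor name
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]

output [
  {
    name: "output"           # placeholder output tensor name
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]

# Enable the TensorRT execution provider inside the ORT backend (line from the report).
optimization { execution_accelerators { gpu_execution_accelerator : [ { name : "tensorrt" } ] } }
```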

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.16.1

ONNX Runtime API

C++

Architecture

X64

Execution Provider

CUDA, TensorRT

Execution Provider Library Version

CUDA 12.2

github-actions bot added the ep:CUDA and ep:TensorRT labels Oct 23, 2023
jywu-msft (Member) commented Oct 23, 2023

This is a known issue and will be fixed in 1.16.2, which will be released in the next couple of weeks.

JulienTheron commented

I just went to create a new issue about this.
When using ORT directly, the problem occurs when using a user-created compute stream with TensorRT. It does not happen with 1.16.0.
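
For reference, a minimal sketch of that scenario against the ORT C++ API: a session created with a user-created CUDA stream handed to the TensorRT and CUDA EPs. The model path and option values are placeholders for illustration, not taken from this report:

```cpp
// Sketch only: reproduces the "user-created compute stream" scenario described above.
// "model.onnx" and the option values are placeholders.
#include <onnxruntime_cxx_api.h>
#include <cuda_runtime.h>

int main() {
  cudaStream_t stream = nullptr;
  cudaStreamCreate(&stream);  // user-created compute stream

  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "trt_user_stream_repro");
  Ort::SessionOptions session_options;

  // TensorRT EP, told to run on the user-provided stream.
  OrtTensorRTProviderOptions trt_options{};
  trt_options.device_id = 0;
  trt_options.has_user_compute_stream = 1;
  trt_options.user_compute_stream = stream;
  session_options.AppendExecutionProvider_TensorRT(trt_options);

  // CUDA EP as the fallback, sharing the same stream.
  OrtCUDAProviderOptions cuda_options{};
  cuda_options.device_id = 0;
  cuda_options.has_user_compute_stream = 1;
  cuda_options.user_compute_stream = stream;
  session_options.AppendExecutionProvider_CUDA(cuda_options);

  // With 1.16.1, session creation / inference is where the reported
  // "CUBLAS failure 1: invalid argument" at cublasSetStream shows up.
  Ort::Session session(env, "model.onnx", session_options);

  cudaStreamDestroy(stream);
  return 0;
}
```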

chilo-ms (Contributor) commented Oct 23, 2023

Yes, it's a regression in 1.16.1 and only happens when using a user-provided CUDA stream.
Here is the fix: 28c1944. It will be included in 1.16.2.

Tabrizian (Author) commented

Thanks for the quick response. I'll be closing this issue.
