[Performance] #21654

eduardatmadenn · 2024-08-07T15:10:19Z

Describe the issue

While running a C++ program that uses an ORT model to run predictions on images with the CUDA EP, I observe the following behavior:
As I increase the input resolution, the inference time grows more or less linearly with the resolution, up to a point beyond which it grows much faster (it looks exponential). I suspect it hits a VRAM limit and starts swapping/paging memory, but I'm not sure how this works.
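This is not a reproduction, only a minimal sketch of how the point where scaling stops being linear could be located from timing measurements. The `Sample` struct, the `find_scaling_knee` function, and the `jump_factor` threshold are hypothetical names chosen for illustration and are not part of ONNX Runtime. The check only flags a jump in per-pixel latency; on its own it cannot confirm that VRAM exhaustion is the cause.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// One timing sample (hypothetical type, not an ONNX Runtime API).
struct Sample {
    double pixels;      // width * height of the input image
    double latency_ms;  // wall-clock time of one inference call
};

// Returns the index of the first sample whose per-pixel latency exceeds
// `jump_factor` times the per-pixel latency of the previous sample, or
// std::nullopt if no such jump is found.
// Assumes the samples are sorted by increasing pixel count.
std::optional<std::size_t> find_scaling_knee(const std::vector<Sample>& samples,
                                             double jump_factor = 2.0) {
    for (std::size_t i = 1; i < samples.size(); ++i) {
        // Skip pairs with a non-positive pixel count to avoid dividing by zero.
        if (samples[i - 1].pixels <= 0.0 || samples[i].pixels <= 0.0) {
            continue;
        }
        const double prev = samples[i - 1].latency_ms / samples[i - 1].pixels;
        const double curr = samples[i].latency_ms / samples[i].pixels;
        if (prev > 0.0 && curr > jump_factor * prev) {
            return i;
        }
    }
    return std::nullopt;
}
```

Comparing the resolution at the returned index with the GPU memory in use at that point (for example, as reported by `nvidia-smi`) would be one way to check whether the jump coincides with VRAM filling up.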

I am not providing reproducible code, as this is more of a theoretical question, but I hope someone can provide more information.

Thank you

To reproduce

I am not providing reproducible code, as this is more of a theoretical question, but I hope someone can provide more information.

Urgency

No response

Platform

Windows

OS Version

11

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.15.1

ONNX Runtime API

C++

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

No

github-actions · 2024-09-18T15:00:54Z

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

eduardatmadenn added the "performance" label (issues related to performance regressions) · Aug 7, 2024

sophies927 added the "ep:CUDA" label (issues related to the CUDA execution provider) · Aug 15, 2024

github-actions bot added the "stale" label (issues that have not been addressed in a while; categorized by a bot) · Sep 18, 2024
