[Performance] #21654
Labels
ep:CUDA
issues related to the CUDA execution provider
performance
issues related to performance regressions
stale
issues that have not been addressed in a while; categorized by a bot
Describe the issue
I am running a C++ program that uses an ORT model with the CUDA EP to run inference on images, and I see the following behavior:
As I increase the input resolution, inference time grows roughly linearly with the resolution up to a certain point, after which it increases much more steeply (seemingly exponentially). I suspect that at that point the model hits a VRAM limit and starts swapping/paging memory, but I'm not sure how this works.
Thank you
To reproduce
I have not included reproducible code because this is more of a theoretical question, but I hope someone can provide more information.
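Since no reproduction is attached, here is a minimal sketch of how the point where scaling stops being linear could be located from timing measurements. It assumes inference time has already been measured at several resolutions. The `Sample` struct, the `find_knee` function, and the default threshold of 2x are hypothetical names and values chosen for illustration; none of them come from ONNX Runtime.

```cpp
#include <cstddef>
#include <vector>

// One timing measurement (hypothetical type, for illustration only).
struct Sample {
    double pixels;  // input size: width * height
    double ms;      // measured inference time in milliseconds
};

// Returns the index of the first sample whose per-pixel cost exceeds
// `factor` times the lowest per-pixel cost seen among earlier samples,
// or -1 if the cost never jumps that much.
// Assumes `samples` is sorted by increasing `pixels` and all `pixels` > 0.
int find_knee(const std::vector<Sample>& samples, double factor = 2.0) {
    if (samples.empty()) return -1;

    // Baseline: cheapest per-pixel cost observed so far.
    double best = samples[0].ms / samples[0].pixels;

    for (std::size_t i = 1; i < samples.size(); ++i) {
        const double cost = samples[i].ms / samples[i].pixels;
        if (cost > factor * best) {
            return static_cast<int>(i);  // scaling is no longer roughly linear here
        }
        if (cost < best) best = cost;    // fixed overhead amortizes at larger sizes
    }
    return -1;
}
```

Under linear scaling, the time per pixel stays roughly constant, so a sudden jump in that ratio marks the resolution worth investigating. Comparing GPU memory usage just below and just above that resolution would help confirm or rule out the VRAM hypothesis.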
Urgency
No response
Platform
Windows
OS Version
11
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.15.1
ONNX Runtime API
C++
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
No