[Performance] Regarding additional GPU memory (workspace) allocation in onnxruntime for the custom CUDA operator deform_conv2d. #2394
Unanswered
1193700079 asked this question in Q&A
Replies: 0 comments
Describe the issue
The deform_conv function comes from mmdeploy, under the path
csrc/mmdeploy/backend_ops/tensorrt/deform_conv/trt_deform_conv_kernel.cu
The code is as follows:
trt_deform_conv_kernel.cu
It involves allocating GPU memory for a workspace, and I'm not sure how the workspace is supposed to be sized and used, so I simply allocate the byte size of one output tensor for it.
This doesn't feel efficient, because TensorRT has dedicated APIs for requesting workspace memory from the runtime. I'd like to know whether there is a better way to do this in onnxruntime, and I hope we can discuss it together! Please advise!
To reproduce
This code is written inside the `Compute` function; the complete code is as follows:
Urgency
No response
Platform
Windows
OS Version
11
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.15.1
ONNX Runtime API
C++
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 11.8
Model File
No response
Is this a quantized model?
No