HowTo - Notes on using pinned host buffers for CUDA and TensorRT #88
I'm using the v2 branch for this, but the below is what is currently needed to get `cudaHostRegister` pinned buffers working.

For the `OrtValue`, `Value::from_array` can't be used, as it clones the data every time.

Performance difference
Using 50 MB input buffers, the pinned buffer saves 1 ms (1.95%). Avoiding the extra copy from `ort::from_array` saves 19 ms (27%). The model is a yolov8m with a custom starting layer for debayering and resize, running on a Quadro RTX 4000.

Criterion results - pinned vs `ort::from_raw()` vs standard `ort::from_array()`
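
As a rough illustration of the pinning step described above, here is a minimal sketch of an RAII guard that keeps a host buffer registered for as long as it is alive. This is not the code from this issue and not part of the `ort` API: `PinnedBuffer`, `RegisterFn`, `UnregisterFn`, `HOST_REGISTER_DEFAULT` and the two stub functions are hypothetical names. The function-pointer types only mirror the shape of the CUDA runtime's `cudaHostRegister(ptr, size, flags)` and `cudaHostUnregister(ptr)`, and the stubs stand in for them so the sketch runs without a GPU.

```rust
use std::os::raw::{c_int, c_uint, c_void};
use std::sync::atomic::{AtomicIsize, Ordering};

/// Same shape as the CUDA runtime's `cudaHostRegister(ptr, size, flags)`.
pub type RegisterFn = unsafe extern "C" fn(*mut c_void, usize, c_uint) -> c_int;
/// Same shape as the CUDA runtime's `cudaHostUnregister(ptr)`.
pub type UnregisterFn = unsafe extern "C" fn(*mut c_void) -> c_int;

/// `cudaHostRegisterDefault` is 0 in the CUDA runtime headers.
pub const HOST_REGISTER_DEFAULT: c_uint = 0;

/// Hypothetical RAII guard: owns a host buffer that stays registered
/// (pinned) for as long as the guard is alive.
pub struct PinnedBuffer {
    data: Vec<u8>,
    unregister: UnregisterFn,
}

impl PinnedBuffer {
    /// Allocates `len` zeroed bytes and registers them through `register`.
    /// A non-zero status is returned as the error (a CUDA error code when
    /// the real runtime function is used).
    pub fn new(len: usize, register: RegisterFn, unregister: UnregisterFn) -> Result<Self, c_int> {
        let mut data = vec![0u8; len];
        // SAFETY: pointer and length describe a live allocation owned by `data`.
        let status = unsafe { register(data.as_mut_ptr().cast(), len, HOST_REGISTER_DEFAULT) };
        if status != 0 {
            return Err(status);
        }
        Ok(Self { data, unregister })
    }

    pub fn len(&self) -> usize {
        self.data.len()
    }

    pub fn as_slice(&self) -> &[u8] {
        &self.data
    }

    /// Only slices are handed out, so the Vec can never reallocate and the
    /// registered address range stays valid.
    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        &mut self.data
    }
}

impl Drop for PinnedBuffer {
    fn drop(&mut self) {
        // Runs before the Vec is freed, so the memory is unregistered first.
        let _ = unsafe { (self.unregister)(self.data.as_mut_ptr().cast()) };
    }
}

/// Counts registrations that have not been undone yet (stub bookkeeping only).
pub static LIVE_REGISTRATIONS: AtomicIsize = AtomicIsize::new(0);

/// Stand-in for `cudaHostRegister`: records the call and reports success.
pub unsafe extern "C" fn stub_register(_ptr: *mut c_void, _size: usize, _flags: c_uint) -> c_int {
    LIVE_REGISTRATIONS.fetch_add(1, Ordering::SeqCst);
    0
}

/// Stand-in for `cudaHostUnregister`: records the call and reports success.
pub unsafe extern "C" fn stub_unregister(_ptr: *mut c_void) -> c_int {
    LIVE_REGISTRATIONS.fetch_sub(1, Ordering::SeqCst);
    0
}
```

With the real runtime, the stubs would presumably be replaced by `cudaHostRegister` and `cudaHostUnregister` declared in an `extern "C"` block and linked against the CUDA runtime library. The buffer would then be filled in place and handed to the session without an intermediate copy, which is where the larger saving reported above comes from.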