Support for Dynamic Batch Size in CUDA Graph Inference with TensorRT #3798

Closed
OuyangChao opened this issue on Apr 15, 2024 · 5 comments
Labels: triaged (Issue has been triaged by maintainers)

@OuyangChao

I'm currently exploring TensorRT for inference tasks and aiming to optimize performance using CUDA graph. One of the requirements for my application is to support dynamic batch sizes during inference. While TensorRT provides dynamic shape support, I couldn't find sufficient information on how to incorporate this feature into CUDA graph inference.

I would appreciate any guidance, documentation, or examples demonstrating how to implement dynamic batch size support in CUDA graph inference with TensorRT.

Thank you for your assistance!

@zerollzeng (Collaborator) commented on Apr 18, 2024

See https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#cuda-graphs

Basically, when you change the input shapes, you have to re-capture the graph, because some internal state changes.
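
For reference, a minimal sketch of that re-capture pattern in C++, roughly following the developer-guide section linked above. The input tensor name ("input"), its dimensions, the per-batch-size cache, and the helper name are illustrative assumptions, and the input/output device addresses are assumed to have already been bound with setTensorAddress().

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <unordered_map>

// Illustrative cache: one instantiated CUDA graph per batch size.
static std::unordered_map<int, cudaGraphExec_t> graphCache;

cudaGraphExec_t getOrCaptureGraph(nvinfer1::IExecutionContext* context,
                                  cudaStream_t stream, int batch)
{
    auto it = graphCache.find(batch);
    if (it != graphCache.end())
        return it->second;  // already captured for this shape

    // New shape: set it and run enqueueV3() once outside of capture so that
    // TensorRT can finish any deferred internal updates for the new shape.
    context->setInputShape("input", nvinfer1::Dims4{batch, 3, 224, 224});
    context->enqueueV3(stream);
    cudaStreamSynchronize(stream);

    // Re-capture the graph for this shape.
    cudaGraph_t graph;
    cudaGraphExec_t graphExec;
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    context->enqueueV3(stream);
    cudaStreamEndCapture(stream, &graph);
    cudaGraphInstantiate(&graphExec, graph, 0);  // CUDA 12 signature
    cudaGraphDestroy(graph);

    graphCache[batch] = graphExec;
    return graphExec;
}

// Inference for a given batch size then becomes:
//   cudaGraphLaunch(getOrCaptureGraph(context, stream, batch), stream);
//   cudaStreamSynchronize(stream);
```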

@zerollzeng self-assigned this on Apr 18, 2024
@zerollzeng added the triaged (Issue has been triaged by maintainers) label on Apr 18, 2024
@zerollzeng (Collaborator)

Therefore, the best practice is to use one execution context per captured graph, and to share memory across the contexts with createExecutionContextWithoutDeviceMemory(). Will that help?
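
A minimal sketch of that arrangement in C++, assuming the contexts are never executed concurrently (see the locking discussion below); the helper name is illustrative and error handling is omitted:

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <vector>

// Create several execution contexts that all reuse one activation buffer.
std::vector<nvinfer1::IExecutionContext*> makeContextsSharingMemory(
    nvinfer1::ICudaEngine* engine, int numContexts, void** sharedMem)
{
    // One scratch allocation, sized for the engine's worst case.
    cudaMalloc(sharedMem, engine->getDeviceMemorySize());

    std::vector<nvinfer1::IExecutionContext*> contexts;
    for (int i = 0; i < numContexts; ++i)
    {
        // The context allocates no activation memory of its own...
        auto* ctx = engine->createExecutionContextWithoutDeviceMemory();
        // ...so point it at the shared scratch buffer instead.
        ctx->setDeviceMemory(*sharedMem);
        contexts.push_back(ctx);
    }
    return contexts;
}
```

Each context can then capture its own CUDA graph for one shape, while the activation scratch memory is shared between them.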

@lix19937

Does sharing memory across the contexts need a mutex/lock? @zerollzeng

@ttyio (Collaborator) commented on Jul 2, 2024

> Does sharing memory across the contexts need a mutex/lock? @zerollzeng

Yes, the user needs to make sure there is no concurrent execution when two contexts share the same memory. Some reduction kernels can have race conditions when they write to the same memory, so the behavior would be undefined.
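
For illustration, one way to satisfy that requirement is to serialize launches with a host-side lock that is held until the stream has drained; launchSerialized and launchMutex are hypothetical names, not part of TensorRT:

```cpp
#include <cuda_runtime_api.h>
#include <mutex>

// Hypothetical helper: serialize graph launches so that two contexts sharing
// one device-memory block never execute concurrently.
static std::mutex launchMutex;

void launchSerialized(cudaGraphExec_t graphExec, cudaStream_t stream)
{
    std::lock_guard<std::mutex> lock(launchMutex);
    cudaGraphLaunch(graphExec, stream);
    // Keep holding the lock until the GPU work is done; releasing it right
    // after the (asynchronous) launch would still allow another context to
    // start writing into the shared memory while this one is running.
    cudaStreamSynchronize(stream);
}
```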

@ttyio (Collaborator) commented on Jul 2, 2024

Closing since this should already be solved, thanks all!

@ttyio closed this as completed on Jul 2, 2024