Myelin could not work with Cuda Graph #1614
Comments
It's expected: Myelin might synchronize on the inference stream. CUDA graph capture will also fail if the network contains loops, conditional layers (if-else), or anything else that needs to synchronize during inference.
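As a rough mental model of why that breaks capture, here is a toy Python sketch (hypothetical code, not the CUDA or TensorRT API): capture records a fixed op sequence ahead of time, so any op that must read a runtime value forces a synchronization that is illegal mid-capture.

```python
# Toy analogy (not the CUDA API): graph capture records a fixed sequence of
# ops before any data flows, so a data-dependent op (a loop bound, an if-else
# branch) would need to synchronize to read a value -- which aborts capture.

def capture(ops):
    """Record a static 'graph'; every op must be fully known up front."""
    graph = []
    for op in ops:
        if op.get("data_dependent"):
            # A real capture would have to synchronize here to read the
            # runtime value, which is not allowed during stream capture.
            raise RuntimeError("capture failed: op requires synchronization")
        graph.append(op["name"])
    return graph

static_net = [{"name": "conv"}, {"name": "relu"}, {"name": "gemm"}]
print(capture(static_net))  # ['conv', 'relu', 'gemm']

branchy_net = static_net + [{"name": "if_else", "data_dependent": True}]
try:
    capture(branchy_net)
except RuntimeError as e:
    print(e)
```

The op names and the `data_dependent` flag are made up for illustration; the point is only that a static recording cannot contain decisions that depend on values produced during replay.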
@zerollzeng Thanks for your reply! The model itself should be compatible with CUDA Graph, because it works with CUDA Graph in PyTorch and ONNX Runtime. So I guess the failure is caused by Myelin. Do you think this issue can be resolved by making some changes inside Myelin? Is Myelin open-sourced? If yes, I would like to do more debugging there.
No, Myelin is not open-sourced. I think TensorRT handles CUDA graphs a bit differently from PyTorch/ONNX Runtime: PyTorch executes the model dynamically, while TensorRT executes an engine statically. But I'm not entirely sure.
If your model doesn't have a lot of lightweight kernels (where kernel launch time is large compared to execution time), you won't get much speedup with CUDA graph enabled.
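A back-of-envelope sketch of that trade-off, with purely illustrative numbers (the ~5 µs per-launch overhead and the kernel counts below are assumptions, not measurements):

```python
# Rough model of what CUDA graph replay can save: without a graph the CPU
# pays a launch overhead per kernel; with a graph the whole sequence is
# replayed with a single cheap launch. GPU execution time is unchanged.

LAUNCH_US = 5.0  # assumed per-kernel launch overhead in microseconds

def latency_us(num_kernels, avg_exec_us, use_graph):
    launch = LAUNCH_US if use_graph else num_kernels * LAUNCH_US
    return launch + num_kernels * avg_exec_us

# Many lightweight kernels: launch overhead dominates, graphs help a lot.
# Few heavyweight kernels: launch overhead is noise, graphs barely matter.
scenarios = {"light": (200, 2.0), "heavy": (10, 500.0)}

for name, (n, t) in scenarios.items():
    base = latency_us(n, t, use_graph=False)
    graphed = latency_us(n, t, use_graph=True)
    saved = 100 * (base - graphed) / base
    print(f"{name}: {base:.0f} us -> {graphed:.0f} us ({saved:.0f}% saved)")
```

With these assumed numbers the "light" case saves roughly 70% while the "heavy" case saves under 1%, which matches the intuition that graphs mainly amortize launch overhead.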
For the model I tested, CUDA graph reduced latency by around 50% for both PyTorch and ONNX Runtime. We also observed that TensorRT significantly accelerates this model (around 40%) without CUDA graph, so we wanted to try TensorRT + CUDA graph, which may perform better than PyTorch/ONNX Runtime + CUDA graph.
@feihugis, the failure here is a V100-specific problem. Do you have a Turing or Ampere device to run on? CUDA graph should be supported for your model there. Thanks!
Thanks @ttyio! I will find a Turing/Ampere device to test it and keep you updated. |
I will close this; please reopen if you still have questions. Thanks!
@feihugis |
@Coastchb Sorry, I honestly don't remember whether I solved it. My suggestion is to try it on a Turing/Ampere device.
Description
CUDA graph capture fails if the engine has layers generated by Myelin. The log info can be found below:
Environment
TensorRT Version: v8003
NVIDIA GPU: V100
NVIDIA Driver Version:
CUDA Version: 11.4
CUDNN Version: 8.2.4
Operating System: ubuntu 20.04.3 LTS
Python Version (if applicable): 3.8
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version): nvcr.io/nvidia/tensorrt:21.10-py3
Relevant Files
Steps To Reproduce
1. Generate the engine file from the ONNX model file:
```shell
trtexec --onnx=model_optimized_tensorrt_debug.onnx --verbose --useCudaGraph --saveEngine=model_optimized_tensorrt.trt --refit
```
2. Reproduce the issue:
```shell
trtexec --loadEngine=model_optimized_tensorrt.trt --verbose --useCudaGraph
```