[Help wanted] Support TensorRT #40

Open
1 task
csukuangfj opened this issue Feb 20, 2023 · 11 comments
Labels
help wanted Extra attention is needed

Comments

@csukuangfj
Collaborator

TODO

  • Support GPU via TensorRT

See https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html

@csukuangfj csukuangfj added the help wanted Extra attention is needed label Feb 20, 2023
@yuekaizhang
Contributor

I would like to take this on.

  • Support the Onnxruntime CUDA provider.

@manickavela29
Contributor

manickavela29 commented Mar 14, 2024

Hi @csukuangfj , @yuekaizhang

I noticed that sherpa-onnx currently supports only the CUDA EP of onnxruntime, not the TensorRT EP.
Is there any active development going on for a TensorRT GPU backend?

@csukuangfj
Collaborator Author

> Is there any active development going on for a TensorRT GPU backend?

We don't have a plan to support it in the near future. Would you like to contribute?

@manickavela29
Contributor

I tried enabling onnxruntime's TensorRT EP for zipformer, but the model performance was very bad.
I am debugging further with standalone onnxruntime in Python for the encoder models and will update if I see good results.

@manickavela29
Contributor

Hi @csukuangfj,
TensorRT has several parameters, and they are only valid when the TensorRT provider is chosen,
so I need your suggestion on which of the two options below to take.

  1. Putting the TRT configs in the existing model-config.cc file.
  2. Creating a new config for TRT and exposing the required parameters from it.

Thank you

@csukuangfj
Collaborator Author

Could you create a new config for tensorrt and add this config as a member field of OnlineModelConfig and OfflineModelConfig?

You can set the default values of this config to the ones used in

std::vector<const char*> option_values = {
"0",
"2147483648",
"10",
"5",
"0",
"0",
"0",
"1",
"1",
"1",
".",
"1",
".", // can be same as the engine cache folder
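As a rough illustration of the suggestion above (a separate TensorRT config held as a member of the model config), here is a minimal sketch. All type and function names (`TensorrtConfigSketch`, `ModelConfigSketch`, `ToTensorrtOptions`) are hypothetical and are not the actual sherpa-onnx types. The option keys are TensorRT EP provider option names from the onnxruntime documentation; the default values and their pairing with keys are assumptions chosen for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical TensorRT-only settings, kept apart from the generic model config.
struct TensorrtConfigSketch {
  int32_t device_id = 0;
  int64_t max_workspace_size = 2147483648;  // bytes (2 GiB); assumed default
  bool fp16_enable = false;                 // assumed default
  bool engine_cache_enable = true;          // assumed default
  std::string engine_cache_path = ".";      // assumed default
};

// Hypothetical stand-in for a model config that owns the TensorRT settings.
struct ModelConfigSketch {
  std::string provider = "cpu";
  TensorrtConfigSketch trt_config;  // only meaningful when provider == "trt"
};

// Parallel key/value lists, the shape that string-based provider options take.
struct ProviderOptionsSketch {
  std::vector<std::string> keys;
  std::vector<std::string> values;
};

ProviderOptionsSketch ToTensorrtOptions(const TensorrtConfigSketch &c) {
  ProviderOptionsSketch o;
  auto add = [&o](const char *key, std::string value) {
    o.keys.emplace_back(key);
    o.values.push_back(std::move(value));
  };
  add("device_id", std::to_string(c.device_id));
  add("trt_max_workspace_size", std::to_string(c.max_workspace_size));
  add("trt_fp16_enable", c.fp16_enable ? "1" : "0");
  add("trt_engine_cache_enable", c.engine_cache_enable ? "1" : "0");
  add("trt_engine_cache_path", c.engine_cache_path);
  assert(o.keys.size() == o.values.size());  // lists must stay in lockstep
  return o;
}
```

In a real integration, the two lists would presumably be converted to `const char *` arrays and handed to onnxruntime (for example through `UpdateTensorRTProviderOptions` in the C API) only when the TensorRT provider is selected; that wiring is left out here.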

@manickavela29
Contributor

Yes, I will send a separate PR for the configs soon.

@manickavela29
Contributor

manickavela29 commented Jun 4, 2024

Current performance, TensorRT vs. CUDA:

TensorRT
csrc/online-zipformer2-transducer-model.cc:RunEncoder:445 Encoder Duration : 1.930044 ms
csrc/online-zipformer2-transducer-model.cc:RunEncoder:445 Encoder Duration : 0.034984 ms
csrc/online-zipformer2-transducer-model.cc:RunEncoder:445 Encoder Duration : 0.034912 ms
csrc/online-websocket-server-impl.cc:Run:256 Warm up completed : 3 times.
csrc/online-websocket-server.cc:main:79 Started!
csrc/online-websocket-server.cc:main:80 Listening on: 6007
csrc/online-websocket-server.cc:main:81 Number of work threads: 8

CUDA
csrc/online-zipformer2-transducer-model.cc:RunEncoder:445 Encoder Duration : 0.535651 ms
csrc/online-zipformer2-transducer-model.cc:RunEncoder:445 Encoder Duration : 0.187492 ms
csrc/online-zipformer2-transducer-model.cc:RunEncoder:445 Encoder Duration : 0.187698 ms

Apart from this, session creation takes a very long time with TensorRT.
That is expected; the only way to handle it is to cache the engine images.
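To compare the two logs above numerically, one option is to average each backend's encoder durations after dropping the first entry, then take the ratio. The sketch below assumes the first line of each block includes one-time initialization cost and is not representative; that is an assumption, not something the logs state. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Mean of the samples that remain after skipping the first `skip` entries.
double SteadyStateMeanMs(const std::vector<double> &ms, std::size_t skip = 1) {
  assert(ms.size() > skip);  // need at least one sample after skipping
  auto first = ms.begin() + static_cast<std::ptrdiff_t>(skip);
  double sum = std::accumulate(first, ms.end(), 0.0);
  return sum / static_cast<double>(ms.size() - skip);
}

// How many times faster `candidate` is than `baseline` in steady state.
double SpeedupOver(const std::vector<double> &baseline,
                   const std::vector<double> &candidate) {
  return SteadyStateMeanMs(baseline) / SteadyStateMeanMs(candidate);
}
```

Called with the CUDA durations as `baseline` and the TensorRT durations as `candidate`, this gives a ratio a little above 5 for the numbers posted here. Three samples per backend is a very small sample, so the figure should be read as indicative only.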

@yuekaizhang
Contributor

> Current perf Cuda vs Trt
>
> csrc/online-zipformer2-transducer-model.cc:RunEncoder:445 Encoder Duration : 1.930044 ms
> csrc/online-zipformer2-transducer-model.cc:RunEncoder:445 Encoder Duration : 0.034984 ms
> csrc/online-zipformer2-transducer-model.cc:RunEncoder:445 Encoder Duration : 0.034912 ms
> csrc/online-websocket-server-impl.cc:Run:256 Warm up completed : 3 times.
> csrc/online-websocket-server.cc:main:79 Started!
> csrc/online-websocket-server.cc:main:80 Listening on: 6007
> csrc/online-websocket-server.cc:main:81 Number of work threads: 8
>
> csrc/online-zipformer2-transducer-model.cc:RunEncoder:445 Encoder Duration : 0.535651 ms
> csrc/online-zipformer2-transducer-model.cc:RunEncoder:445 Encoder Duration : 0.187492 ms
> csrc/online-zipformer2-transducer-model.cc:RunEncoder:445 Encoder Duration : 0.187698 ms
>
> Apart from this, with Trt there is a huge session creation time. which is expected, only way to handle is to cache the engine images.

May I know the results for the CPU provider, if you have them? Also, could you explain why there are three lines in each block, e.g. 0.535651 ms, 0.187492 ms, 0.187698 ms? @manickavela29

@manickavela29
Contributor

I can try to get CPU numbers, but I don't have a high-performance CPU.

(In the meantime, someone could add support for the DNNL EP 🙂)

But the focus here is on the GPU, CUDA vs. TensorRT. Is CPU benchmarking relevant?

The code blocks are just performance logs that I added for zipformer. They are not part of the patch.
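For readers who want similar "Encoder Duration" numbers, a wall-clock timer around the call is one simple way to get them. The sketch below is an assumption about how such logging could be done, not the code that produced the logs above; `ElapsedMs` is a hypothetical helper.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Runs `f` once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double ElapsedMs(F &&f) {
  auto start = std::chrono::steady_clock::now();
  std::forward<F>(f)();
  auto end = std::chrono::steady_clock::now();
  double ms = std::chrono::duration<double, std::milli>(end - start).count();
  assert(ms >= 0.0);  // steady_clock is monotonic, so this cannot go negative
  return ms;
}
```

A call such as `double ms = ElapsedMs([&] { /* run the encoder here */ });` measures host-side time only. Whether that covers all GPU work depends on whether the timed call returns only after the device has finished.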

@manickavela29
Contributor

manickavela29 commented Jun 12, 2024

Hi @csukuangfj
#992

I will create the configs for all execution providers together and integrate them with the sessions.
Let me know if you have any other thoughts.
This is still WIP.
