enc-dec triton backend support #800
Hi, is there any update on when enc-dec models like T5 will get TRT-LLM Triton backend support? Posting an issue for awareness; I just wanted to know if it's still being planned. Thanks in advance!
#424 (reply in thread)
Comments
Hi @shannonphu, yes, we're working on it. Right now it's at the stage of adding the C++ runtime. The tentative date for Triton enc-dec support is around mid to late January. Thanks for your patience.
Does it also include continuous batching?
@symphonylyh Could you share if there's an update on this?
Hi, is there an update on this?
Hi @shannonphu, @sihanwang41, @mlmonk, @shixianc, may I use this thread to collect your feedback so we can understand your needs and prioritize better? I know @sihanwang41 specifically asked about continuous batching, i.e., inflight batching, but the others haven't shared their requirements. Can you reply describing whether any of (1), (2), or (3) would be helpful and could unblock you first? Thanks
@symphonylyh Thanks for the update! Starting with (3) would unblock our team. May I assume this would also support classic dynamic batching?
Got it, thanks for the input.
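For readers landing here: classic dynamic batching in Triton is a per-model setting in config.pbtxt, separate from TRT-LLM's inflight batching. A minimal excerpt is below; the values are purely illustrative, and whether the enc-dec path would honor this setting was exactly the open question at this point in the thread.

```
# config.pbtxt excerpt -- values are illustrative, not a recommendation
max_batch_size: 8
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
```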
@symphonylyh (1) and/or (3). I am not super clear on the difference between the Python and C++ backends. I was using this to build the engine: https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/enc_dec/README.md
We have been able to use Triton with enc_dec models, so I'm not sure what the difference between that and (1) is. We find that the TPS for that implementation is quite slow and are looking for ways to make it faster. Agree that the end goal is (3).
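To make the TPS concern concrete, here is a rough way to probe end-to-end throughput from the client side. This is a sketch under assumptions: the model name (`ensemble`) and tensor names (`text_input`, `max_tokens`, `text_output`) follow tensorrtllm_backend conventions, and a custom Python-backend deployment like the one described here may use different names.

```python
# Rough end-to-end tokens-per-second probe against a Triton HTTP endpoint.
# Model and tensor names are assumptions (tensorrtllm_backend conventions);
# adjust them to match the deployed model's config.pbtxt.
import time
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

def infer_once(prompt: str, max_tokens: int = 64) -> str:
    text = httpclient.InferInput("text_input", [1, 1], "BYTES")
    text.set_data_from_numpy(np.array([[prompt]], dtype=object))
    length = httpclient.InferInput("max_tokens", [1, 1], "INT32")
    length.set_data_from_numpy(np.array([[max_tokens]], dtype=np.int32))
    out = httpclient.InferRequestedOutput("text_output")
    result = client.infer("ensemble", inputs=[text, length], outputs=[out])
    raw = result.as_numpy("text_output").flatten()[0]
    return raw.decode() if isinstance(raw, bytes) else str(raw)

start = time.time()
completion = infer_once("translate English to German: The house is wonderful.")
elapsed = time.time() - start
# Crude estimate: whitespace-delimited output tokens per wall-clock second.
print(f"~{len(completion.split()) / elapsed:.1f} tok/s -> {completion!r}")
```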
@mlmonk Oh interesting, I was under the impression that we just couldn't serve T5 models on Triton yet because the TRT-LLM backend wasn't ready for it.
@symphonylyh @shannonphu We have been able to use Flan-T5 with Triton. I believe this is (1). You can reproduce it here. Note that this was a much older version of both libraries, when Flan-T5 was not officially supported. Like @shixianc mentioned, (3) would unblock us and (4) would be the ideal state. It would be great if you could share how far along you are with the (3) release.
Hey @symphonylyh, do you have any updates on the progress?
@symphonylyh, any progress?
Hello @symphonylyh, is there any progress on any of (1)-(4)?
We would love (1).
Hi @shannonphu, @sihanwang41, @mlmonk, @shixianc, @LuckyL00ser, @XiaobingSuper, @TeamSeshDeadBoy, @mrmuke: as part of today's release #1725, the enc-dec C++ runtime has been implemented with inflight batching and paged KV cache. Please give it a try following the README's C++ runtime section. The next items on our roadmap will follow soon.
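To make the announcement concrete, here is a minimal sketch of driving the enc-dec C++ runtime through the Python bindings, loosely in the style of the repo's run scripts. The `is_enc_dec` and `encoder_input_ids` parameter names and the engine directory layout are assumptions that may differ across TensorRT-LLM versions; the README's C++ runtime section is the authoritative reference.

```python
# Sketch: enc-dec generation via TensorRT-LLM's C++ runtime bindings.
# ASSUMPTIONS: is_enc_dec / encoder_input_ids kwargs and the engine layout
# may vary by version -- follow the README's C++ runtime section.
import torch
from transformers import AutoTokenizer
from tensorrt_llm.runtime import ModelRunnerCpp

tokenizer = AutoTokenizer.from_pretrained("t5-small")
runner = ModelRunnerCpp.from_dir(
    engine_dir="trt_engines/t5-small",  # holds the encoder and decoder engines
    is_enc_dec=True,
)

encoder_input_ids = [torch.tensor(
    tokenizer("translate English to German: Good morning.").input_ids,
    dtype=torch.int32,
)]
# T5-style decoders start generation from the pad token.
decoder_input_ids = [torch.tensor([tokenizer.pad_token_id], dtype=torch.int32)]

outputs = runner.generate(
    batch_input_ids=decoder_input_ids,
    encoder_input_ids=encoder_input_ids,
    max_new_tokens=32,
    end_id=tokenizer.eos_token_id,
    pad_id=tokenizer.pad_token_id,
)
print(tokenizer.decode(outputs[0][0], skip_special_tokens=True))
```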
Thanks for the update! This is excellent news; I'm sure it was a lot of effort to make it happen.
Hello @symphonylyh,
@HamzaG737 It's full-fledged now. For (1), the Triton backend, you can follow the guide here: https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/encoder_decoder.md. Also closing this issue, as support has been added.
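For anyone arriving via that guide, a quick smoke test once the server is up, using Triton's HTTP generate extension. The model name (`ensemble`) and the `text_input`/`max_tokens`/`text_output` field names follow tensorrtllm_backend conventions and may differ in your model repository.

```python
# Smoke test via Triton's HTTP generate extension (/v2/models/<name>/generate).
# Model and field names are assumptions based on tensorrtllm_backend defaults.
import requests

resp = requests.post(
    "http://localhost:8000/v2/models/ensemble/generate",
    json={"text_input": "translate English to German: Thank you.",
          "max_tokens": 32},
)
resp.raise_for_status()
print(resp.json()["text_output"])
```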