
Want Lorax with newer version of TGI #329

Open
yangelaboy opened this issue Mar 14, 2024 · 5 comments
Labels: question (Further information is requested)

Comments


yangelaboy commented Mar 14, 2024

Feature request

Hello, our models are deployed with TGI (v1.4.3), and we also want to use LoRAX. But I find that the TGI version LoRAX is based on is very different from TGI v1.4.3.
We are trying to integrate LoRAX (v0.8) into TGI (v1.4.3). Is it possible to upgrade the TGI code that LoRAX is based on, or to contribute LoRAX to TGI?

Motivation

Use new features of TGI together with LoRAX.

Your contribution

We are trying to integrate LoRAX (v0.8) into TGI (v1.4.3), but both LoRAX and TGI are changing quickly!

tgaddair (Contributor) commented Mar 14, 2024

Hi @yangelaboy, thanks for trying out LoRAX. I'd love to incorporate more upstream work from TGI, but since they changed their license last year, we can no longer pull their code into our repo.

That said, we have implemented many of the same features recently (though in slightly different ways). Are there specific features you're using in TGI you want to see in LoRAX? If so, we can definitely prioritize getting those added.

One thing in TGI we're working to add very soon is speculative decoding. We think our implementation will be particularly interesting, as we'll be able to handle multiple speculation models at once. Let me know if there are other features you're interested in.
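
To illustrate the multi-adapter pattern we're describing, here's a rough client-side sketch. It assumes a LoRAX server already running on localhost:8080 and uses the documented `/generate` endpoint with its `adapter_id` parameter; the adapter IDs and prompts below are placeholders, not real models:

```python
# Rough client-side sketch: per-request adapter selection via LoRAX's
# /generate endpoint. Assumes a server on localhost:8080; the adapter
# IDs are placeholders for adapters trained on the same base model.
import requests

def generate(prompt, adapter_id=None):
    parameters = {"max_new_tokens": 64}
    if adapter_id is not None:
        # LoRAX swaps adapters in and out per request based on this field.
        parameters["adapter_id"] = adapter_id
    resp = requests.post(
        "http://localhost:8080/generate",
        json={"inputs": prompt, "parameters": parameters},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["generated_text"]

# Two requests served by two different adapters on one shared base model.
print(generate("Summarize this ticket: ...", adapter_id="org/summarize-adapter"))
print(generate("Classify this ticket: ...", adapter_id="org/classify-adapter"))
```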

@tgaddair tgaddair added the question Further information is requested label Mar 14, 2024
yangelaboy (Author) commented

@tgaddair Thanks for the detailed reply. We are using features such as speculative decoding (ngram & Medusa) and quantization, and we're also interested in many of TGI's other optimizations. We also added features to TGI ourselves, like a shared-prefix prompt cache.
Finally, we want a framework that can serve different adapter models and Medusa models on the same self-trained base model, with a shared-prefix prompt cache.
I will keep an eye on LoRAX.

tgaddair (Contributor) commented

Hey @yangelaboy, thanks for this context! The good news is all of the things you listed are on our near-term roadmap.

  • Speculative decoding adapters per request - this is what I'm currently working on and hope to have out next week
  • Prefix caching - this is the next major item on the roadmap after speculative decoding, so hopefully a few weeks away at most
  • Quantization - we support a number of quantization options already, but let me know if there are specific ones we don't support that you'd be interested in (rough launch sketch at the end of this comment).

I'll definitely let you know when the speculative decoding is ready to test out!
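
For the quantization point above, loading a quantized base model is a launcher flag. A rough sketch of driving it from Python; the model ID and the "awq" value are illustrative only, so check `lorax-launcher --help` for the options your version actually supports:

```python
# Rough sketch: starting LoRAX with a quantized base model by invoking
# the lorax-launcher CLI. The model ID and the "awq" quantization value
# are illustrative; run `lorax-launcher --help` for supported options.
import subprocess

subprocess.run(
    [
        "lorax-launcher",
        "--model-id", "mistralai/Mistral-7B-Instruct-v0.1",
        "--quantize", "awq",  # assumption: one of several supported schemes
        "--port", "8080",
    ],
    check=True,
)
```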


abhibst commented Mar 18, 2024

Thanks @tgaddair, we are also waiting for speculative decoding 👍

giyaseddin commented

The TGI license is back to Apache-2.0:
huggingface/text-generation-inference@ff42d33
@tgaddair
