
Want Lorax with newer version of TGI #329

Open
yangelaboy opened this issue Mar 14, 2024 · 5 comments
Labels: question (Further information is requested)

Comments


yangelaboy commented Mar 14, 2024

Feature request

Hello, our models are deployed with TGI (v1.4.3), and we also want to use LoRAX. But I find that the TGI version LoRAX is based on is very different from TGI v1.4.3.
We are trying to integrate LoRAX (v0.8) into TGI (v1.4.3). Is it possible to upgrade the TGI code that LoRAX is based on, or to contribute LoRAX to TGI?

Motivation

Use new features of TGI together with LoRAX.

Your contribution

We are trying to integrate LoRAX (v0.8) into TGI (v1.4.3), but both LoRAX and TGI are changing quickly!

tgaddair (Contributor) commented Mar 14, 2024

Hi @yangelaboy, thanks for trying out LoRAX. I'd love to incorporate more upstream work from TGI, but since they changed their license last year, we can no longer pull their code into our repo.

That said, we have implemented many of the same features recently (though in slightly different ways). Are there specific features you're using in TGI you want to see in LoRAX? If so, we can definitely prioritize getting those added.

One thing in TGI we're working to add very soon is speculative decoding. We think our implementation will be particularly interesting, as we'll be able to handle multiple speculation models at once. Let me know if there are other features you're interested in.
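
To illustrate the multi-adapter pattern we're describing, here's a rough client-side sketch. It assumes a LoRAX server already running on localhost:8080 and uses the documented `/generate` endpoint with its `adapter_id` parameter; the adapter IDs and prompts below are placeholders, not real models:

```python
# Rough client-side sketch: per-request adapter selection via LoRAX's
# /generate endpoint. Assumes a server on localhost:8080; the adapter
# IDs are placeholders for adapters trained on the same base model.
import requests

def generate(prompt, adapter_id=None):
    parameters = {"max_new_tokens": 64}
    if adapter_id is not None:
        # LoRAX swaps adapters in and out per request based on this field.
        parameters["adapter_id"] = adapter_id
    resp = requests.post(
        "http://localhost:8080/generate",
        json={"inputs": prompt, "parameters": parameters},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["generated_text"]

# Two requests served by two different adapters on one shared base model.
print(generate("Summarize this ticket: ...", adapter_id="org/summarize-adapter"))
print(generate("Classify this ticket: ...", adapter_id="org/classify-adapter"))
```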

@tgaddair tgaddair added the question Further information is requested label Mar 14, 2024
yangelaboy (Author) commented

@tgaddair Thanks for the detailed reply. We are using features such as speculative decoding (ngram & Medusa) and quantization, and we're also interested in many of TGI's other optimizations. We also added features to TGI ourselves, like a shared-prefix prompt cache.
Finally, we want a framework that can serve different adapter models and Medusa models on the same self-trained base model, with a shared-prefix prompt cache.
I will keep an eye on LoRAX.

tgaddair (Contributor) commented

Hey @yangelaboy, thanks for this context! The good news is all of the things you listed are on our near-term roadmap.

  • Speculative decoding adapters per request - this is what I'm currently working on and hope to have out next week
  • Prefix caching - this is the next major item on the roadmap after speculative decoding, so hopefully a few weeks away at most
  • Quantization - we support a number of quantization options already, but let me know if there are specific ones we don't support that you'd be interested in (rough launch sketch at the end of this comment).

I'll definitely let you know when the speculative decoding is ready to test out!
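
For the quantization point above, loading a quantized base model is a launcher flag. A rough sketch of driving it from Python; the model ID and the "awq" value are illustrative only, so check `lorax-launcher --help` for the options your version actually supports:

```python
# Rough sketch: starting LoRAX with a quantized base model by invoking
# the lorax-launcher CLI. The model ID and the "awq" quantization value
# are illustrative; run `lorax-launcher --help` for supported options.
import subprocess

subprocess.run(
    [
        "lorax-launcher",
        "--model-id", "mistralai/Mistral-7B-Instruct-v0.1",
        "--quantize", "awq",  # assumption: one of several supported schemes
        "--port", "8080",
    ],
    check=True,
)
```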


abhibst commented Mar 18, 2024

Thanks @tgaddair, we are also waiting for speculative decoding 👍

giyaseddin commented

The TGI license is back to Apache-2.0:
huggingface/text-generation-inference@ff42d33
@tgaddair
