
Llama support #352

Closed · SharanSMenon opened this issue Aug 11, 2023 · 5 comments
Labels: enhancement (New feature or request)

Comments

@SharanSMenon commented Aug 11, 2023

So, will we see support for other models, like Llama, perhaps through llama.cpp, LangChain, or Ollama? continue.dev supports Ollama and the like, so I was wondering whether models like Llama 2 are on the roadmap.

SharanSMenon added the enhancement (New feature or request) label on Aug 11, 2023
@wsxiaoys (Member)

We won't integrate any models above 4B for completion at the moment because of the tradeoff between latency, serving cost, and quality.

For chat and Q&A, models like Llama are more suitable, since the latency requirements there are less strict. This is being tracked in issue #222.
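For context, a minimal sketch of what serving a sub-4B completion model looks like, based on Tabby's Docker quickstart of that period; the model id, port, and flags here are illustrative assumptions, not taken from this thread:

```sh
# Hypothetical example: serve a small (sub-4B) completion model with Tabby.
# Model id and flags are assumptions based on the project's quickstart docs.
docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby serve --model TabbyML/StarCoder-1B --device cuda
```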

@06kellyjac

I guess this would also mean Code Llama, at 7B minimum, wouldn't fly either.

@wsxiaoys (Member)

In issue #370, we introduced support for Code Llama. However, its performance is currently suboptimal due to the significantly larger model size.
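For anyone who wants to try it anyway, a sketch of what serving Code Llama locally might look like; the model id follows Tabby's registry naming convention and is an assumption, not confirmed in this thread:

```sh
# Hypothetical: serve Code Llama 7B for completion. Expect noticeably higher
# latency than the sub-4B models discussed above; model id is an assumption.
tabby serve --model TabbyML/CodeLlama-7B --device metal
```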

@wsxiaoys (Member)

Related: ggerganov/llama.cpp#2768 (comment)

Performance with the Metal shader on an M1 appears superior with ggml.
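For reference, a minimal sketch of exercising llama.cpp's Metal path on an M1, with the build flag and `-ngl` option as documented in llama.cpp's README around that time; the model file and prompt are placeholders:

```sh
# Build llama.cpp with Metal shader support.
make LLAMA_METAL=1

# Run a quantized model with computation offloaded to the GPU via -ngl.
# The model path is a placeholder for any compatible quantized checkpoint.
./main -m ./models/codellama-7b.Q4_0.gguf -ngl 1 -p "def fibonacci(n):"
```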

@wsxiaoys (Member) commented Oct 4, 2023

Llama has been fully supported since v0.1.0.

wsxiaoys closed this as completed on Oct 4, 2023