Llama support #352
We won't integrate any models above 4B for completion at the moment because of the tradeoff between latency, serving cost, and quality. For chat and Q&A purposes, models like Llama are more suitable, since those use cases have looser latency requirements. This is being tracked in issue #222.
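As a rough back-of-envelope sketch of why latency dominates for completion (all numbers below are illustrative assumptions, not measurements from this project): completion requests fire on nearly every typing pause, so decode speed translates directly into perceived lag.

```python
# Illustrative latency arithmetic for inline completion (assumed numbers,
# not benchmarks): larger models decode fewer tokens per second, so the
# same 30-token suggestion takes several times longer to produce.
assumed_decode_speed = {  # tokens/second on hypothetical serving hardware
    "1B": 100.0,
    "4B": 40.0,
    "7B": 20.0,
}
suggestion_len = 30  # tokens in a typical inline completion

for size, tps in assumed_decode_speed.items():
    latency = suggestion_len / tps
    print(f"{size}: ~{latency:.2f}s per suggestion")
# 1B: ~0.30s, 4B: ~0.75s, 7B: ~1.50s -- only the smallest stays
# comfortably inside an interactive completion budget, while a chat
# reply can tolerate a second or two of delay.
```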
I guess this would also mean Code Llama, at a 7B minimum, wouldn't fly either.
In issue #370, we introduced support for Code Llama. However, its performance is currently suboptimal due to the significantly larger model size.
Related: ggerganov/llama.cpp#2768 (comment). Performance with Metal shaders on M1 seems superior with GGML.
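For anyone who wants to try this locally, here is a minimal sketch using the llama-cpp-python bindings (the model path and prompt are placeholders; `n_gpu_layers=-1` offloads every layer, which uses the Metal backend on Apple Silicon when the library is built with it):

```python
# Minimal llama.cpp inference sketch via the llama-cpp-python bindings.
# Assumes a quantized model file has already been downloaded locally.
from llama_cpp import Llama

llm = Llama(
    model_path="./codellama-7b.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers (Metal on Apple Silicon builds)
    n_ctx=2048,       # context window size
)

out = llm(
    "def fibonacci(n):",  # code-completion style prompt
    max_tokens=64,
    stop=["\n\n"],
    echo=True,  # include the prompt in the returned text
)
print(out["choices"][0]["text"])
```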
Llama has been fully supported since v0.1.0.
So, will we see support for other models like Llama, perhaps through llama.cpp, LangChain, or Ollama? continue.dev already supports Ollama and the like, so I was wondering whether models like Llama 2 are on the roadmap.
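For context on the Ollama route mentioned above: Ollama exposes a simple local HTTP API, so a client can generate from a Llama 2 model with a single POST. A sketch, assuming the Ollama daemon is running on its default port and `ollama pull llama2` has been run:

```python
# Sketch of querying a locally running Ollama server over its
# documented /api/generate endpoint (default port 11434).
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "llama2",
        "prompt": "Explain what a B-tree is in one sentence.",
        "stream": False,  # return one JSON object instead of a stream
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```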