Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for GGUF models #866

Merged
merged 14 commits into from
Nov 17, 2023
2 changes: 1 addition & 1 deletion .github/workflows/model-check.yml
Original file line number Diff line number Diff line change
Expand Up @@ -45,4 +45,4 @@ jobs:
- name: Run model health check
working-directory: ./api
run: |
poetry run python -m pytest test/healthcheck_models.py
poetry run python -m pytest -v --color=yes test/healthcheck_models.py
59 changes: 6 additions & 53 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
![License](https://img.shields.io/github/license/serge-chat/serge)
[![Discord](https://img.shields.io/discord/1088427963801948201?label=Discord)](https://discord.gg/62Hc6FEYQH)

Serge is a chat interface crafted with [llama.cpp](https://github.com/ggerganov/llama.cpp) for running Alpaca models. No API keys, entirely self-hosted!
Serge is a chat interface crafted with [llama.cpp](https://github.com/ggerganov/llama.cpp) for running GGUF models. No API keys, entirely self-hosted!

- 🌐 **SvelteKit** frontend
- 💾 **[Redis](https://github.com/redis/redis)** for storing chat history & parameters
Expand Down Expand Up @@ -57,18 +57,10 @@ Instructions for setting up Serge on Kubernetes can be found in the [wiki](https

| Category | Models |
|:-------------:|:-------|
| **Alpaca 🦙** | Alpaca-LoRA-65B, GPT4-Alpaca-LoRA-30B |
| **Chronos 🌑**| Chronos-13B, Chronos-33B, Chronos-Hermes-13B |
| **GPT4All 🌍**| GPT4All-13B |
| **Koala 🐨** | Koala-7B, Koala-13B |
| **LLaMA 🦙** | FinLLaMA-33B, LLaMA-Supercot-30B, LLaMA2 7B, LLaMA2 13B, LLaMA2 70B |
| **Lazarus 💀**| Lazarus-30B |
| **Nous 🧠** | Nous-Hermes-13B |
| **OpenAssistant 🎙️** | OpenAssistant-30B |
| **Orca 🐬** | Orca-Mini-v2-7B, Orca-Mini-v2-13B, OpenOrca-Preview1-13B |
| **Samantha 👩**| Samantha-7B, Samantha-13B, Samantha-33B |
| **Vicuna 🦙** | Stable-Vicuna-13B, Vicuna-CoT-7B, Vicuna-CoT-13B, Vicuna-v1.1-7B, Vicuna-v1.1-13B, VicUnlocked-30B, VicUnlocked-65B |
| **Wizard 🧙** | Wizard-Mega-13B, WizardLM-Uncensored-7B, WizardLM-Uncensored-13B, WizardLM-Uncensored-30B, WizardCoder-Python-13B-V1.0 |
| **CodeLLaMA** | 7B, 13B |
| **LLaMA** | 7B, 13B, 70B |
| **Mistral** | 7B-Instruct, 7B-OpenOrca |
| **Zephyr** | 7B-Alpha, 7B-Beta |

Additional weights can be added to the `serge_weights` volume using `docker cp`:

Expand All @@ -80,45 +72,6 @@ docker cp ./my_weight.bin serge:/usr/src/app/weights/

LLaMA will crash if you don't have enough available memory for the model:

| Model | Max RAM Required |
|-------------|------------------|
| 7B | 4.5GB |
| 7B-q2_K | 5.37GB |
| 7B-q3_K_L | 6.10GB |
| 7B-q4_1 | 6.71GB |
| 7B-q4_K_M | 6.58GB |
| 7B-q5_1 | 7.56GB |
| 7B-q5_K_M | 7.28GB |
| 7B-q6_K | 8.03GB |
| 7B-q8_0 | 9.66GB |
| 13B | 12GB |
| 13B-q2_K | 8.01GB |
| 13B-q3_K_L | 9.43GB |
| 13B-q4_1 | 10.64GB |
| 13B-q4_K_M | 10.37GB |
| 13B-q5_1 | 12.26GB |
| 13B-q5_K_M | 11.73GB |
| 13B-q6_K | 13.18GB |
| 13B-q8_0 | 16.33GB |
| 33B | 20GB |
| 33B-q2_K | 16.21GB |
| 33B-q3_K_L | 19.78GB |
| 33B-q4_1 | 22.83GB |
| 33B-q4_K_M | 22.12GB |
| 33B-q5_1 | 26.90GB |
| 33B-q5_K_M | 25.55GB |
| 33B-q6_K | 29.19GB |
| 33B-q8_0 | 37.06GB |
| 65B | 50GB |
| 65B-q2_K | 29.95GB |
| 65B-q3_K_L | 37.15GB |
| 65B-q4_1 | 43.31GB |
| 65B-q4_K_M | 41.85GB |
| 65B-q5_1 | 51.47GB |
| 65B-q5_K_M | 48.74GB |
| 65B-q6_K | 56.06GB |
| 65B-q8_0 | 71.87GB |

## 💬 Support

Need help? Join our [Discord](https://discord.gg/62Hc6FEYQH)
Expand All @@ -139,4 +92,4 @@ To run Serge in development mode:
```bash
git clone https://github.com/serge-chat/serge.git
docker compose -f docker-compose.dev.yml up -d --build
```
```
3 changes: 2 additions & 1 deletion api/.dockerignore
Original file line number Diff line number Diff line change
@@ -1 +1,2 @@
./weights/*.bin**
./weights/*.bin**
./weights/*.gguf**
Loading