chore: remove llama.cpp submodule
* update docs
Avram Tudor committed Oct 8, 2024
1 parent 6b5a4c7 commit 91e18ba
Showing 4 changed files with 8 additions and 18 deletions.
3 changes: 0 additions & 3 deletions .gitmodules
```diff
@@ -1,3 +0,0 @@
-[submodule "llama.cpp"]
-	path = llama.cpp
-	url = https://github.com/ggerganov/llama.cpp
```
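The workflow that produces a `.gitmodules` deletion like the one above can be sketched as follows. This is an illustrative reconstruction, not the commit's recorded commands; the scratch repositories stand in for skynet and llama.cpp so the snippet is self-contained, and all paths and messages are made up.

```shell
# Sketch of a submodule-removal workflow. Scratch repos below stand in
# for the real skynet and llama.cpp repositories.
set -e
tmp=$(mktemp -d)

# Stand-in for the upstream llama.cpp repository.
git init -q "$tmp/sub"
git -C "$tmp/sub" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"

# Stand-in for the main repository, with the submodule still present.
git init -q "$tmp/main"
git -C "$tmp/main" -c protocol.file.allow=always \
    submodule --quiet add "$tmp/sub" llama.cpp
git -C "$tmp/main" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "add llama.cpp submodule"

# The removal steps a commit like this one corresponds to:
git -C "$tmp/main" submodule deinit -f llama.cpp  # unregister from .git/config
git -C "$tmp/main" rm -qf llama.cpp               # drop gitlink + .gitmodules entry
rm -rf "$tmp/main/.git/modules/llama.cpp"         # remove the cached clone
```

`git rm` also rewrites `.gitmodules` and stages the change, which is why the diff above shows the whole entry disappearing in one hunk.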
14 changes: 4 additions & 10 deletions README.md
````diff
@@ -4,7 +4,7 @@ Skynet is an API server for AI services wrapping several apps and models.
 
 It is comprised of specialized modules which can be enabled or disabled as needed.
 
-- **Summary and Action Items** with llama.cpp (enabled by default)
+- **Summary and Action Items** with vllm (or llama.cpp)
 - **Live Transcriptions** with Faster Whisper via websockets
 - 🚧 _More to follow_
 
@@ -16,16 +16,10 @@ It is comprised of specialized modules which can be enabled or disabled as neede
 ## Summaries Quickstart
 
 ```bash
-# Init and update submodules if you haven't already. This will add llama.cpp which provides the OpenAI api server
-git submodule update --init
-
 # Download the preferred GGUF llama model
 mkdir "$HOME/models"
 
 wget -q --show-progress "https://huggingface.co/jitsi/Llama-3.1-8B-GGUF/blob/main/Llama-3.1-8B-Instruct-Q8_0.gguf?download=true" -O "$HOME/models/Llama-3.1-8B-Instruct-Q8_0.gguf"
 
-export OPENAI_API_SERVER_PATH="$HOME/skynet/llama.cpp/llama-server"
+# if VLLM cannot be used, use llama.cpp server with a gguf model, otherwise, simply point LLAMA_PATH to your raw model folder
+export LLAMA_CPP_SERVER_PATH="$HOME/llama.cpp/llama-server"
 export LLAMA_PATH="$HOME/models/Llama-3.1-8B-Instruct-Q8_0.gguf"
 
 # disable authorization (for testing)
 export BYPASS_AUTHORIZATION=1
````
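With the submodule gone, the quickstart reduces to pointing Skynet at an externally built `llama-server` binary and a local GGUF file. A minimal sketch of the resulting environment, assuming llama.cpp was cloned and built separately and the model download from the README already ran (the paths are examples, not requirements):

```shell
# Example environment for the llama.cpp-backed quickstart; adjust the
# paths to wherever llama-server and the downloaded model actually live.
export LLAMA_CPP_SERVER_PATH="$HOME/llama.cpp/llama-server"
export LLAMA_PATH="$HOME/models/Llama-3.1-8B-Instruct-Q8_0.gguf"

# For local testing only: skip JWT checks.
export BYPASS_AUTHORIZATION=1

# Optional sanity check that the binary can serve the model, e.g.:
#   "$LLAMA_CPP_SERVER_PATH" -m "$LLAMA_PATH" --port 8080
```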
8 changes: 4 additions & 4 deletions docs/summaries_module.md
````diff
@@ -1,8 +1,8 @@
 # Skynet Summaries Module
 
-Extracts summaries and action items from a given text. The API wraps the wonderful [ggerganov/llama.cpp](https://github.com/ggerganov/llama.cpp). It is split into two sub-modules: `summaries:dispatcher` and `summaries:executor`.
+Extracts summaries and action items from a given text. The service can be deployed to use either vllm or llama.cpp. It is split into two sub-modules: `summaries:dispatcher` and `summaries:executor`.
 
-`summaries:dispatcher` will push jobs and retrieve job results from a Redis queue while `summaries:executor` performs the actual inference. They can both be enabled at the same time or deployed separately.
+`summaries:dispatcher` will do CRUD for jobs with a Redis installation, while `summaries:executor` performs the actual inference. They can both be enabled at the same time or deployed separately.
 
 > All requests to this service will require a standard HTTP Authorization header with a Bearer JWT. Check the [**Authorization page**](auth.md) for detailed information on how to generate JWTs or disable authorization.
@@ -19,15 +19,15 @@ Extracts summaries and action items from a given text. The API wraps the wonderf
 
 All of the configuration is done via env vars. Check the [Skynet Environment Variables](env_vars.md) page for a list of values.
 
-## Running
+## Running with Llama.cpp
 
 ```bash
 # Download the preferred GGUF llama model
 mkdir "$HOME/models"
 
 wget -q --show-progress "https://huggingface.co/jitsi/Llama-3.1-8B-GGUF/blob/main/Llama-3.1-8B-Instruct-Q8_0.gguf?download=true" -O "$HOME/models/Llama-3.1-8B-Instruct-Q8_0.gguf"
 
-export OPENAI_API_SERVER_PATH="$HOME/skynet/llama.cpp/llama-server"
+export LLAMA_CPP_SERVER_PATH="$HOME/skynet/llama.cpp/llama-server"
 export LLAMA_PATH="$HOME/models/Llama-3.1-8B-Instruct-Q8_0.gguf"
 # disable authorization (for testing)
 export BYPASS_AUTHORIZATION=1
````
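Since the dispatcher and executor can run together or apart, a split deployment might be configured along these lines. The variable names below are placeholders invented for illustration; the authoritative list lives on the env_vars.md page the diff points to.

```shell
# Hypothetical two-machine layout; variable names are placeholders,
# see docs/env_vars.md for the real ones.
# Machine A: accept jobs and talk to Redis only.
export ENABLED_MODULES="summaries:dispatcher"
# Machine B would instead set: ENABLED_MODULES="summaries:executor"

# Both sides point at the same Redis instance.
export REDIS_HOST="redis.internal"
export REDIS_PORT="6379"
```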
1 change: 0 additions & 1 deletion llama.cpp
Submodule llama.cpp deleted from 6026da
