Update docs

Josh-XT committed Oct 2, 2023
1 parent feb36fc commit a69d906
Showing 5 changed files with 26 additions and 60 deletions.
53 changes: 25 additions & 28 deletions README.md
@@ -1,8 +1,12 @@
# llamacpp-server in Docker with OpenAI Style Endpoints

This llamacpp server comes equipped with the OpenAI style endpoints that most software is familiar with. will allow you to start it with a `MODEL_URL` defined in the `.env` file instead of needing to manually go to Hugging Face and download the model on the server.
This llamacpp server comes equipped with the OpenAI style endpoints that most software is familiar with. It will allow you to start it with a `MODEL_URL` defined in the `.env` file instead of needing to manually go to Hugging Face and download the model on the server.

This is the default `.env` file, modify it to your needs:
TheBloke sticks to the same naming convention for his models, so you can just use the model repository name, like `TheBloke/Mistral-7B-OpenOrca-GGUF`, and the server will automatically download the model from Hugging Face. If a model repository does not follow that format, you can use the full download URL for the model, like `https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GGUF/resolve/main/mistral.7b.q5_k_s.gguf`, and the quantized model will be downloaded from Hugging Face.

## Environment Set Up

Create a `.env` file if one does not exist and modify it to your needs. This is the default `.env` file included when cloning the repository:

```env
MODEL_URL=TheBloke/Mistral-7B-OpenOrca-GGUF
@@ -17,51 +21,44 @@ UVICORN_WORKERS=2
LLAMACPP_API_KEY=
```

TheBloke sticks to the same naming convention for his models, so you can just use the model name and it will automatically download the model from Hugging Face. If the model repositories are not in the format he uses, you can use the full URL to the model of the download link.
## CPU Only

## Clone the repository
Run with docker:

```bash
git clone https://github.com/Josh-XT/llamacpp-server
cd llamacpp-server
docker pull joshxt/llamacpp-server:full
docker run -d --name llamacpp-server --env-file .env -p 8091:8091 joshxt/llamacpp-server:full
```
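To confirm the container started and is downloading the model, you can follow its logs with a standard Docker command (the exact output depends on the model configured in `.env`):

```bash
# Follow the container logs to watch the model download and server startup
docker logs -f llamacpp-server
```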

Modify the `.env` file if desired before proceeding.

### NVIDIA GPU

If running without an NVIDIA GPU, you can start the server with:
Or with docker-compose:

```bash
docker-compose -f docker-compose-cuda.yml pull
docker-compose -f docker-compose-cuda.yml up
git clone https://github.com/Josh-XT/llamacpp-server
cd llamacpp-server
docker-compose pull
docker-compose up
```
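With the compose setup, downloaded models are persisted to the `./models` directory on the host (via the `volumes` mapping shown in the compose files below), so they are reused across restarts. A quick way to see what has been downloaded:

```bash
# List the quantized model files persisted on the host
ls -lh ./models
```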

Or if you only want the OpenAPI Style endpoints exposed:

```bash
docker-compose -f docker-compose-cuda-openai.yml pull
docker-compose -f docker-compose-cuda-openai.yml up
```
## NVIDIA GPU

### CPU Only
If you're using an NVIDIA GPU, you can use the CUDA version of the server.

If you are not running on an NVIDIA GPU, you can start the server with:
Run with docker:

```bash
docker-compose pull
docker-compose up
docker pull joshxt/llamacpp-server:full-cuda
docker run -d --name llamacpp-server --env-file .env -p 8091:8091 --gpus all joshxt/llamacpp-server:full-cuda
```
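Note that `--gpus all` requires the NVIDIA Container Toolkit to be installed on the host. As a quick sanity check, you can try running `nvidia-smi` inside the container; this assumes the CUDA image ships `nvidia-smi`, which is not confirmed by this repository:

```bash
# Check GPU visibility from inside the container (assumes nvidia-smi is present in the image)
docker exec llamacpp-server nvidia-smi
```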

Or if you only want the OpenAPI Style endpoints exposed:
Or with docker-compose:

```bash
docker-compose -f docker-compose-openai.yml pull
docker-compose -f docker-compose-openai.yml up
git clone https://github.com/Josh-XT/llamacpp-server
cd llamacpp-server
docker-compose -f docker-compose-cuda.yml pull
docker-compose -f docker-compose-cuda.yml up
```

The llamacpp server API is available at `http://localhost:8090` by default. The [documentation for the API is available here.](https://github.com/ggerganov/llama.cpp/tree/master/examples/server#api-endpoints)
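As a minimal sketch based on the upstream llama.cpp server documentation linked above (see those docs for the authoritative parameter list), a completion request against the native API looks like the following. Note that the `docker run` and compose commands in this commit only publish port 8091, so port 8090 must be exposed separately for this to be reachable from the host:

```bash
# Minimal completion request against the native llama.cpp server API (port 8090 must be published)
curl http://localhost:8090/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 128}'
```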

## OpenAI Style Endpoint Usage

OpenAI Style endpoints are available at `http://localhost:8091/` by default. Documentation can be accessed at that URL when the server is running.
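As an illustration, a request against the conventional OpenAI-style chat completions route might look like the sketch below. The exact paths, the expected `model` value, and whether an `Authorization` header is needed (it should only matter if `LLAMACPP_API_KEY` is set in `.env`) are assumptions here; check the interactive documentation at the URL above for the authoritative details:

```bash
# Hypothetical chat completion request against the OpenAI-style endpoint;
# the route, model name, and auth header are assumptions -- verify against the server docs
curl http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${LLAMACPP_API_KEY}" \
  -d '{
    "model": "Mistral-7B-OpenOrca",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```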
18 changes: 0 additions & 18 deletions docker-compose-cuda-openai.yml

This file was deleted.

1 change: 0 additions & 1 deletion docker-compose-cuda.yml
@@ -6,7 +6,6 @@ services:
env_file: .env
restart: unless-stopped
ports:
- "8090:8090"
- "8091:8091"
volumes:
- ./models:/app/models
11 changes: 0 additions & 11 deletions docker-compose-openai.yml

This file was deleted.

3 changes: 1 addition & 2 deletions docker-compose.yml
@@ -6,7 +6,6 @@ services:
env_file: .env
restart: unless-stopped
ports:
- "8090:8090"
- "8091:8091"
volumes:
- ./models:/app/models
- ./models:/app/models
