# Docs: Update the Readme per feedback #1236

**Merged** (8 commits) on Sep 19, 2024

305 changes: 149 additions & 156 deletions in `README.md`

> ⚠️ **Cortex.cpp is currently under development. This documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.**

## Overview
Cortex.cpp is a local AI engine for running and customizing LLMs. Cortex can be deployed as a standalone server or integrated into apps like [Jan.ai](https://jan.ai/).

Cortex.cpp is multi-engine: it uses `llama.cpp` as the default engine but also supports the following:
- [`llamacpp`](https://github.com/janhq/cortex.llamacpp)
- [`onnx`](https://github.com/janhq/cortex.onnx)
- [`tensorrt-llm`](https://github.com/janhq/cortex.tensorrt-llm)
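
When more than one engine is installed, the variant tag after the model ID selects which engine serves the model. A minimal sketch, using the `model:variant` syntax from the Quickstart and model table below:

```bash
# Same model served by different engines, selected via the variant tag
cortex run llama3:gguf   # llama.cpp (default engine)
cortex run llama3:onnx   # ONNX engine
```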

## Installation
To install Cortex.cpp, download the installer for your operating system from the following options:

<table>
<tr style="text-align:center">
<td style="text-align:center"><b>Version Type</b></td>
<td style="text-align:center"><b>Windows</b></td>
<td colspan="2" style="text-align:center"><b>MacOS</b></td>
<td colspan="2" style="text-align:center"><b>Linux</b></td>
</tr>
<tr style="text-align:center">
<td style="text-align:center"><b>Stable Build</b></td>
<td style="text-align:center">
<a href='https://github.com/janhq/cortex.cpp/releases'>
<img src='https://github.com/janhq/docs/blob/main/static/img/windows.png' style="height:14px; width: 14px" />
<b>Download</b>
</a>
</td>
<td style="text-align:center">
<a href='https://github.com/janhq/cortex.cpp/releases'>
<img src='https://github.com/janhq/docs/blob/main/static/img/mac.png' style="height:15px; width: 15px" />
<b>Intel</b>
</a>
</td>
<td style="text-align:center">
<a href='https://github.com/janhq/cortex.cpp/releases'>
<img src='https://github.com/janhq/docs/blob/main/static/img/mac.png' style="height:15px; width: 15px" />
<b>M1/M2/M3/M4</b>
</a>
</td>
<td style="text-align:center">
<a href='https://github.com/janhq/cortex.cpp/releases'>
<img src='https://github.com/janhq/docs/blob/main/static/img/linux.png' style="height:14px; width: 14px" />
<b>Debian Download</b>
</a>
</td>
<td style="text-align:center">
<a href='https://github.com/janhq/cortex.cpp/releases'>
<img src='https://github.com/janhq/docs/blob/main/static/img/linux.png' style="height:14px; width: 14px" />
<b>Fedora Download</b>
</a>
</td>
</tr>
</table>

### Libraries
- [cortex.py](https://github.com/janhq/cortex-python)

## Quickstart
To run and chat with a model in Cortex.cpp:
### CLI
```bash
# 1. Start the Cortex.cpp server (the server runs at localhost:3928)
cortex

# 2. Start a model
cortex run <model_id>:[engine_name]

# 3. Stop a model
cortex stop <model_id>:[engine_name]

# 4. Stop the Cortex.cpp server
cortex stop
```
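
For example, a complete session with a small model from the model library below might look like this (a sketch; the model is downloaded on the first `run`):

```bash
cortex                          # 1. start the server
cortex run tinyllama:1b-gguf    # 2. download (first time) and start the model
cortex stop tinyllama:1b-gguf   # 3. stop the model
cortex stop                     # 4. stop the server
```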
### API
1. Start the API server using the `cortex` command.
2. **Pull a Model**
```bash
curl --request POST \
--url http://localhost:3928/v1/models/{model_id}/pull
```
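
For instance, pulling the `mistral` model used in the examples below:

```bash
curl --request POST \
  --url http://localhost:3928/v1/models/mistral/pull
```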

3. **Start a Model**
```bash
curl --request POST \
--url http://localhost:3928/v1/models/{model_id}/start \
--header 'Content-Type: application/json' \
--data '{
  ...
}'
```
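
As a concrete sketch for the pulled `mistral` model, this assumes the server falls back to the model's stored defaults when no overrides are supplied (an assumption, not confirmed here):

```bash
curl --request POST \
  --url http://localhost:3928/v1/models/mistral/start \
  --header 'Content-Type: application/json' \
  --data '{}'   # empty body: rely on the model's default parameters (assumption)
```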

4. **Chat with a Model**
```bash
curl http://localhost:3928/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "model": "mistral",
  "messages": [
    {"role": "user", "content": "Hello"}
  ]
}'
```
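
Because the endpoint follows the OpenAI chat-completions schema, standard parameters should carry over; for example, `stream` (assumed from OpenAI compatibility, not confirmed in this README) requests tokens incrementally:

```bash
curl http://localhost:3928/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral",
    "messages": [{"role": "user", "content": "Write a haiku about local AI"}],
    "stream": true
  }'
```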

5. **Stop a Model**
```bash
curl --request POST \
--url http://localhost:3928/v1/models/mistral/stop
```
6. Stop the Cortex.cpp server using the `cortex stop` command.
> **Note**:
> Our API server is fully compatible with the OpenAI API, making it easy to integrate with any system or tool that supports OpenAI-compatible APIs.
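
In practice, existing OpenAI clients can be pointed at Cortex by overriding the base URL. A minimal sketch using the environment variables honored by the official OpenAI SDKs (the variable names are the SDKs' convention, not Cortex's):

```bash
export OPENAI_BASE_URL="http://localhost:3928/v1"
export OPENAI_API_KEY="unused-for-local"   # most clients require a non-empty key
```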

## Built-in Model Library
Cortex.cpp supports various models available on the [Cortex Hub](https://huggingface.co/cortexso). Once downloaded, model source files are stored at `C:\Users\<username>\AppData\Local\cortexcpp\models` (path shown for Windows).

Here are examples of models you can use with each supported engine:

| Model            | llama.cpp<br>`:gguf` | TensorRT<br>`:tensorrt` | ONNXRuntime<br>`:onnx` | Command                          |
|------------------|----------------------|-------------------------|------------------------|----------------------------------|
| llama3.1         | ✅                   |                         | ✅                     | `cortex run llama3.1:gguf`       |
| llama3           | ✅                   | ✅                      | ✅                     | `cortex run llama3`              |
| mistral          | ✅                   | ✅                      | ✅                     | `cortex run mistral`             |
| qwen2            | ✅                   |                         |                        | `cortex run qwen2:7b-gguf`       |
| codestral        | ✅                   |                         |                        | `cortex run codestral:22b-gguf`  |
| command-r        | ✅                   |                         |                        | `cortex run command-r:35b-gguf`  |
| gemma            | ✅                   |                         | ✅                     | `cortex run gemma`               |
| mixtral          | ✅                   |                         |                        | `cortex run mixtral:7x8b-gguf`   |
| openhermes-2.5   | ✅                   | ✅                      | ✅                     | `cortex run openhermes-2.5`      |
| phi3 (medium)    | ✅                   |                         | ✅                     | `cortex run phi3:medium`         |
| phi3 (mini)      | ✅                   |                         | ✅                     | `cortex run phi3:mini`           |
| tinyllama        | ✅                   |                         |                        | `cortex run tinyllama:1b-gguf`   |

> **Note**:
> You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 14B models, and 32 GB to run the 32B models.
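
To check how much memory is available before choosing a model size, something like this works (a convenience sketch, not part of Cortex):

```bash
free -h                 # Linux: available RAM
sysctl -n hw.memsize    # macOS: total RAM in bytes
```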

## Cortex.cpp CLI Commands
For complete details on CLI commands, please refer to our [CLI documentation](https://cortex.so/docs/cli).
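
A few frequently used commands, as a quick sampler (see the reference for all options):

```bash
cortex pull <model_id>               # download a model
cortex models list                   # list downloaded models
cortex engines install <engine_name> # install an engine
cortex ps                            # show model information
cortex update                        # update Cortex.cpp
```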

## REST API
Cortex.cpp includes a REST API accessible at `localhost:3928`. For a complete list of endpoints and their usage, visit our [API documentation](https://cortex.so/api-reference).
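
Because the server is OpenAI-compatible, listing models should work through the standard endpoint (an assumption based on that compatibility; see the API reference for the authoritative list):

```bash
curl http://localhost:3928/v1/models
```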

## Uninstallation
### Windows
1. Navigate to **Add or Remove Programs**.
2. Search for Cortex.cpp.
3. Click **Uninstall**.
4. Delete the Cortex.cpp data folder located in your home folder.
### MacOS
Run the uninstaller script:
```bash
sudo sh cortex-uninstall.sh
```
> **Note**:
> The script requires sudo permission.


### Linux
```bash
sudo apt remove cortexcpp
```
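
As on Windows, you may also want to delete the Cortex.cpp data folder after uninstalling. Its location can vary by install, so the path below is a hypothetical example; verify it first:

```bash
# Hypothetical data-folder location; confirm before deleting
rm -rf ~/cortexcpp
```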

## Alternate Installation
We also provide Beta and Nightly versions.
<table>
<tr style="text-align:center">
<td style="text-align:center"><b>Version Type</b></td>
<td style="text-align:center"><b>Windows</b></td>
<td colspan="2" style="text-align:center"><b>MacOS</b></td>
<td colspan="2" style="text-align:center"><b>Linux</b></td>
</tr>
<tr style="text-align:center">
<td style="text-align:center"><b>Beta Build</b></td>
<td style="text-align:center">
<a href='https://github.com/janhq/cortex.cpp/releases'>
<img src='https://github.com/janhq/docs/blob/main/static/img/windows.png' style="height:14px; width: 14px" />
<b>cortexcpp.exe</b>
</a>
</td>
<td style="text-align:center">
<a href='https://github.com/janhq/cortex.cpp/releases'>
<img src='https://github.com/janhq/docs/blob/main/static/img/mac.png' style="height:15px; width: 15px" />
<b>Intel</b>
</a>
</td>
<td style="text-align:center">
<a href='https://github.com/janhq/cortex.cpp/releases'>
<img src='https://github.com/janhq/docs/blob/main/static/img/mac.png' style="height:15px; width: 15px" />
<b>M1/M2/M3/M4</b>
</a>
</td>
<td style="text-align:center">
<a href='https://github.com/janhq/cortex.cpp/releases'>
<img src='https://github.com/janhq/docs/blob/main/static/img/linux.png' style="height:14px; width: 14px" />
<b>cortexcpp.deb</b>
</a>
</td>
<td style="text-align:center">
<a href='https://github.com/janhq/cortex.cpp/releases'>
<img src='https://github.com/janhq/docs/blob/main/static/img/linux.png' style="height:14px; width: 14px" />
<b>cortexcpp.AppImage</b>
</a>
</td>
</tr>
<tr style="text-align:center">
<td style="text-align:center"><b>Nightly Build</b></td>
<td style="text-align:center">
<a href='https://github.com/janhq/cortex.cpp/releases'>
<img src='https://github.com/janhq/docs/blob/main/static/img/windows.png' style="height:14px; width: 14px" />
<b>cortexcpp.exe</b>
</a>
</td>
<td style="text-align:center">
<a href='https://github.com/janhq/cortex.cpp/releases'>
<img src='https://github.com/janhq/docs/blob/main/static/img/mac.png' style="height:15px; width: 15px" />
<b>Intel</b>
</a>
</td>
<td style="text-align:center">
<a href='https://github.com/janhq/cortex.cpp/releases'>
<img src='https://github.com/janhq/docs/blob/main/static/img/mac.png' style="height:15px; width: 15px" />
<b>M1/M2/M3/M4</b>
</a>
</td>
<td style="text-align:center">
<a href='https://github.com/janhq/cortex.cpp/releases'>
<img src='https://github.com/janhq/docs/blob/main/static/img/linux.png' style="height:14px; width: 14px" />
<b>cortexcpp.deb</b>
</a>
</td>
<td style="text-align:center">
<a href='https://github.com/janhq/cortex.cpp/releases'>
<img src='https://github.com/janhq/docs/blob/main/static/img/linux.png' style="height:14px; width: 14px" />
<b>cortexcpp.AppImage</b>
</a>
</td>
</tr>
</table>

### Build from Source

#### Windows
1. Clone the Cortex.cpp repository [here](https://github.com/janhq/cortex.cpp).
2. Navigate to the `engine > vcpkg` folder.
3. Configure the vcpkg, generate the build files, and verify the build:

```bash
cmake .. -DBUILD_SHARED_LIBS=OFF -DCMAKE_TOOLCHAIN_FILE=path_to_vcpkg_folder/vcpkg/scripts/buildsystems/vcpkg.cmake

# Get the help information
cortex -h
```
#### MacOS
1. Clone the Cortex.cpp repository [here](https://github.com/janhq/cortex.cpp).
2. Navigate to the `engine > vcpkg` folder.
3. Configure the vcpkg, then build and verify:

```bash
make -j4

# Get the help information
cortex -h
```
#### Linux
1. Clone the Cortex.cpp repository [here](https://github.com/janhq/cortex.cpp).
2. Navigate to the `engine > vcpkg` folder.
3. Configure the vcpkg: