[WIP] Add a model_server example podman-llm #649

model_servers/podman-llm/README.md

# podman-llm

The goal of podman-llm is to make AI even more boring.

## Install

Install podman-llm by running this one-liner:

```
curl -fsSL https://raw.githubusercontent.com/ericcurtin/podman-llm/s/install.sh | sudo bash
```

## Usage

### Running Models

You can run a model using the `run` command. This will start an interactive session where you can query the model.

```
$ podman-llm run granite
> Tell me about podman in less than ten words
A fast, secure, and private container engine for modern applications.
>
```

### Serving Models

To serve a model over HTTP, use the `serve` command. It starts an HTTP server that accepts requests to interact with the model.

```
$ podman-llm serve granite
...
{"tid":"140477699799168","timestamp":1719579518,"level":"INFO","function":"main","line":3793,"msg":"HTTP server listening","n_threads_http":"11","port":"8080","hostname":"127.0.0.1"}
...
```
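
The log line above comes from llama.cpp's bundled HTTP server, so once `serve` is running you can talk to it with a plain HTTP request. A minimal sketch, assuming the default `127.0.0.1:8080` address from the log and llama.cpp's `/completion` endpoint (the prompt and `n_predict` values here are only illustrative):

```
curl -s http://127.0.0.1:8080/completion \
    -H "Content-Type: application/json" \
    -d '{"prompt": "Tell me about podman in less than ten words", "n_predict": 32}'
```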

## Model library

| Model | Parameters | Run |
| ------------------ | ---------- | ------------------------------ |
| granite | 3B | `podman-llm run granite` |
| mistral | 7B | `podman-llm run mistral` |
| merlinite | 7B | `podman-llm run merlinite` |

## Containerfile Example

Here is an example Containerfile:

```
FROM quay.io/podman-llm/podman-llm:41
# Fetch the model from Hugging Face so it is baked into the image layer
RUN llama-main --hf-repo ibm-granite/granite-3b-code-instruct-GGUF -m granite-3b-code-instruct.Q4_K_M.gguf
# Record the model path so it can be located at run time
LABEL MODEL=/granite-3b-code-instruct.Q4_K_M.gguf
```

The `LABEL MODEL` instruction matters because it tells podman-llm where to find the `.gguf` file inside the image.

Build the image with:

```
podman-llm build granite
```
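
To confirm the label made it into the image, you can read it back with `podman image inspect`. A quick sketch, assuming `podman-llm build granite` tags the resulting image as `granite` locally:

```
# Prints the path stored in the MODEL label, e.g. /granite-3b-code-instruct.Q4_K_M.gguf
podman image inspect --format '{{ index .Labels "MODEL" }}' granite
```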

## Diagram

```
+---------------------+ +-----------------------+ +------------------+
| | | Pull runtime layer | | Pull model layer |
| podman-llm run | -> | for llama.cpp | -> | with granite |
| | | (CPU, Vulkan, AMD, | | |
+---------------------+ | Nvidia, Intel, | |------------------|
| Apple Silicon, etc.) | | Repo options: |
+-----------------------+ +------------------+
| |
v v
+--------------+ +---------+
| Hugging Face | | quay.io |
+--------------+ +---------+
\ /
\ /
\ /
v v
+-----------------+
| Start container |
| with llama.cpp |
| and granite |
| model |
+-----------------+
```