feat: major eval revamp, openrouter support, removed --llm in favor of `--model <provider>/<model>`
ErikBjare committed Aug 9, 2024
1 parent 746c733 commit 6fa8016
Showing 13 changed files with 383 additions and 175 deletions.
7 changes: 4 additions & 3 deletions README.md
@@ -230,9 +230,10 @@ Options:
--name TEXT Name of conversation. Defaults to generating
a random name. Pass 'ask' to be prompted for
a name.
--llm [openai|anthropic|azure|local]
LLM provider to use.
--model TEXT Model to use.
--model TEXT Model to use, e.g. openai/gpt-4-turbo,
anthropic/claude-3-5-sonnet-20240620. If
only provider is given, the default model
for that provider is used.
--stream / --no-stream Stream responses
-v, --verbose Verbose output.
-y, --no-confirm Skips all confirmation prompts.
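The single `--model` flag now carries both provider and model. A minimal sketch of how such a value could be split, assuming a fallback table of per-provider defaults (the function and table below are hypothetical, not gptme's actual code):

```python
# Hypothetical sketch -- not gptme's actual implementation.
DEFAULT_MODELS = {
    "openai": "gpt-4o",
    "anthropic": "claude-3-5-sonnet-20240620",
}


def parse_model_flag(value: str) -> tuple[str, str]:
    """Split '<provider>/<model>' into (provider, model).

    If only the provider is given, fall back to its default model.
    The model part may itself contain slashes (e.g. OpenRouter model ids).
    """
    provider, _, model = value.partition("/")
    return provider, model or DEFAULT_MODELS[provider]


assert parse_model_flag("anthropic") == ("anthropic", "claude-3-5-sonnet-20240620")
assert parse_model_flag("openrouter/meta-llama/llama-3.1-70b-instruct") == (
    "openrouter",
    "meta-llama/llama-3.1-70b-instruct",
)
```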
52 changes: 18 additions & 34 deletions docs/providers.md
@@ -3,6 +3,16 @@ Providers

We support several LLM providers, including OpenAI, Anthropic, Azure, and any OpenAI-compatible server (e.g. `ollama`, `llama-cpp-python`).

To select a provider and model, run `gptme` with the `--model` flag set to `<provider>/<model>`, for example:

```sh
gptme --model openai/gpt-4o "hello"
gptme --model anthropic "hello" # if model part unspecified, will fall back to the provider default
gptme --model openrouter/meta-llama/llama-3.1-70b-instruct "hello"
```

On first startup, if `--model` is not set and no API key is found in the config or environment, you will be prompted for one. The provider is then auto-detected from the key, and the key is saved in the configuration file.
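The auto-detection step can be pictured as a prefix check on the pasted key. A rough sketch, using the prefixes these providers commonly issue (gptme's real detection logic may differ):

```python
# Illustrative only -- gptme's actual detection may differ.
def detect_provider(api_key: str) -> str:
    """Guess the provider from the shape of an API key."""
    if api_key.startswith("sk-ant-"):  # Anthropic keys
        return "anthropic"
    if api_key.startswith("sk-or-"):   # OpenRouter keys
        return "openrouter"
    if api_key.startswith("sk-"):      # check the generic OpenAI prefix last
        return "openai"
    raise ValueError("could not detect provider from the given key")
```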

## OpenAI

To use OpenAI, set your API key:
@@ -11,8 +21,6 @@ To use OpenAI, set your API key:
export OPENAI_API_KEY="your-api-key"
```

If no key is set, it will be prompted for and saved in the configuration file.

## Anthropic

To use Anthropic, set your API key:
@@ -21,11 +29,17 @@ To use Anthropic, set your API key:
export ANTHROPIC_API_KEY="your-api-key"
```

If no key is set, it will be prompted for and saved in the configuration file.
## OpenRouter

To use OpenRouter, set your API key:

```sh
export OPENROUTER_API_KEY="your-api-key"
```
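OpenRouter exposes an OpenAI-compatible API, so outside of gptme it can also be reached with the standard `openai` Python client pointed at OpenRouter's base URL. A minimal sketch (the model id is just an example, and `openai>=1.0` is assumed):

```python
import os

from openai import OpenAI  # assumes the openai Python package, v1+

# OpenRouter speaks the OpenAI chat-completions protocol at this base URL.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)
resp = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.choices[0].message.content)
```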

## Local

There are several ways to run local LLM models in a way that exposes an OpenAI API-compatible server, here we will cover two:
There are several ways to run local LLM models in a way that exposes an OpenAI API-compatible server, here we will cover:

### ollama + litellm

@@ -39,33 +53,3 @@ ollama serve
litellm --model ollama/mistral
export OPENAI_API_BASE="http://localhost:8000"
```
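Before pointing gptme at the proxy, it can be worth a quick check that the OpenAI-compatible endpoint responds. A small sketch, assuming the litellm proxy started above on port 8000 (litellm does not require an API key by default, but the client wants a non-empty string):

```python
import os

from openai import OpenAI  # assumes the openai Python package, v1+

client = OpenAI(
    base_url=os.environ.get("OPENAI_API_BASE", "http://localhost:8000"),
    api_key="not-needed-locally",  # placeholder; the local proxy ignores it
)

# List the models the proxy exposes to confirm it is reachable.
for model in client.models.list():
    print(model.id)
```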

### llama-cpp-python

Here's how to use `llama-cpp-python`.

You first need to install and run the [llama-cpp-python][llama-cpp-python] server. To ensure you get the most out of your hardware, make sure you build it with [the appropriate hardware acceleration][hwaccel]. For macOS, you can find detailed instructions [here][metal].

```sh
MODEL=~/ML/wizardcoder-python-13b-v1.0.Q4_K_M.gguf
poetry run python -m llama_cpp.server --model $MODEL --n_gpu_layers 1 # Use `--n_gpu_layers 1` if you have an M1/M2 chip
export OPENAI_API_BASE="http://localhost:8000/v1"
```

### Usage

Now, simply run `gptme` with the `--llm` flag set to `local`:

```sh
gptme --llm local "hello"
```

### How well does it work?

I've had mixed results. They are not nearly as good as GPT-4 and often struggle with the tools laid out in the system prompt. However, I haven't tested with models larger than 7B/13B.

I'm hoping future models, trained better for tool-use and interactive coding (where outputs are fed back), can remedy this, even at 7B/13B model sizes. Perhaps we can fine-tune a model on (GPT-4) conversation logs to create a purpose-fit model that knows how to use the tools.

[llama-cpp-python]: https://github.com/abetlen/llama-cpp-python
[hwaccel]: https://github.com/abetlen/llama-cpp-python#installation-with-hardware-acceleration
[metal]: https://github.com/abetlen/llama-cpp-python/blob/main/docs/install/macos.md
4 changes: 1 addition & 3 deletions eval/agents.py
@@ -10,8 +10,7 @@


class Agent:
def __init__(self, llm: str, model: str):
self.llm = llm
def __init__(self, model: str):
self.model = model

@abstractmethod
@@ -42,7 +41,6 @@ def act(self, files: Files | None, prompt: str):
[Message("user", prompt)],
[prompt_sys],
f"gptme-evals-{store.id}",
llm=self.llm,
model=self.model,
no_confirm=True,
interactive=False,
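With the `llm` parameter gone, an agent is constructed from the combined model string alone. A toy subclass to illustrate the new constructor shape (everything except `Agent` itself is invented for this example, and the import path is assumed):

```python
from eval.agents import Agent  # import path assumed; base class changed in this commit


class EchoAgent(Agent):
    """Toy agent that just echoes the prompt; for illustration only."""

    def act(self, files, prompt: str):
        print(f"[{self.model}] {prompt}")
        return files


# Previously: EchoAgent(llm="openai", model="gpt-4o")
agent = EchoAgent(model="openai/gpt-4o")
agent.act(None, "hello")
```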
56 changes: 43 additions & 13 deletions eval/evals.py
@@ -3,16 +3,48 @@
if TYPE_CHECKING:
from main import ExecTest


def correct_output_hello(ctx):
return ctx.stdout == "Hello, human!\n"


def correct_file_hello(ctx):
return ctx.files["hello.py"].strip() == "print('Hello, human!')"


def check_prime_output(ctx):
return "541" in ctx.stdout.split()


def check_clean_exit(ctx):
return ctx.exit_code == 0


def check_clean_working_tree(ctx):
return "nothing to commit, working tree clean" in ctx.stdout


def check_main_py_exists(ctx):
return "main.py" in ctx.files


def check_commit_exists(ctx):
return "No commits yet" not in ctx.stdout


def check_output_hello_ask(ctx):
return "Hello, Erik!" in ctx.stdout


tests: list["ExecTest"] = [
{
"name": "hello",
"files": {"hello.py": "print('Hello, world!')"},
"run": "python hello.py",
"prompt": "Change the code in hello.py to print 'Hello, human!'",
"expect": {
"correct output": lambda ctx: ctx.stdout == "Hello, human!\n",
"correct file": lambda ctx: ctx.files["hello.py"].strip()
== "print('Hello, human!')",
"correct output": correct_output_hello,
"correct file": correct_file_hello,
},
},
{
@@ -21,9 +53,8 @@
"run": "python hello.py",
"prompt": "Patch the code in hello.py to print 'Hello, human!'",
"expect": {
"correct output": lambda ctx: ctx.stdout == "Hello, human!\n",
"correct file": lambda ctx: ctx.files["hello.py"].strip()
== "print('Hello, human!')",
"correct output": correct_output_hello,
"correct file": correct_file_hello,
},
},
{
@@ -33,7 +64,7 @@
# TODO: work around the "don't try to execute it" part by improving gptme such that it just gives EOF to stdin in non-interactive mode
"prompt": "modify hello.py to ask the user for their name and print 'Hello, <name>!'. don't try to execute it",
"expect": {
"correct output": lambda ctx: "Hello, Erik!" in ctx.stdout,
"correct output": check_output_hello_ask,
},
},
{
@@ -42,7 +73,7 @@
"run": "python prime.py",
"prompt": "write a script prime.py that computes and prints the 100th prime number",
"expect": {
"correct output": lambda ctx: "541" in ctx.stdout.split(),
"correct output": check_prime_output,
},
},
{
@@ -51,11 +82,10 @@
"run": "git status",
"prompt": "initialize a git repository, write a main.py file, and commit it",
"expect": {
"clean exit": lambda ctx: ctx.exit_code == 0,
"clean working tree": lambda ctx: "nothing to commit, working tree clean"
in ctx.stdout,
"main.py exists": lambda ctx: "main.py" in ctx.files,
"we have a commit": lambda ctx: "No commits yet" not in ctx.stdout,
"clean exit": check_clean_exit,
"clean working tree": check_clean_working_tree,
"main.py exists": check_main_py_exists,
"we have a commit": check_commit_exists,
},
},
# Fails, gets stuck on interactive stuff
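Named check functions, unlike lambdas, can be pickled, which matters if the eval harness runs tests in subprocesses; they also give readable names in reports. A rough sketch of how a runner could apply a test's `expect` dict to an execution context, reusing `correct_output_hello` and `correct_file_hello` from the file above (the real runner lives in `eval/main.py` and is not shown here, so the context class and runner names are assumptions):

```python
from dataclasses import dataclass, field


@dataclass
class ExecContext:
    """Minimal stand-in for the ctx object the checks receive."""

    stdout: str = ""
    exit_code: int = 0
    files: dict[str, str] = field(default_factory=dict)


def run_checks(test: dict, ctx: ExecContext) -> dict[str, bool]:
    """Apply every named check in a test's `expect` dict to the context."""
    return {name: bool(check(ctx)) for name, check in test["expect"].items()}


example = {
    "name": "hello",
    "expect": {
        "correct output": correct_output_hello,  # defined in eval/evals.py above
        "correct file": correct_file_hello,
    },
}
ctx = ExecContext(stdout="Hello, human!\n", files={"hello.py": "print('Hello, human!')"})
print(run_checks(example, ctx))  # -> {'correct output': True, 'correct file': True}
```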
