docs: added basic docs for evals

ErikBjare · Aug 23, 2024 · 5a0c47e
1 parent e30bd08
commit 5a0c47e

Showing 2 changed files with 40 additions and 0 deletions.
diff --git a/docs/evals.rst b/docs/evals.rst
@@ -0,0 +1,39 @@
+Evals
+=====
+
+gptme provides LLMs with a wide variety of tools, but how well do models make use of them? Which tasks can they complete, and which ones do they struggle with? How far can they get on their own, without any human intervention?
+
+To answer these questions, we have created an evaluation suite that tests the capabilities of LLMs on a wide variety of tasks.
+
+.. note::
+    The evaluation suite is still under development, but the eval harness is mostly complete.
+
+You can run the simple ``hello`` eval with gpt-4o like this:
+
+.. code-block:: bash
+
+    gptme-eval hello --model openai/gpt-4o
+
+However, we recommend running it in Docker to improve isolation and reproducibility:
+
+.. code-block:: bash
+
+    make build-docker
+    docker run \
+        -e "OPENAI_API_KEY=<your api key>" \
+        -v "$(pwd)/eval_results:/app/gptme/eval_results" \
+        gptme --timeout 60 $@
+
+
+Example run
+-----------
+
+Here's the output from a run of the eval suite: TODO
+
+
+Other evals
+-----------
+
+We have considered running gptme on other evals, such as SWE-Bench, but have not yet done so.
+
+If you are interested in running gptme on other evals, drop a comment in the issues!
diff --git a/docs/index.rst b/docs/index.rst
@@ -28,6 +28,7 @@ See the `README <https://github.com/ErikBjare/gptme/blob/master/README.md>`_ fil
    tools
    providers
    webui
+   evals
    finetuning
    cli
    api
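
Note on the Docker command added in docs/evals.rst: ``$@`` only expands to something inside a script or shell function, where it stands for the arguments passed in. Typed directly into an interactive shell it is normally empty, so no eval name or model would be forwarded. A minimal sketch of a wrapper that makes the forwarding explicit follows. The function name ``run_eval_in_docker`` and the ``DRY_RUN`` switch are hypothetical and not part of this commit. The image name ``gptme`` and the ``--timeout 60`` arguments are copied from the docs as written. Passing the key with a bare ``-e OPENAI_API_KEY`` (taken from the host environment) is a substitution for the inline ``<your api key>`` placeholder.

```shell
#!/bin/sh
# Hypothetical wrapper around the docker command from docs/evals.rst.
# Not part of the commit; names and the DRY_RUN switch are illustrative.

run_eval_in_docker() {
    # Rebuild the positional parameters: the docker command first,
    # followed by whatever arguments the caller passed ("$@").
    set -- docker run \
        -e OPENAI_API_KEY \
        -v "$(pwd)/eval_results:/app/gptme/eval_results" \
        gptme --timeout 60 "$@"

    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run (the default here): print one argument per line.
        printf '%s\n' "$@"
    else
        # Real run: execute the assembled command.
        "$@"
    fi
}

# Example: forward the eval name and model from the non-Docker invocation.
run_eval_in_docker hello --model openai/gpt-4o
```

With ``DRY_RUN=0`` set in the environment the function runs the command instead of printing it. That assumes the image has already been built with ``make build-docker`` and that ``OPENAI_API_KEY`` is exported on the host.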