docs: add "live" eval result output to docs

ErikBjare · Sep 28, 2024 · 03f8972
1 parent 9481c91
commit 03f8972

Showing 3 changed files with 15 additions and 14 deletions.
diff --git a/.gitignore b/.gitignore
@@ -4,7 +4,7 @@
 
 projects
 demos
-eval_results
+eval[_-]results*
 
 # logs
 *.log
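
Note on the pattern change: `eval[_-]results*` ignores both the `eval_results` directory and an `eval-results` checkout, plus anything sharing either prefix. A gitignore pattern without a slash is matched as a shell glob, so the effect can be approximated with a POSIX `case` statement. This is only a sketch; `is_ignored` is a hypothetical helper, not git's actual matcher.

```shell
# Hypothetical helper: approximates how the gitignore pattern
# eval[_-]results* matches a single path name using shell globbing.
is_ignored() {
    case "$1" in
        # [_-] matches exactly one underscore or hyphen; * matches any suffix.
        eval[_-]results*) echo "ignored" ;;
        *) echo "not ignored" ;;
    esac
}

is_ignored eval_results      # underscore form
is_ignored eval-results      # hyphen form
is_ignored eval_results_old  # trailing * also covers suffixed names
is_ignored evalresults       # no separator, so the pattern does not match
```

To check against git itself, `git check-ignore -v <path>` reports which ignore rule, if any, applies to a path.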

diff --git a/Makefile b/Makefile
@@ -60,6 +60,14 @@ docs/.clean: docs/conf.py
 	touch docs/.clean
 
 docs: docs/conf.py docs/*.rst docs/.clean
+	if [ ! -e eval_results ]; then \
+		if [ -e eval-results/eval_results ]; then \
+			ln -s eval-results/eval_results .; \
+		else \
+			git fetch origin eval-results; \
+			git checkout origin/eval-results -- eval_results; \
+		fi \
+	fi
 	poetry run make -C docs html SPHINXOPTS="-W --keep-going"
 
 .PHONY: site
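
Note on the new `docs` recipe: before building, it makes sure `eval_results` exists, preferring a local `eval-results/eval_results` checkout and falling back to fetching the `eval-results` branch. The same fallback order can be sketched as a standalone shell function. `ensure_eval_results` is a hypothetical name, and the sketch chains the two git commands with `&&` so that a failed fetch stops early, which differs slightly from the recipe's `;`.

```shell
# Sketch of the recipe's fallback order as a standalone function.
# ensure_eval_results is a hypothetical name; the branch name and paths
# are taken from the Makefile hunk above.
ensure_eval_results() {
    # Nothing to do if eval_results already exists (directory or valid symlink).
    if [ -e eval_results ]; then
        return 0
    fi
    if [ -e eval-results/eval_results ]; then
        # A local checkout of the eval-results branch exists: link to it.
        ln -s eval-results/eval_results .
    else
        # Otherwise fetch the branch and check out only that directory.
        # Assumption: && instead of the recipe's ';' to stop on a failed fetch.
        git fetch origin eval-results &&
            git checkout origin/eval-results -- eval_results
    fi
}
```

Calling `ensure_eval_results` from the repository root should be idempotent: once `eval_results` exists, later calls return immediately.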

diff --git a/docs/evals.rst b/docs/evals.rst
@@ -28,21 +28,14 @@ However, we recommend running it in Docker to improve isolation and reproducibil
         gptme-eval hello --model openai/gpt-4o
 
 
-Example run
------------
-
-Here's the output from a run of the eval suite:
-
-.. code-block::
+Results
+-------
 
-   $ gptme-eval eval_results/20240917_172916/eval_results.csv
-   === Model Comparison ===
-   Model                                 init-git    init-rust    hello      hello-patch    hello-ask    init-react    prime100
-   ------------------------------------  ----------  -----------  ---------  -------------  -----------  ------------  ----------
-   openai/gpt-4o                         ✅ 7.74s    ✅ 9.62s     ✅ 5.02s   ✅ 5.06s       ✅ 4.69s     ❌ timeout    ✅ 7.48s
-   openai/o1-mini                        ✅ 18.44s   ✅ 21.63s    ✅ 21.20s  ✅ 27.39s      ❌ timeout   ❌ 42.65s     ✅ 17.99s
-   anthropic/claude-3-5-sonnet-20240620  ❌ timeout  ❌ timeout   ✅ 8.77s   ✅ 7.09s       ✅ 8.08s     ❌ timeout    ✅ 11.26s
+Here are the results of the evals we have run so far:
 
+.. command-output:: gptme-eval eval_results/*/eval_results.csv
+   :cwd: ..
+   :shell:
 
 We are working on making the evals more robust, informative, and challenging.