added evaluating overview.
djl11 committed Jan 15, 2025
1 parent 9495c14 commit 4cbd373
Showing 1 changed file with 25 additions and 0 deletions.
25 changes: 25 additions & 0 deletions interfaces/evaluating/overview.mdx
@@ -1,3 +1,28 @@
---
title: 'Overview'
---

When building LLM apps, the first question is usually: where do we start?
Should we just take an off-the-shelf LLM and throw it into production?
Probably not, right?

Should we create an evaluation set with thousands of *hypothetical* failure modes
before putting it in front of any users at all? Also probably not.

In general, the following pseudocode describes the best practice for getting your
LLM app off the ground 🚀

```
0. While True:
1.     Update unit tests (evals) 🗂️
2.     while run(tests) failing: 🧪
3.         Vary system prompt, in-context examples, available tools etc. 🔁
4.     Beta test with users, find more failures from production traffic 🚦
```
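
Concretely, that loop might look something like the minimal Python sketch below.
Every function here (`update_evals`, `run_evals`, `vary_app`, `beta_test`) is a
hypothetical placeholder for your own tooling, not part of any real API:

```python
from typing import Callable

# Hypothetical placeholders -- swap each one for your own tooling.

def update_evals(tests: list[Callable[[], bool]]) -> list[Callable[[], bool]]:
    """Add newly discovered failure modes as new unit tests (evals)."""
    return tests

def run_evals(tests: list[Callable[[], bool]]) -> bool:
    """Run every eval; True only when all of them pass."""
    return all(test() for test in tests)

def vary_app() -> None:
    """Tweak the system prompt, in-context examples, available tools, etc."""

def beta_test() -> None:
    """Put the app in front of beta users and collect new failures."""

tests: list[Callable[[], bool]] = []
while True:                      # 0. keep iterating forever
    tests = update_evals(tests)  # 1. update unit tests (evals) 🗂️
    while not run_evals(tests):  # 2. run until the evals pass 🧪
        vary_app()               # 3. vary prompt, examples, tools 🔁
    beta_test()                  # 4. beta test, find new failures 🚦
```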

So, the first step is to add unit tests.
While it might feel a bit early for that, how else are you going to express
what you want the LLM to actually do?

In the spirit of test-driven development, adding unit tests as step 1 is a good
way to define the *bare-minimum* requirements we expect our app to handle.
We can then build a *bare-minimum* solution that passes them,
and start the data flywheel spinning! 🎡
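
For example, a first eval can be as simple as a pytest check on a couple of
must-pass questions. The `ask_llm` entry point below is a hypothetical stand-in;
replace it with a call into your own app:

```python
import pytest

def ask_llm(question: str) -> str:
    """Hypothetical entry point into your LLM app; wire this up yourself."""
    raise NotImplementedError

@pytest.mark.parametrize(
    "question, must_contain",
    [
        ("What is the capital of France?", "Paris"),
        ("What is 2 + 2?", "4"),
    ],
)
def test_bare_minimum(question: str, must_contain: str) -> None:
    # Bare-minimum requirement: the answer mentions the expected fact.
    assert must_contain in ask_llm(question)
```

In true TDD fashion, these tests fail until the app is wired up; making them
pass is the *bare-minimum* solution.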
