added evaluating overview.
djl11 committed Jan 15, 2025
1 parent 9495c14 commit 4cbd373
Showing 1 changed file with 25 additions and 0 deletions.
25 changes: 25 additions & 0 deletions interfaces/evaluating/overview.mdx
@@ -1,3 +1,28 @@
---
title: 'Overview'
---

When building LLM apps, the first question is usually: where do we start?
Should we just take an off-the-shelf LLM and throw it into production?
Probably not, right?

Should we create an evaluation set with thousands of *hypothetical* failure modes
before putting it in front of any users at all? Also probably not.

In general, the following pseudocode describes the best practice for getting your
LLM app off the ground 🚀

```
0. While True:
1.     Update unit tests (evals) 🗂️
2.     while run(tests) failing: 🧪
3.         Vary system prompt, in-context examples, available tools etc. 🔁
4.     Beta test with users, find more failures from production traffic 🚦
```
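
Concretely, that loop might look something like the minimal Python sketch below.
Every function here (`update_evals`, `run_evals`, `vary_app`, `beta_test`) is a
hypothetical placeholder for your own tooling, not part of any real API:

```python
from typing import Callable

# Hypothetical placeholders -- swap each one for your own tooling.

def update_evals(tests: list[Callable[[], bool]]) -> list[Callable[[], bool]]:
    """Add newly discovered failure modes as new unit tests (evals)."""
    return tests

def run_evals(tests: list[Callable[[], bool]]) -> bool:
    """Run every eval; True only when all of them pass."""
    return all(test() for test in tests)

def vary_app() -> None:
    """Tweak the system prompt, in-context examples, available tools, etc."""

def beta_test() -> None:
    """Put the app in front of beta users and collect new failures."""

tests: list[Callable[[], bool]] = []
while True:                      # 0. keep iterating forever
    tests = update_evals(tests)  # 1. update unit tests (evals) 🗂️
    while not run_evals(tests):  # 2. run until the evals pass 🧪
        vary_app()               # 3. vary prompt, examples, tools 🔁
    beta_test()                  # 4. beta test, find new failures 🚦
```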

So, the first step is to add unit tests.
While it might feel a bit early for that, how else are you going to express
what you want the LLM to actually do?

In the spirit of test-driven development, adding unit tests as step 1 is a good
way to define the *bare-minimum* requirements we expect our app to handle.
We can then build a *bare-minimum* solution that passes them,
and start the data flywheel spinning! 🎡
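
For example, a first eval can be as simple as a pytest check on a couple of
must-pass questions. The `ask_llm` entry point below is a hypothetical stand-in;
replace it with a call into your own app:

```python
import pytest

def ask_llm(question: str) -> str:
    """Hypothetical entry point into your LLM app; wire this up yourself."""
    raise NotImplementedError

@pytest.mark.parametrize(
    "question, must_contain",
    [
        ("What is the capital of France?", "Paris"),
        ("What is 2 + 2?", "4"),
    ],
)
def test_bare_minimum(question: str, must_contain: str) -> None:
    # Bare-minimum requirement: the answer mentions the expected fact.
    assert must_contain in ask_llm(question)
```

In true TDD fashion, these tests fail until the app is wired up; making them
pass is the *bare-minimum* solution.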
