
Commit

Merge pull request langchain-ai#12 from langchain-ai/wfh/change_to_cot_qa

Better recommendations
hinthornw authored Aug 16, 2023
2 parents dc00382 + 09dde3b commit 0f92913
Showing 1 changed file with 2 additions and 3 deletions.
5 changes: 2 additions & 3 deletions docs/evaluation/evaluator-implementations.mdx
@@ -28,10 +28,10 @@ Many of the LLM-based evaluators return a binary score for a given data point, s

QA evaluators help to measure the correctness of a response to a user query or question. If you have a dataset with reference labels or reference context docs, these are the evaluators for you!

Three QA evaluators you can load are: `"qa"`, `"context_qa"`, and `"cot_qa"`:
Three QA evaluators you can load are: `"context_qa"`, `"qa"`, `"cot_qa"`. Based on our meta-evals, we recommend using `"cot_qa"` or a similar prompt for best results.

- The `"qa"` evaluator ([reference](https://api.python.langchain.com/en/latest/evaluation/langchain.evaluation.qa.eval_chain.QAEvalChain.html#langchain-evaluation-qa-eval-chain-qaevalchain)) instructs an LLMChain to directly grade a response as "correct" or "incorrect" based on the reference answer.
- The `"context_qa"` evaluator ([reference](https://api.python.langchain.com/en/latest/evaluation/langchain.evaluation.qa.eval_chain.ContextQAEvalChain.html#langchain.evaluation.qa.eval_chain.ContextQAEvalChain)) instructs the LLM chain to use reference "context" (provided throught the example outputs) in determining correctness. This is useful if you have a larger corpus of grounding docs but don't have ground truth answers to a query.
- The `"qa"` evaluator ([reference](https://api.python.langchain.com/en/latest/evaluation/langchain.evaluation.qa.eval_chain.QAEvalChain.html#langchain-evaluation-qa-eval-chain-qaevalchain)) instructs an LLMChain to directly grade a response as "correct" or "incorrect" based on the reference answer.
- The `"cot_qa"` evaluator ([reference](https://api.python.langchain.com/en/latest/evaluation/langchain.evaluation.qa.eval_chain.CotQAEvalChain.html#langchain.evaluation.qa.eval_chain.CotQAEvalChain)) is similar to the "context_qa" evaluator, expect it instructs the LLMChain to use chain of thought "reasoning" before determining a final verdict. This tends to lead to responses that better correlate with human labels, for a slightly higher token and runtime cost.

<CodeTabs
@@ -128,7 +128,6 @@ Your dataset may have ground truth labels or contextual information demonstratin
from langchain.smith import RunEvalConfig, run_on_dataset\n
evaluation_config = RunEvalConfig(
    evaluators=[
-       RunEvalConfig.LabeledCriteria("correctness"),
        # You can define an arbitrary criterion as a key: value pair in the criteria dict
        RunEvalConfig.LabeledCriteria(
            {
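
The diff view truncates the snippet above. Purely as an illustration of the shape a custom labeled criterion can take, it might continue along these lines; the "helpfulness" key and its wording are assumptions, not taken from the file:

```python
from langchain.smith import RunEvalConfig

evaluation_config = RunEvalConfig(
    evaluators=[
        # An arbitrary criterion is a key: value pair in the criteria dict;
        # "Labeled" criteria also see the dataset's reference answer when grading.
        RunEvalConfig.LabeledCriteria(
            {
                "helpfulness": (  # hypothetical criterion name and prompt
                    "Is this submission helpful to the user,"
                    " taking into account the correct reference answer?"
                )
            }
        ),
    ]
)
```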
