From b563b38fdd71952cb9181f9b7577ad605c5da301 Mon Sep 17 00:00:00 2001
From: "chen, suyue"
Date: Fri, 31 May 2024 21:52:09 +0800
Subject: [PATCH] update install method (#24)

---
 README.md | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 23838d92..b880ab10 100644
--- a/README.md
+++ b/README.md
@@ -2,11 +2,21 @@
 Evaluation, benchmark, and scorecard, targeting for performance on throughput and latency, accuracy on popular evaluation harness, safety, and hallucination
 
 ## Installation
-```shell
+
+- Install from Pypi
+
+```bash
+pip install opea-eval
+```
+
+- Build from Source
+
+```bash
 git clone https://github.com/opea-project/GenAIEval
 cd GenAIEval
 pip install -e .
 ```
+
 ## Evaluation
 ### lm-evaluation-harness
 For evaluating the models on text-generation tasks, we follow the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/) and provide the command line usage and function call usage. Over 60 standard academic benchmarks for LLMs, with hundreds of [subtasks and variants](https://github.com/EleutherAI/lm-evaluation-harness/tree/v0.4.2/lm_eval/tasks) implemented, such as `ARC`, `HellaSwag`, `MMLU`, `TruthfulQA`, `Winogrande`, `GSM8K` and so on.
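For context on the lm-evaluation-harness command-line usage mentioned in the README hunk above, an invocation typically looks like the following. This is a minimal sketch using the upstream `lm_eval` CLI directly; the model name, task, and batch size are illustrative assumptions, not values taken from this patch.

```bash
# Minimal sketch: evaluate a Hugging Face model on one of the benchmarks
# named above (HellaSwag) with the upstream lm-evaluation-harness CLI.
# Assumes lm-evaluation-harness v0.4.x is installed; the model, task, and
# batch size below are illustrative placeholders.
lm_eval --model hf \
  --model_args pretrained=facebook/opt-125m \
  --tasks hellaswag \
  --batch_size 8
```

Other tasks from the list above (e.g. `mmlu`, `gsm8k`, `truthfulqa`) can be passed to `--tasks` as a comma-separated list in the same way.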