feat: 1049 add standard evaluation benchmarks to lfai evals #1078
Conversation
MMLU results on my desktop (Synthia 7B):
HumanEval on vLLM on my desktop: by default, we run 3 samples per task for 50 tasks to balance runtime.
With these new changes under default settings, the whole evaluation suite should take about 30 minutes to run.
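As a minimal sketch of how those runtime defaults could be made configurable, something like the following reads overrides from environment variables before kicking off a HumanEval run. The variable names and the `humaneval_settings` helper are hypothetical illustrations, not the PR's actual implementation:

```python
import os

# Hypothetical defaults mirroring the values described above:
# 3 generated samples per task, 50 HumanEval tasks, to balance runtime.
DEFAULT_SAMPLES_PER_TASK = 3
DEFAULT_NUM_TASKS = 50


def humaneval_settings() -> dict:
    """Resolve HumanEval settings, letting env vars override the defaults.

    HUMANEVAL_SAMPLES_PER_TASK and HUMANEVAL_NUM_TASKS are hypothetical
    variable names used purely for illustration.
    """
    return {
        "samples_per_task": int(
            os.environ.get("HUMANEVAL_SAMPLES_PER_TASK", DEFAULT_SAMPLES_PER_TASK)
        ),
        "num_tasks": int(os.environ.get("HUMANEVAL_NUM_TASKS", DEFAULT_NUM_TASKS)),
    }


if __name__ == "__main__":
    settings = humaneval_settings()
    print(
        f"Running HumanEval with {settings['samples_per_task']} samples "
        f"per task across {settings['num_tasks']} tasks"
    )
```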
Most recent results, run with default settings:
Love the refactor - thank you!
Description
Adds baseline generation evaluation benchmarks to provide a known point of comparison. These benchmarks can show us when models we have implemented are not performing as expected (based on their reported eval benchmark scores) and help us better understand the impact of specific quantizations on certain models.
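For context, a rough sketch of what a baseline MMLU run can look like is below. The imports, class names, and parameters follow deepeval's public benchmark API rather than this PR's code, and the `LeapfrogAIModel` wrapper with its OpenAI-style client is a hypothetical stand-in for whatever model client the evals package actually uses:

```python
from deepeval.benchmarks import MMLU
from deepeval.benchmarks.tasks import MMLUTask
from deepeval.models.base_model import DeepEvalBaseLLM


class LeapfrogAIModel(DeepEvalBaseLLM):
    """Hypothetical wrapper around a locally hosted model (e.g. Synthia 7B)."""

    def __init__(self, client, model_name: str):
        self.client = client
        self.model_name = model_name

    def load_model(self):
        return self.client

    def generate(self, prompt: str) -> str:
        # Assumes an OpenAI-style chat completions client; adjust as needed.
        response = self.client.chat.completions.create(
            model=self.model_name,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    async def a_generate(self, prompt: str) -> str:
        return self.generate(prompt)

    def get_model_name(self) -> str:
        return self.model_name


# Run a small MMLU subset as a known point of comparison against the
# model's publicly reported scores.
benchmark = MMLU(tasks=[MMLUTask.ASTRONOMY], n_shots=3)
# With a real client in hand:
# benchmark.evaluate(model=LeapfrogAIModel(client, "synthia-7b"))
# print(benchmark.overall_score)
```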
BREAKING CHANGES
N/A
CHANGES
leapfrogai_evals
v1.3.0
Related Issue
Relates to #1049
Checklist before merging