# lightbench

A lightweight benchmarking framework for LLMs.

## Table of Contents
- Overview
- Key Features
- Installation
- Usage
- Code Structure
- Contributing
- Citation
- License
- Acknowledgments
## Overview

lightbench is designed to offer both interactive and automated benchmarking for large language models, enabling comprehensive evaluation of code-generation and question-answering capabilities.
## Key Features

- **Human Evaluation:** Interactive chat interface.
- **Automatic Evaluations:** Automated tests for code and text outputs.
- **Extensible Architecture:** Easy integration of new evaluators and metrics (see the sketch after this list).
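As a rough illustration of how a new evaluator could be shaped, here is a minimal, self-contained sketch. The class, method names, and scoring rule below are hypothetical and do not reflect lightbench's actual interfaces; see the modules under `evaluators` for the real ones.

```python
# Hypothetical sketch of a custom evaluator. The class name, method signature,
# and scoring rule are illustrative only and do not match lightbench's actual
# interfaces; see the `evaluators` modules for the real ones.
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EvalResult:
    prompt: str
    response: str
    score: float


class KeywordEvaluator:
    """Scores a model response by the fraction of expected keywords it contains."""

    def __init__(self, expected_keywords: list[str]):
        self.expected_keywords = [kw.lower() for kw in expected_keywords]

    def evaluate(self, prompt: str, response: str) -> EvalResult:
        text = response.lower()
        hits = sum(kw in text for kw in self.expected_keywords)
        score = hits / len(self.expected_keywords) if self.expected_keywords else 0.0
        return EvalResult(prompt=prompt, response=response, score=score)


if __name__ == "__main__":
    evaluator = KeywordEvaluator(["binary search", "o(log n)"])
    result = evaluator.evaluate(
        prompt="Explain the time complexity of binary search.",
        response="Binary search runs in O(log n) time on a sorted array.",
    )
    print(f"score={result.score:.2f}")  # -> score=1.00
```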
## Installation

- **Dependencies:**
  Ensure you have Python 3.8+ installed.
- **Setup Environment:**
  Run the installation script: `bash install_dependencies.sh`
- **Configure Environment:**
  Create a `.env` file with your `OPENAI_API_KEY`, `HUGGINGFACE_TOKEN`, and `MODEL_NAME`.
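For reference, a `.env` file might look like the following. The values are placeholders, and the `MODEL_NAME` shown is only an assumed example of a Hugging Face model identifier:

```env
OPENAI_API_KEY=sk-...
HUGGINGFACE_TOKEN=hf_...
MODEL_NAME=meta-llama/Llama-3.2-3B-Instruct
```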
## Usage

- **Interactive Chat:**
  Run `chat.py` to start the chat interface. This will use the model specified by `MODEL_NAME` in the `.env` file. Below is an example of a chat using Llama-3.2-3B-Instruct, running on a GTX 1080 Ti.
- **Automated Evaluations:**
  See examples in `examples.ipynb` (a rough sketch of what an automated check involves follows below).
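For orientation, the snippet below sketches a minimal automated text check. It deliberately calls the OpenAI client directly rather than lightbench's own APIs, and the model name, test cases, and pass criterion are assumptions made up for the example; `examples.ipynb` shows the framework's actual workflow.

```python
# Standalone illustration of an automated text-evaluation loop; this uses the
# OpenAI client directly and is NOT lightbench's API -- see examples.ipynb for
# the framework's own entry points.
from dotenv import load_dotenv  # pip install python-dotenv
from openai import OpenAI       # pip install openai

load_dotenv()      # reads OPENAI_API_KEY (and the other keys) from the .env file
client = OpenAI()  # picks up OPENAI_API_KEY from the environment

# (prompt, substring expected in a correct answer) -- toy test cases
test_cases = [
    ("What is the capital of France?", "paris"),
    ("What does HTTP stand for?", "hypertext transfer protocol"),
]

passed = 0
for prompt, expected in test_cases:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any chat-capable model works here
        messages=[{"role": "user", "content": prompt}],
    )
    answer = response.choices[0].message.content.lower()
    passed += expected in answer

print(f"{passed}/{len(test_cases)} checks passed")
```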
## Code Structure

- `api`: API definitions and endpoints.
- `evaluators`: Modules for both code and text evaluation.
- `loaders`: Tools to load and manage models.
- `metric`: Available metrics for local and API-based models.
## Citation

**Paper.** If you refer to the research paper related to this project, please cite:

```bibtex
@inproceedings{naudot2025performance,
  author    = {Filip Naudot},
  title     = {Performance and Computational Demands of LLMs: Impact of Model Size and Quantization},
  booktitle = {Proceedings of Umeå's 28th Student Conference in Computing Science (USCCS 2025)},
  editor    = {Thomas Hellström},
  year      = {2025},
  publisher = {Umeå University, Sweden},
  note      = {Branch \texttt{conf-paper} used for paper results},
}
```

**Repository.** If you use lightbench in your research, please cite the repository:

```bibtex
@misc{lightbench2025,
  author       = {Filip Naudot},
  title        = {lightbench},
  year         = {2025},
  howpublished = {\url{https://github.com/filipnaudot/lightbench}},
}
```
## License

Distributed under the MIT License. See `LICENSE` for more information.