This repository contains datasets for evaluating Italian adaptations of instruction-tuned models such as Alpaca and Baize (i.e., Camoscio, Stambecco, and Fauno).
We aim to make benchmarking across multiple tasks and models simple.
You can use the InstructEval suite to evaluate your own instruction-tuned large language model.