
ymcui edited this page Apr 26, 2024 · 2 revisions

MMLU Inference Script

This project evaluates the related models on the MMLU dataset, whose validation and test sets contain 1.5K and 14.1K multiple-choice questions, respectively, across 57 subjects. Below is an introduction to the prediction method for the MMLU dataset.

Data Preparation

Download the evaluation dataset from the official MMLU download URL and extract it into the data folder:

wget https://people.eecs.berkeley.edu/~hendrycks/data.tar
tar xf data.tar
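After extraction, the archive should yield a data directory with per-split subfolders of per-subject CSV files. As a quick sanity check, a minimal sketch (the exact folder names below assume the standard MMLU archive layout of dev/val/test splits):

```python
import os

# Expected split subdirectories of the extracted MMLU archive; each holds
# one CSV per subject (e.g. abstract_algebra_test.csv).
EXPECTED_SPLITS = ("dev", "val", "test")

def check_mmlu_layout(data_dir):
    """Return the list of expected split directories missing under data_dir."""
    return [s for s in EXPECTED_SPLITS
            if not os.path.isdir(os.path.join(data_dir, s))]
```

After a successful download and extraction, `check_mmlu_layout("data")` should return an empty list.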

Run the Prediction Script

Execute the following script:

model_path=path/to/llama-3-chinese
output_path=path/to/your_output_dir
data_path=path/to/mmlu-data

cd scripts/mmlu
python eval.py \
    --model_path ${model_path} \
    --data_dir ${data_path} \
    --save_dir ${output_path} \
    --ntrain 5 \
    --use_flash_attention_2

Parameter Explanation

  • model_path: Directory where the evaluation model is located (a full Llama-3-Chinese or Llama-3-Chinese-Instruct model, not a LoRA adapter)
  • data_dir: Directory containing the evaluation dataset
  • ntrain: Specifies the number of few-shot instances (5-shot: ntrain=5, 0-shot: ntrain=0)
  • save_dir: Directory to store the evaluation results
  • do_test: Selects the evaluation split: when do_test=False, evaluate on the validation set; when do_test=True, evaluate on the test set
  • load_in_4bit: Load the model in 4bit quantized form if there is insufficient VRAM
  • use_flash_attention_2: Use Flash Attention 2 for accelerated inference; otherwise, SDPA is used for acceleration
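To illustrate what ntrain controls, below is a minimal sketch of how an ntrain-shot MMLU prompt is typically assembled: ntrain solved examples (with answers) are prepended before the question to predict, whose answer is left blank. The exact prompt template used by eval.py may differ; the header and helper names here are illustrative.

```python
CHOICE_LETTERS = ["A", "B", "C", "D"]

def format_example(question, choices, answer=None):
    """Render one MMLU question; leave the answer blank for the item to predict."""
    lines = [question]
    lines += [f"{letter}. {text}" for letter, text in zip(CHOICE_LETTERS, choices)]
    lines.append(f"Answer: {answer}" if answer is not None else "Answer:")
    return "\n".join(lines)

def build_prompt(subject, dev_examples, question, choices, ntrain=5):
    """Assemble an ntrain-shot prompt from dev-set (question, choices, answer) triples."""
    header = (f"The following are multiple choice questions (with answers) "
              f"about {subject}.\n\n")
    shots = [format_example(q, c, a) for q, c, a in dev_examples[:ntrain]]
    return header + "\n\n".join(shots + [format_example(question, choices)])
```

With ntrain=0 the prompt contains only the header and the unanswered question, which corresponds to the 0-shot setting.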

Evaluation Output

After the model completes its predictions, the last line of the output log displays the final score, e.g. Average accuracy: 0.651. The per-subject decoding results are stored under save_dir/results.
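The reported average is computed over all questions rather than per subject. Assuming per-subject correct/total counts have been tallied from the decoding results (the exact file format under save_dir/results is not specified here), a minimal sketch of the question-weighted aggregation:

```python
def average_accuracy(per_subject):
    """Question-weighted average accuracy over {subject: (num_correct, num_questions)}."""
    correct = sum(c for c, _ in per_subject.values())
    total = sum(n for _, n in per_subject.values())
    return correct / total

# Hypothetical counts for two subjects: 3/4 and 1/4 correct.
counts = {"abstract_algebra": (3, 4), "anatomy": (1, 4)}
print(f"Average accuracy: {average_accuracy(counts):.3f}")  # Average accuracy: 0.500
```

Weighting by question count means larger subjects contribute proportionally more to the final score than a simple per-subject mean would.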