# mmlu_en
This project evaluates the models on the MMLU dataset, whose validation and test sets contain 1.5K and 14.1K multiple-choice questions respectively, covering 57 subjects. Below is an introduction to the prediction method on the MMLU dataset.
Download the evaluation dataset from the official MMLU download link and unzip it to the `data` folder:

```bash
wget https://people.eecs.berkeley.edu/~hendrycks/data.tar
tar xf data.tar
```
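Once extracted, the tar yields `dev/`, `val/`, and `test/` subdirectories of per-subject CSV files. The following is a minimal sketch (not part of this repo) to verify the layout and question counts, assuming that standard structure and a `data` folder in the current directory:

```python
import csv
import glob
import os

data_dir = "data"  # adjust if you extracted data.tar elsewhere

# Count per-subject CSV files and total questions in each split.
for split in ("dev", "val", "test"):
    files = sorted(glob.glob(os.path.join(data_dir, split, "*.csv")))
    n_questions = 0
    for path in files:
        with open(path, newline="", encoding="utf-8") as f:
            n_questions += sum(1 for _ in csv.reader(f))
    print(f"{split}: {len(files)} subjects, {n_questions} questions")
```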
Execute the following script:

```bash
model_path=path/to/llama-3-chinese
output_path=path/to/your_output_dir
data_path=path/to/mmlu-data

cd scripts/mmlu
python eval.py \
    --model_path ${model_path} \
    --data_dir ${data_path} \
    --save_dir ${output_path} \
    --ntrain 5 \
    --use_flash_attention_2
```
- `model_path`: Directory where the evaluation model is located (complete Llama-3-Chinese or Llama-3-Chinese-Instruct model, not LoRA)
- `data_dir`: Directory containing the evaluation dataset
- `ntrain`: Number of few-shot instances (5-shot: `ntrain=5`, 0-shot: `ntrain=0`); see the prompt sketch after this list
- `save_dir`: Directory to store the evaluation results
- `do_test`: Whether to test on the validation or test set: when `do_test=False`, test on the validation set; when `do_test=True`, test on the test set
- `load_in_4bit`: Load the model in 4-bit quantized form if there is insufficient VRAM
- `use_flash_attention_2`: Use Flash Attention 2 for accelerated inference; otherwise SDPA is used
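For reference, `ntrain` controls how many solved examples from the `dev` split are prepended to each question. Below is a minimal sketch of the standard Hendrycks-style MMLU prompt format; the exact formatting in `eval.py` may differ, and `format_example`/`build_prompt` are hypothetical helpers:

```python
def format_example(row, include_answer=True):
    # A CSV row is [question, choice_A, choice_B, choice_C, choice_D, answer].
    question, *choices, answer = row
    prompt = question
    for letter, choice in zip("ABCD", choices):
        prompt += f"\n{letter}. {choice}"
    prompt += "\nAnswer:"
    if include_answer:
        prompt += f" {answer}\n\n"
    return prompt

def build_prompt(dev_rows, test_row, subject, ntrain=5):
    # ntrain=0 yields a zero-shot prompt; ntrain=5 prepends five dev examples.
    prompt = (f"The following are multiple choice questions "
              f"(with answers) about {subject}.\n\n")
    for row in dev_rows[:ntrain]:
        prompt += format_example(row)
    return prompt + format_example(test_row, include_answer=False)
```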
After the model completes its predictions, the last line of the output log displays the final score, e.g., `Average accuracy: 0.651`. The decoding results for each subject are stored under `save_dir/results`.
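Evaluations of this kind typically score each question by comparing the model's next-token logits for the four answer letters and taking the argmax. The sketch below illustrates that idea under the assumption of a Hugging Face `transformers` model and tokenizer; the function name `predict_choice` is hypothetical and `eval.py` may implement scoring differently:

```python
import torch

def predict_choice(model, tokenizer, prompt):
    # Get the next-token logits at the prompt's final position.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]
    # Compare the logits of the four answer letters and pick the largest.
    choice_ids = [tokenizer.encode(letter, add_special_tokens=False)[-1]
                  for letter in "ABCD"]
    scores = torch.stack([logits[i] for i in choice_ids])
    return "ABCD"[scores.argmax().item()]
```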