InfoBench

Citation

@article{qin2024infobench,
      title={InFoBench: Evaluating Instruction Following Ability in Large Language Models}, 
      author={Yiwei Qin and Kaiqiang Song and Yebowen Hu and Wenlin Yao and Sangwoo Cho and Xiaoyang Wang and Xuansheng Wu and Fei Liu and Pengfei Liu and Dong Yu},
      year={2024},
      eprint={2401.03601},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Evaluation with InFoBench

Step 1: Dataset Usage

You can download the dataset directly with Hugging Face datasets.

from datasets import load_dataset

dataset = load_dataset("kqsong/InFoBench")
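
You can inspect the splits, columns, and an example entry as with any Hugging Face dataset (the "train" split name below is an assumption; check the printed splits for your setup):

# Show the available splits and column names.
print(dataset)

# Peek at one example entry (assumes a "train" split exists).
example = dataset["train"][0]
print(example.keys())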

Step 2: Generating the Response

Provide an output file at model/output.json. Each data entry should be a JSON object on its own line (JSON Lines format), containing all the fields from the input format plus a new field named output that holds the generated response.

We suggest using greedy decoding to avoid randomness during generation.
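
For reference, a minimal generation loop might look like the sketch below. It is not part of this repository; the model name, the prompt construction from the instruction and input fields, and the "train" split are assumptions to adapt to your own setup. It applies greedy decoding and writes one JSON object per line to model/output.json.

import json
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model under evaluation; substitute your own.
model_name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

dataset = load_dataset("kqsong/InFoBench")["train"]  # assumes a "train" split

with open("model/output.json", "w") as f:
    for entry in dataset:
        # Prompt construction is an assumption: instruction followed by the input text.
        prompt = entry["instruction"] + "\n" + entry["input"]
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        # Greedy decoding (do_sample=False) avoids sampling randomness.
        output_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
        response = tokenizer.decode(
            output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
        )
        record = dict(entry)           # keep all original fields
        record["output"] = response    # add the generated response
        f.write(json.dumps(record) + "\n")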

Step 3: Evaluation

Evaluate the LLM's outputs against the decomposed questions. This research uses GPT-4-0314 as the default evaluator.

python evaluation.py \
  --api_key <OPENAI KEY> \
  --eval_model gpt-4-0314 \
  --input model/output.json \
  --output_dir evaluation/ \
  --temperature 0

Each data entry will include an "eval" key of type List[bool], representing the "Yes" or "No" answers to the decomposed questions. The final evaluation file is saved in JSON format under <output_dir>/<eval_model>/.
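
To aggregate the results, you can compute the fraction of decomposed questions answered "Yes" across all entries. A minimal sketch, assuming the evaluation file is a JSON array (with a JSON Lines fallback); the exact file name under evaluation/gpt-4-0314/ is an assumption, so point the path at the file evaluation.py actually writes:

import json

# Hypothetical path; use the file produced by evaluation.py.
path = "evaluation/gpt-4-0314/output.json"

with open(path) as f:
    text = f.read()
try:
    entries = json.loads(text)  # single JSON array
except json.JSONDecodeError:
    # Fallback in case the file is JSON Lines (one object per line).
    entries = [json.loads(line) for line in text.splitlines() if line.strip()]

total = sum(len(e["eval"]) for e in entries)
passed = sum(sum(e["eval"]) for e in entries)  # True counts as 1
print(f"Decomposed questions satisfied: {passed}/{total} = {passed / total:.3f}")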
