Strong Instruction Following (IF) capabilities are the foundation for building complex LLM-based applications such as tool usage and multi-agent systems. This repository aims to provide a comprehensive list of papers, repositories, and other resources related to improving, evaluating, benchmarking, and theoretically analyzing instruction-following capabilities, in order to advance research in this field.
The repository is still under active construction, and we welcome everyone to contribute and collaborate!
- Do LLMs “Know” Internally When They Follow Instructions?
- Cambridge, Apple
- In submission to ICLR 2025
- Self-Play with Execution Feedback: Improving Instruction-Following Capabilities of Large Language Models
- Alibaba
- AutoIF
- LESS: Selecting Influential Data for Targeted Instruction Tuning
- Princeton University, University of Washington
- ICML 2024
- LESS
- WizardLM: Empowering Large Language Models to Follow Complex Instructions
- Microsoft, Peking University
- ICLR 2024
- WizardLM
- Chain-of-Instructions: Compositional Instruction Tuning on Large Language Models
- University of Minnesota, Amazon AGI, Grammarly
- Instruction Pre-Training: Language Models are Supervised Multitask Learners
- Microsoft Research, Tsinghua University
- LMOps
- Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
- Stanford University, Independent Researcher
- alpaca_eval
- InFoBench: Evaluating Instruction Following Ability in Large Language Models
- Tencent AI Lab, Seattle; University of Central Florida; Emory University; University of Georgia; Shanghai Jiao Tong University
- InfoBench
- Struc-Bench: Are Large Language Models Good at Generating Complex Structured Tabular Data?
- Yale University, Zhejiang University, New York University
- NAACL 2024
- Struc-Bench
- FoFo: A Benchmark to Evaluate LLMs’ Format-Following Capability
- Salesforce Research, University of Illinois at Chicago, Pennsylvania State University
- FoFo
- AlignBench: Benchmarking Chinese Alignment of Large Language Models
- Tsinghua University, Zhipu AI, Renmin University of China, Sichuan University, Lehigh University
- AlignBench
- Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
- UC Berkeley, UC San Diego, Carnegie Mellon University, Stanford, MBZUAI
- NeurIPS 2023
- llm_judge
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition
- Tsinghua, Zhipu, China University of Geosciences, Central China Normal University
- ComplexBench
- Evaluating Large Language Models at Evaluating Instruction Following
- Tsinghua, Princeton, UIUC
- ICLR 2024
- LLMBar
- Instruction-Following Evaluation for Large Language Models
- Google, Yale
- instruction_following_eval
- FollowEval: A Multi-Dimensional Benchmark for Assessing the Instruction-Following Capability of Large Language Models
- Lenovo, TJU
- Can Large Language Models Understand Real-World Complex Instructions?
- Fudan, ECNU
- AAAI 2024
- CELLO
- FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models
- HKUST, Huawei
- ACL 2024
- FollowBench
- Evaluating Large Language Models on Controlled Generation Tasks
- USC, UC, ETH, Amazon, DeepMind
- llm-controlgen