Strong Instruction Following (IF) capabilities are the foundation for building complex LLM-based applications such as tool usage and multi-agent systems. This repository aims to provide a comprehensive list of papers, repositories, and other resources related to improving, evaluating, benchmarking, and theoretically analyzing instruction-following capabilities, in order to advance research in this field.
The repository is still under active construction, and we welcome everyone to contribute and collaborate!
- Do LLMs “Know” Internally When They Follow Instructions?
- Cambridge, Apple
- In submission to ICLR 2025
- Self-Play with Execution Feedback: Improving Instruction-Following Capabilities of Large Language Models
- Alibaba
- AutoIF
- LESS: Selecting Influential Data for Targeted Instruction Tuning
- Princeton University, University of Washington
- ICML 2024
- LESS
- WizardLM: Empowering Large Language Models to Follow Complex Instructions
- Microsoft, Peking University
- ICLR 2024
- WizardLM
- Chain-of-Instructions: Compositional Instruction Tuning on Large Language Models
- University of Minnesota, Amazon AGI, Grammarly
- Instruction Pre-Training: Language Models are Supervised Multitask Learners
- Microsoft Research, Tsinghua University
- LMOps
- Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
- Stanford University, Independent Researcher
- alpaca_eval
- InFoBench: Evaluating Instruction Following Ability in Large Language Models
- Tencent AI Lab, Seattle; University of Central Florida; Emory University; University of Georgia; Shanghai Jiao Tong University
- InfoBench
- Struc-Bench: Are Large Language Models Good at Generating Complex Structured Tabular Data?
- Yale University, Zhejiang University, New York University
- NAACL 2024
- Struc-Bench
- FoFo: A Benchmark to Evaluate LLMs’ Format-Following Capability
- Salesforce Research, University of Illinois at Chicago, Pennsylvania State University
- FoFo
- AlignBench: Benchmarking Chinese Alignment of Large Language Models
- Tsinghua University, Zhipu AI, Renmin University of China, Sichuan University, Lehigh University
- AlignBench
- Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
- UC Berkeley, UC San Diego, Carnegie Mellon University, Stanford, MBZUAI
- NeurIPS 2023
- llm_judge
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition
- Tsinghua, Zhipu, China University of Geosciences, Central China Normal University
- ComplexBench
- Evaluating Large Language Models at Evaluating Instruction Following
- Tsinghua, Princeton, UIUC
- ICLR 2024
- LLMBar
- Instruction-Following Evaluation for Large Language Models
- Google, Yale
- instruction_following_eval
- FollowEval: A Multi-Dimensional Benchmark for Assessing the Instruction-Following Capability of Large Language Models
- Lenovo, TJU
- Can Large Language Models Understand Real-World Complex Instructions?
- Fudan, ECNU
- AAAI 2024
- CELLO
- FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models
- HKUST, Huawei
- ACL 2024
- FollowBench
- Evaluating Large Language Models on Controlled Generation Tasks
- USC, UC, ETH, Amazon, DeepMind
- llm-controlgen