A curated list of awesome papers on LLM tool learning, covering methods, frameworks, and benchmarks.
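For orientation, many of the papers below (e.g., ReAct, Toolformer, ToolLLM) build on a common reason-act-observe loop: the model either emits a tool call or a final answer, the caller executes the tool, and the observation is appended to the context. The sketch below is a minimal, hypothetical illustration of that pattern, not the implementation of any specific paper; the `call_llm` stub, the `Action:`/`Final:` protocol, and the toy tool registry are assumptions made for the example.

```python
# Minimal, hypothetical reason-act-observe loop in the spirit of ReAct-style
# tool learning. `call_llm` is a stand-in for any chat/completions backend;
# the tool registry and the "Action:"/"Final:" protocol are illustrative
# assumptions, not the interface of any specific paper.
import re
from typing import Callable, Dict

def calculator(expression: str) -> str:
    """Toy tool: evaluate an arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS: Dict[str, Callable[[str], str]] = {"calculator": calculator}

def call_llm(prompt: str) -> str:
    """Stub LLM. A real agent would call a model here; the stub only
    demonstrates the two message kinds the loop expects."""
    if "Observation:" in prompt:
        return "Final: the result is shown in the observation above."
    return 'Action: calculator["2 * (3 + 4)"]'

def run_agent(question: str, max_steps: int = 5) -> str:
    context = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = call_llm(context)
        if reply.startswith("Final:"):          # model decided to answer
            return reply[len("Final:"):].strip()
        match = re.match(r'Action: (\w+)\["(.+)"\]', reply)
        if not match:                           # unparsable step: give up
            return reply
        name, arg = match.groups()
        observation = TOOLS[name](arg)          # execute the named tool
        context += f"{reply}\nObservation: {observation}\n"
    return "No answer within step budget."

if __name__ == "__main__":
    print(run_agent("What is 2 * (3 + 4)?"))
```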
ReAct: Synergizing Reasoning and Acting in Language Models [ICLR 2023][Code]
RRHF: Rank Responses to Align Language Models with Human Feedback without tears [NeurIPS 2023][Code]
Extending Context Window of Large Language Models via Positional Interpolation [Arxiv 2023][Code]
Tool Learning with Foundation Models [Arxiv][Code]
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face [Arxiv][Code]
ART: Automatic multi-step reasoning and tool-use for large language models [Arxiv][Code]
Gorilla: Large Language Model Connected with Massive APIs [Arxiv][Code]
TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs [Arxiv][Code]
Large Language Models as Tool Makers [Arxiv][Code]
MultiTool-CoT: GPT-3 Can Use Multiple External Tools with Chain of Thought Prompting [ACL 2023][Code]
Gentopia.AI: A Collaborative Platform for Tool-Augmented LLMs [EMNLP 2023][Code]
CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models [EMNLP Findings 2023][Code]
On the Tool Manipulation Capability of Open-source Large Language Models [Arxiv][Code]
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models [Arxiv]
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models [NeurIPS 2023][Code]
Toolformer: Language Models Can Teach Themselves to Use Tools [NeurIPS 2023][Code]
GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction [NeurIPS 2023][Code]
Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning [NeurIPS 2023][Code]
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings [NeurIPS 2023][Code]
TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage [NeurIPS 2023 Workshop]
Making Language Models Better Tool Learners with Execution Feedback [Arxiv][Code]
ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases [Arxiv][Code]
Confucius: Iterative Tool Learning from Introspection Feedback by Easy-to-Difficult Curriculum [AAAI 2024][Code]
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs [ICLR 2024][Code]
CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets [ICLR 2024][Code]
ToolDec: Syntax Error-Free and Generalizable Tool Use for LLMs via Finite-State Decoding [Arxiv][Code]
Identifying the Risks of LM Agents with an LM-Emulated Sandbox [Arxiv][Code]
ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search [Arxiv]
Tool-Augmented Reward Modeling [Arxiv]
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving [Arxiv][Code]
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing [Arxiv][Code]
RestGPT: Connecting Large Language Models with Real-World RESTful APIs [Arxiv]
Fortify the Shortest Stave in Attention: Enhancing Context Awareness of Large Language Models for Effective Tool Use [Arxiv]
ControlLLM: Augment Language Models with Tools by Searching on Graphs [Arxiv][Code]
GEAR: Augmenting Language Models with Generalizable and Efficient Tool Resolution [Arxiv][Code]
GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension [Arxiv]
AppAgent: Multimodal Agents as Smartphone Users [Arxiv][Code]
VIoTGPT: Learning to Schedule Vision Tools towards Intelligent Video Internet of Things [Arxiv]
Reverse Chain: A Generic-Rule for LLMs to Master Multi-API Planning [Arxiv][Code]
CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update [Arxiv]
FARS: FSM-Augmentation to Make LLMs Hallucinate the Right APIs [Arxiv]
Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API [Arxiv]
Navigating Uncertainty: Optimizing API Dependency for Hallucination Reduction in Closed-Book Question Answering [Arxiv]
EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction [Arxiv][Code]
Small LLMs Are Weak Tool Learners: A Multi-LLM Agent [Arxiv]
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning [Arxiv][Code]
Efficient Tool Use with Chain-of-Abstraction Reasoning [Arxiv]
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls [Arxiv][Code]
ToolRerank: Adaptive and Hierarchy-Aware Reranking for Tool Retrieval [Arxiv][Code]
ToolNet: Connecting Large Language Models with Massive Tools via Tool Graph [Arxiv]
API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs [Arxiv]
TOOLVERIFIER: Generalization to New Tools via Self-Verification [Arxiv][Code]
Look Before You Leap: Towards Decision-Aware and Generalizable Tool-Usage for Large Language Models [Arxiv]
LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error [Arxiv][Code]
Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments [Arxiv][Code]
Equipping Language Models with Tool Use Capability for Tabular Data Analysis in Finance [Arxiv][Code]
Benchmarks:
(APIBench) Gorilla: Large Language Model Connected with Massive APIs [Arxiv][Code]
API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs [EMNLP 2023][Code]
(ToolBench) ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs [Arxiv][Code]
ToolQA: A Dataset for LLM Question Answering with External Tools [NeurIPS 2023][Code]
MetaTool Benchmark: Deciding Whether to Use Tools and Which to Use [Arxiv][Code]
T-Eval: Evaluating the Tool Utilization Capability Step by Step [Arxiv][Code]
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback [Arxiv][Code]
ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios [Arxiv][Code]
A Comprehensive Evaluation of Tool-Assisted Generation Strategies [EMNLP Findings 2023]
ToolTalk: Evaluating Tool-Usage in a Conversational Setting [Arxiv][Code]
InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks [Arxiv][Code]
RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning [Arxiv][Code]
Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios [Arxiv][Code]
ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages [Arxiv][Code]
StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models [Arxiv][Code]
m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks [Arxiv][Code]