# Tool usage research #607
For each one I'm going to try a tool for searching my own blog (I may add a QuickJS code execution tool in the future):

```python
import httpx


def blog_search(query):
    url = "https://datasette.simonwillison.net/simonwillisonblog.json"
    args = {
        "sql": """
        select
          blog_blogmark.id,
          blog_blogmark.link_url,
          blog_blogmark.link_title,
          blog_blogmark.commentary,
          blog_blogmark.created,
          blog_blogmark_fts.rank
        from
          blog_blogmark join blog_blogmark_fts
            on blog_blogmark.rowid = blog_blogmark_fts.rowid
        where
          blog_blogmark_fts match escape_fts(:search)
        order by
          rank
        limit
          5
        """,
        "_shape": "array",
        "search": query,
    }
    return httpx.get(url, params=args).json()
```
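Calling it looks something like this (a sketch: the column names come from the SQL query above, and actual results depend on the live Datasette instance):

```python
results = blog_search("pelicans")
for row in results:
    # Each row is a dict with id, link_url, link_title, commentary,
    # created and rank keys, per the select clause above.
    print(row["link_title"], row["link_url"])
```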
---

First, OpenAI: https://platform.openai.com/docs/guides/function-calling

```python
import json

import llm
import openai

client = openai.OpenAI(api_key=llm.get_key("", "openai"))

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_blog",
            "description": "Search for posts on the blog.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query",
                    }
                },
                "required": ["query"],
                "additionalProperties": False,
            },
        },
    }
]

messages = []
messages.append(
    {"role": "user", "content": "Hi, what do you know about anthropic?"}
)
response = client.chat.completions.create(
    model="gpt-4o", messages=messages, tools=tools
)
```
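Run the search. A minimal sketch of that middle step, assuming the response contains a single tool call and using the `blog_search` function defined earlier:

```python
tool_call = response.choices[0].message.tool_calls[0]
results = blog_search(json.loads(tool_call.function.arguments)["query"])
```

Then:

```python
function_call_result_message = {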
"role": "tool",
"content": json.dumps(results),
"tool_call_id": response.choices[0].message.tool_calls[0].id,
}
messages.append(response.choices[0].message.dict())
messages.append(function_call_result_message)
response2 = client.chat.completions.create(
model="gpt-4o", messages=messages, tools=tools
)
print(response2.choices[0].message.content) Lining up the |
---

Anthropic: https://docs.anthropic.com/en/docs/build-with-claude/tool-use and https://github.com/anthropics/courses/blob/master/tool_use/04_complete_workflow.ipynb

Anthropic tools look similar to OpenAI ones, but use `input_schema` instead of `parameters`:

```python
anthropic_tool = {
    "name": "search_blog",
    "description": "Search for posts on the blog.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Search query",
            }
        },
        "required": ["query"],
        "additionalProperties": False,
    },
}
```
```python
import anthropic

anthropic_client = anthropic.Anthropic(api_key=llm.get_key("", "claude"))

messages = [{"role": "user", "content": "Tell me about pelicans"}]
response = anthropic_client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[anthropic_tool],
    messages=messages,
)
```

Which returned:

```python
Message(
id="msg_01FuLBrwiQih4aY4WSxAXGnj",
content=[
TextBlock(text="I'll search the blog for posts about pelicans.", type="text"),
ToolUseBlock(
id="toolu_01YSaFvtW3mjbrg8hSGH7FkZ",
input={"query": "pelicans"},
name="search_blog",
type="tool_use",
),
],
model="claude-3-5-sonnet-20241022",
role="assistant",
stop_reason="tool_use",
stop_sequence=None,
type="message",
usage=Usage(input_tokens=394, output_tokens=69),
) And now: messages.append({
"role": "assistant",
"content": [r.dict() for r in response.content]
})
results = blog_search(response.content[-1].input["query"])
tool_response = {
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": response.content[-1].id,
"content": json.dumps(results)
}
]
}
messages.append(tool_response)
response2 = anthropic_client.messages.create(
model="claude-3-sonnet-20240229",
messages=messages,
max_tokens=1024,
tools=[anthropic_tool]
)
print(response2.content[0].text)
---

Gemini calls it function calling: https://ai.google.dev/gemini-api/docs/function-calling and https://ai.google.dev/gemini-api/docs/function-calling/tutorial. Here's a useful curl example:

```bash
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent?key=$GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "Which theaters in Mountain View show Barbie movie?"
        }
      ]
    },
    {
      "role": "model",
      "parts": [
        {
          "functionCall": {
            "name": "find_theaters",
            "args": {
              "location": "Mountain View, CA",
              "movie": "Barbie"
            }
          }
        }
      ]
    },
    {
      "role": "user",
      "parts": [
        {
          "functionResponse": {
            "name": "find_theaters",
            "response": {
              "name": "find_theaters",
              "content": {
                "movie": "Barbie",
                "theaters": [
                  {
                    "name": "AMC Mountain View 16",
                    "address": "2000 W El Camino Real, Mountain View, CA 94040"
                  },
                  {
                    "name": "Regal Edwards 14",
                    "address": "245 Castro St, Mountain View, CA 94040"
                  }
                ]
              }
            }
          }
        }
      ]
    }
  ],
  "tools": [
    {
      "functionDeclarations": [
        {
          "name": "find_movies",
          "description": "find movie titles currently playing in theaters based on any description, genre, title words, etc.",
          "parameters": {
            "type": "OBJECT",
            "properties": {
              "location": {
                "type": "STRING",
                "description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616"
              },
              "description": {
                "type": "STRING",
                "description": "Any kind of description including category or genre, title words, attributes, etc."
              }
            },
            "required": [
              "description"
            ]
          }
        },
        {
          "name": "find_theaters",
          "description": "find theaters based on location and optionally movie title which is currently playing in theaters",
          "parameters": {
            "type": "OBJECT",
            "properties": {
              "location": {
                "type": "STRING",
                "description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616"
              },
              "movie": {
                "type": "STRING",
                "description": "Any movie title"
              }
            },
            "required": [
              "location"
            ]
          }
        },
        {
          "name": "get_showtimes",
          "description": "Find the start times for movies playing in a specific theater",
          "parameters": {
            "type": "OBJECT",
            "properties": {
              "location": {
                "type": "STRING",
                "description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616"
              },
              "movie": {
                "type": "STRING",
                "description": "Any movie title"
              },
              "theater": {
                "type": "STRING",
                "description": "Name of the theater"
              },
              "date": {
                "type": "STRING",
                "description": "Date for requested showtime"
              }
            },
            "required": [
              "location",
              "movie",
              "theater",
              "date"
            ]
          }
        }
      ]
    }
  ]
}
'
```

I got back:

```json
{
"candidates": [
{
"content": {
"parts": [
{
"text": "OK. I found two theaters in Mountain View showing Barbie: AMC Mountain View 16 and Regal Edwards 14."
}
],
"role": "model"
},
"finishReason": "STOP",
"index": 0,
"safetyRatings": [
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"probability": "NEGLIGIBLE"
},
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"probability": "NEGLIGIBLE"
},
{
"category": "HARM_CATEGORY_HARASSMENT",
"probability": "NEGLIGIBLE"
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"probability": "NEGLIGIBLE"
}
]
}
],
"usageMetadata": {
"promptTokenCount": 448,
"candidatesTokenCount": 25,
"totalTokenCount": 473
},
"modelVersion": "gemini-pro"
} |
---

Decided to see if I could figure it out for my llm-gguf plugin, running:

```bash
llm -m Hermes-3-Llama-3.1-8B 'tell me what the blog says about pelicans' --no-stream -s 'you derive keywords from questions and search for them'
```

And these code changes:

```diff
     if self._model is None:
-        self._model = Llama(
-            model_path=self.model_path, verbose=False, n_ctx=0  # "0 = from model"
-        )
+        self._model = Llama(
+            model_path=self.model_path, verbose=False, n_ctx=self.n_ctx,
+            chat_format="chatml-function-calling"
+        )
@@ -171,7 +221,27 @@ class GgufChatModel(llm.Model):
     if not stream:
         model = self.get_model()
-        completion = model.create_chat_completion(messages=messages)
+        completion = model.create_chat_completion(messages=messages, tools=[
+            {
+                "type": "function",
+                "function": {
+                    "name": "search_blog",
+                    "description": "Search for posts on the blog.",
+                    "parameters": {
+                        "type": "object",
+                        "properties": {
+                            "query": {
+                                "type": "string",
+                                "description": "Search query, keywords only",
+                            }
+                        },
+                        "required": ["query"],
+                        "additionalProperties": False,
+                    },
+                }
+            }
+        ], tool_choice="auto")
+        breakpoint()
```

Which gave me this for `completion`:

```json
{
"id": "chatcmpl-60256c0d-744d-449a-a433-3faa6224f770",
"object": "chat.completion",
"created": 1730782682,
"model": "/Users/simon/Library/Application Support/io.datasette.llm/gguf/models/Hermes-3-Llama-3.1-8B.Q4_K_M.gguf",
"choices": [
{
"finish_reason": "tool_calls",
"index": 0,
"logprobs": null,
"message": {
"role": "assistant",
"content": null,
"tool_calls": [
{
"id": "call__0_search_blog_cmpl-c4456f81-e402-4f82-ab4d-6063c0b894ff",
"type": "function",
"function": {
"name": "search_blog",
"arguments": "{\"query\": \"pelicans blog\"}"
}
}
],
"function_call": {
"name": "search_blog:",
"arguments": "{\"query\": \"pelicans blog\"}"
}
}
}
],
"usage": {
"completion_tokens": 8,
"prompt_tokens": 202,
"total_tokens": 210
}
} So it looks like this is feasible, using that |
---

OK, I now have function calling examples for OpenAI, Anthropic, Gemini and llama.cpp. That's probably enough. The most complex examples are the ones that need to persist and then re-send a tool ID (OpenAI and Anthropic).
---

Is this going to be only about the implementation of function calling from major providers, or more like a discussion of further methods as well? I have been using function calling in pure Python for a while and have seen a significant performance improvement, especially in sub-20B models. I first started prototyping with this prompt and then settled on the implementation in agento, with this system prompt and this engine implementation, where tools are defined as plain Python functions with docstrings, for example:

```python
from typing import List


def get_apples(quantity: int) -> List[str]:
    """
    Get a certain quantity of apples.

    Args:
        quantity (int): The quantity of apples to get.

    Returns:
        List[str]: A list of apples.
    """
    return ["Apple" for _ in range(quantity)]


def eat_apples(apples: List[str], quantity: int) -> List[str]:
    """
    Eat a certain quantity of apples.

    Args:
        apples (List[str]): A list of apples.
        quantity (int): The quantity of apples to eat.

    Returns:
        List[str]: The remaining apples.
    """
    return apples[quantity:] if quantity < len(apples) else []


def sell_apples(apples: List[str]) -> str:
    """
    Sell all the apples provided.

    Args:
        apples (List[str]): A list of apples.

    Returns:
        str: The money earned from selling the apples.
    """
    return f"${len(apples) * 1}"
---

My goal is to expand the model plugin mechanism so that new models can be registered that support tool usage. Ideally this would enable people to write their own plugins that implement tool usage via prompting if they want to.
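For models without a native tool-calling API, "tool usage via prompting" generally means asking the model (via the system prompt) to emit tool calls in some structured format in its regular output, then parsing them back out. A minimal sketch of the parsing half, under an assumed convention of a fenced `tool` block containing JSON (this is not llm's actual plugin API):

```python
import json
import re

# Matches a fenced block like:
# ```tool
# {"name": "search_blog", "arguments": {"query": "pelicans"}}
# ```
TOOL_CALL_RE = re.compile(r"```tool\n(.*?)\n```", re.DOTALL)


def extract_tool_call(model_output: str):
    match = TOOL_CALL_RE.search(model_output)
    if match is None:
        return None
    return json.loads(match.group(1))
```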
---

Following
---

this is promising. subscribed.
---

In gptme I'm using a tool calling format based on markdown codeblocks in the normal text output. It predates tool calling APIs, so it works by detecting tool calls as output is streamed and interrupting the stream when a valid tool call is finished. For example, to save a file:
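Going by the gptme docs, that looks roughly like this (the filename and contents here are illustrative):

```save hello.py
print("Hello world")
```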
To run ipython, where functions can be registered:
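Roughly, again per the gptme docs (the function name is illustrative):

```ipython
hello_world()
```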
You can find the full system prompt and detailed examples here: https://gptme.org/docs/prompts.html

I've also worked on an XML format of this (ErikBjare/gptme#121), as well as support for the actual tool calling APIs now available via OpenAI, Anthropic, OpenRouter, and Ollama (ErikBjare/gptme#219).
---

I like that a lot. Feels like the kind of format that any capable LLM could be convinced to output, and very easy to parse. I'll think about how that might be supported.
---

Note that Ollama supports the same tool calling API as OpenAI: https://ollama.com/blog/tool-support
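In practice that means the OpenAI example above can target a local model by pointing the client at Ollama's OpenAI-compatible endpoint. A sketch, assuming a standard local Ollama setup and a tool-capable model:

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",  # the client library requires a key, but Ollama ignores it
)
messages = [{"role": "user", "content": "Hi, what do you know about anthropic?"}]
response = client.chat.completions.create(
    model="llama3.1",  # any tool-capable model pulled into Ollama
    messages=messages,
    tools=tools,  # the same tools definition as the OpenAI example above
)
print(response.choices[0].message.tool_calls)
```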
---

Using the markdown-style format like gptme uses is a great idea, but I think it's also worth supporting the native function calling formats for each model, since models are trained on those formats specifically, so we can expect them to work. My suggestion would be to have a class that abstracts over the model-native formats; each model can implement its own conversion. Here's some scratch code of what I would suggest:

```python
import json
from abc import ABC, abstractmethod

import llm


class ToolInterface(ABC):
    @abstractmethod
    def call(self): ...

    @abstractmethod
    def doc(self): ...


class MyTool(ToolInterface):
    def call(self):
        ...  # implement the tool logic

    def doc(self):
        ...  # return a dict that describes the tool


class ModelWithJsonTools:
    def __init__(self, tools, **kwargs):
        self.tools = tools

    def get_tools(self):
        # Returns the doc for all the tools in the JSON format specific to
        # this model; format_tool_spec could be a shared function, or
        # specific to the model/plugin.
        return json.dumps([format_tool_spec(tool) for tool in self.tools])


a = llm.get_model(ModelWithJsonTools, tools=[MyTool])


class ModelWithMarkdownTools:
    # Same as ModelWithJsonTools except:
    def get_tools(self):
        # llm should provide the markdown spec for a consistent format
        return llm.tools.format_markdown_spec(self.tools)


b = llm.get_model(ModelWithMarkdownTools, tools=[MyTool])
```

The markdown spec that gptme uses could be borrowed; it looks good. Maybe this functionality should be implemented in a plugin.
---

I'm starting this research thread to drop in examples of tool usage across different LLMs, to help inform an `llm` feature for that.