Replies: 5 comments 6 replies
-
Thanks, it's interesting that you posted this at this time. I've been thinking of moving away from my current implementation for function calling, and making an API-breaking change. First of all, I wrote it when it was still called "function calling", but now everyone calls it "tool use", so my naming is confusing. I also think the way I did it requires too much struct-building. I've recently added JSON mode, which also uses JSON Schema, but with some differences in how it's used compared to tool calling, and it ends up looking a bit more like what you have proposed. An example, taken from my integration tests, is:

```elisp
(llm-chat
 provider
 (llm-make-chat-prompt
  "List the 3 largest cities in France in order of population, giving the results in JSON."
  :response-format
  '(:type object
    :properties
    (:cities (:type array :items (:type string)))
    :required (cities))))
```

This isn't released yet, but I think it's a lighter-weight and easier-to-read way to specify a schema. I could change it to some other format before release, so now's a good time to change it to something that would work well both for it and for tool use. One question is how much of JSON Schema you want to support - for example, can arguments be more than just strings, integers, etc., and be objects that have their own structure?
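Serialized out, that plist would correspond to a JSON Schema along these lines (assuming keywords map to string keys and symbol lists to arrays):

```json
{
  "type": "object",
  "properties": {
    "cities": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["cities"]
}
```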
-
Are you planning to remove the tool-use feature entirely and implement it via
I think the top-level struct for a tool/function-call spec makes sense; that API is stable. It was only the component struct (like args) that I thought might be too constraining.
Not quite, because the examples in the OpenAI/Anthropic API are using both:

```elisp
(:name "unit"
 :type "string"
 :enum ["celsius" "fahrenheit"]
 :description "The unit of temperature, either 'celsius' or 'fahrenheit'"
 :required nil)
```

```elisp
(llm-chat
 provider
 (llm-make-chat-prompt
  "List the 3 largest cities in France in order of population, giving the results in JSON."
  :response-format
  '(:type object
    :properties
    (:cities (:type array :items (:type string)))
    :required (cities))))
```
This method looks good for arbitrary JSON. Tool-use requires a more constrained schema, so a struct actually makes sense there.
How would you communicate composite types to the API?
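To make the composite-type question concrete: in the same plist schema style as the `:response-format` example above, a nested object argument might look something like this (a hypothetical sketch, not an existing format):

```elisp
;; Hypothetical: a "location" argument that is itself an object,
;; written in the same plist schema style as :response-format.
(:name "location"
 :type "object"
 :properties
 (:city (:type string)
  :country (:type string))
 :required (city))
```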
-
Could you let me know when you decide on a schema for specifying tools? I can try to stay close to it so we can share tools between LLM clients in the future.
I agree.
-
@ahyatt, how do you handle async tool-use? I've settled on a rather clumsy API and was wondering if you have a better solution. A synchronous tool-use function is defined as:

```elisp
(make-tool
 :function #'foo
 :description "Return ..."
 :args '((:name "arg1"
          :description "..."
          :type "string")))
```

and an async one as:

```elisp
(make-tool
 :function #'bar
 :description "Return ..."
 :args '((:name "arg1"
          :description "..."
          :type "string"))
 :async t)
```

The synchronous tool is called as

On a related note, how do you handle the difference between tools whose return value should be fed back to the LLM, and tools run for side-effects or not run at all because the LLM's tool-call JSON is all that was needed?
-
Thanks for the explanation; yes - your method is actually better for the cases in which the function that is getting called needs to itself be async. `llm` doesn't have a good way to handle that yet. We just wait until the function is finished, then update the prompt so that the next time the user calls (they have to re-use the same prompt struct) it has the right information.
I decided at the start that gptel will never block Emacs, and have since paid dearly with my time to uphold that principle!
As to how we handle both text & functions for Claude, we append the text to the prompt, after the function call results, so that the information is there for when the client calls Claude again.
Cool. In gptel I'm calling the callback twice, once with the text and again with the tool result.
But I may be forgetting some detail here; the whole dance you have to do with tool use in conversations is complicated, under-documented, and has several non-standard variations across different providers.
I finished implementing tool use for all the major APIs with and without streaming responses, and it was a big ol' mess. A lot of the demo code online, and even the official API documentation in the case of Gemini, is just flat out wrong. All the idiosyncrasies are fresh in my mind at the moment, so let me know if you need help with the details.
I think automatically feeding it back like you will be doing would be reasonable, but I think most of the time it isn't needed. If I rethink the tool use interface this weekend, I'll consider adopting your way.
All right.
-
Hi @ahyatt,
I'm adding tool-use to gptel and wanted to coordinate with you on the tool definition format. I think it would be good to have a community-maintained bank of commonly useful tool calls that can plug easily into all Emacs LLM clients. gptel uses a different internal data structure than llm to manage tools, so what do you think of defining tools as loosely-structured plists that we can both use?
I can explain why. Here's an example tool definition that can be read by both llm and gptel:
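Something like this, say (a sketch of the kind of plist I mean; the exact keys are up for discussion):

```elisp
;; Sketch of a shared, loosely-structured plist tool definition.
;; The exact keys are a proposal, not a settled format.
(:name "get-weather"
 :description "Get the current weather in a given location."
 :args ((:name "location"
         :description "The city and state, e.g. San Francisco, CA."
         :type "string"
         :required t))
 :function get-weather)
```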
The repo would contain this piece of data along with an implementation of `get-weather`. This example is useless, but you can imagine commonly useful tools, like ones that fetch web video or Google Scholar results, or results from Info manuals.

Here's how `llm` could import this:

gptel can do something similar to convert the data into its internal tool structure.
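Concretely, the `llm` import step might look something like this (a hypothetical sketch - the constructor names are guessed from `llm`'s current function-call structs, so treat the details as illustrative):

```elisp
;; Hypothetical conversion from the shared plist format into llm's
;; structs. Assumes a plist `spec' with :name, :description, :args
;; and :function keys; the constructor names are a guess.
(defun my/llm-tool-from-plist (spec)
  (make-llm-function-call
   :name (plist-get spec :name)
   :description (plist-get spec :description)
   :function (symbol-function (plist-get spec :function))
   :args (mapcar (lambda (arg)
                   (make-llm-function-arg
                    :name (plist-get arg :name)
                    :description (plist-get arg :description)
                    :type (intern (plist-get arg :type))
                    :required (plist-get arg :required)))
                 (plist-get spec :args))))
```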
If you are interested in this idea, we can decide on a plist format. I have two points of feedback on the current implementation of tool definitions in `llm`, one minor and one major:

1. The `:required` key can be inverted to `:optional`, with a default value of `nil`. This way defining an argument works like in Emacs Lisp, and `:required` does not need to be specified: the shorter declaration will imply that an argument is required, while `:optional` explicitly specifies that it's optional, like `&optional` in an elisp function. I would expect optional arguments to be rarer across tool definitions than required ones.

2. The `:enum` field: it is currently not allowed by `make-llm-function-arg`. I don't know what fields the full JSON Schema allows here, but I'm guessing that restricting them in `make-llm-function-arg` might cause issues. In gptel I'm currently just using a plist for the function arg spec.