
feat: implemented use of OpenAI's tools API #219

Closed · wants to merge 4 commits

Conversation

ErikBjare (Owner) commented Oct 23, 2024

Made an attempt at #218; it was pretty easy to get this minimal version working.

  • get it working for save/append
  • get it working for patch
  • get it working for ipython
  • should we register ipython-functions too? leave it to the ipython tool
  • fix parameter schemas
  • clean up examples prompting
  • get it working for anthropic
  • test it with openrouter/ollama with Llama 1B/3B to see if behavior is improved

This would improve system prompt adherence, fixing issues like:


Important

Adds support for OpenAI's Tools API, integrating tool usage into chat flow with tool execution and response handling.

  • Behavior:
    • Adds support for OpenAI's Tools API in step() in chat.py and reply() in llm.py.
    • Parses tool call responses in step() using regex to extract tool name and arguments.
    • Integrates tool execution in chat flow, yielding tool execution results.
  • Functions:
    • Adds build_tools_dict() and spec2tool() in llm.py to construct tool specifications (see the sketch after this description).
    • Modifies reply(), _chat_complete(), and _stream() in llm.py to accept and handle tools.
  • Misc:
    • Updates chat() and stream() in llm_openai.py and llm_anthropic.py to pass tools to API calls.
    • Adds tools parameter to chat() and stream() functions in llm_openai.py and llm_anthropic.py.
    • Updates toolcall_re regex in tools/base.py to parse tool calls.

This description was created by Ellipsis for 4a11c8b. It will automatically update as commits are pushed.
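
For readers unfamiliar with the Tools API, here is a minimal sketch of the kind of dict a function like spec2tool() has to produce for OpenAI from a gptme tool spec. This is not the PR's actual code; the "save" tool and its parameter schema are illustrative placeholders (the PR leaves the real parameter schema construction as a TODO).

```python
# Minimal sketch, not the implementation from this PR.
def spec2tool_sketch(name: str, desc: str, param_schema: dict) -> dict:
    """Build an OpenAI-style tool spec from a tool's name, description, and schema."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": desc,
            "parameters": param_schema,  # JSON Schema for the tool's arguments
        },
    }

tools = [
    spec2tool_sketch(
        "save",
        "Save content to a file",
        {
            "type": "object",
            "properties": {"path": {"type": "string"}, "content": {"type": "string"}},
            "required": ["path", "content"],
        },
    )
]
# `tools` would then be passed to the provider client, e.g. via the `tools`
# argument added to the chat()/stream() functions described above.
```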

@ellipsis-dev (bot) left a comment

❌ Changes requested. Reviewed everything up to 6bb676b in 40 seconds

More details
  • Looked at 312 lines of code in 4 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 drafted comments based on config settings.
1. gptme/chat.py:206
  • Draft comment:
    The toolsapi variable is hardcoded to True. Consider making this configurable or based on a condition to allow flexibility in enabling/disabling the tools API.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment is about a change made in the diff, specifically the introduction of the toolsapi variable set to True. Making it configurable could be a valid suggestion for improving code flexibility. The comment is actionable and clear, suggesting a potential improvement in code quality.
    The comment assumes that configurability is necessary without knowing the broader context of the application. It might be that the tools API is always intended to be enabled, making the suggestion unnecessary.
    Even if the tools API is always intended to be enabled, making it configurable could still be beneficial for future-proofing or testing purposes.
    The comment is valid as it suggests an actionable improvement to make the toolsapi variable configurable, enhancing flexibility.
2. gptme/llm_anthropic.py:37
  • Draft comment:
    The tools parameter in the chat and stream functions is not type-annotated. Consider adding a type annotation for clarity and to ensure correct usage.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The tools parameter in the chat and stream functions is not type-annotated, which can lead to confusion about the expected input type.
3. gptme/llm_openai.py:73
  • Draft comment:
    The tools parameter in the chat and stream functions is not type-annotated. Consider adding a type annotation for clarity and to ensure correct usage.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The tools parameter in the chat and stream functions is not type-annotated, which can lead to confusion about the expected input type.


gptme/chat.py Outdated
if toolsapi:
    toolcall_re = re.compile(r"@(\w+): (\{.*?\})")
    # if matches exactly
    if match := toolcall_re.match(msg_response.content):
ellipsis-dev (Contributor) commented:

The regular expression toolcall_re.match(msg_response.content) only matches if the entire content is a tool call. Consider using toolcall_re.search(msg_response.content) to find tool calls within a larger response.

Suggested change
if match := toolcall_re.match(msg_response.content):
if match := toolcall_re.search(msg_response.content):
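
For context, a small standalone example of the match/search distinction, using the regex from the diff above (the assistant reply strings are made up):

```python
import re

toolcall_re = re.compile(r"@(\w+): (\{.*?\})")

# Hypothetical assistant reply where prose precedes the tool call
content = 'Sure, saving it now.\n@save: {"path": "hello.txt"}'

print(toolcall_re.match(content))   # None: match() only matches at the start of the string
print(toolcall_re.search(content))  # a Match object: search() finds the call anywhere

# A reply that starts with the tool call is matched by both
print(toolcall_re.match('@save: {"path": "hello.txt"}'))
```

As the author notes below, anchoring with match() is intentional here: only replies that begin with a tool call are treated as tool calls.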

ErikBjare (Owner, Author) replied:

@ellipsis-dev That's as intended

ellipsis-dev (Contributor) replied:

No changes were made to the regular expression usage in the code, as the reviewer confirmed that the current behavior is intended and correct. If there are any other concerns or areas that need attention, please let me know!


Responding to this comment by @ErikBjare.

gptme/llm.py Outdated
description = spec.desc

if spec.functions:
    logger.warn("functions not supported yet")
ellipsis-dev (Contributor) commented:

The spec2tool function logs a warning for unsupported spec.functions but does not handle them. Consider implementing support or explicitly ignoring them to avoid incomplete tool specifications.

Suggested change
logger.warn("functions not supported yet")
logger.warn("functions not supported yet, ignoring them")

codecov-commenter commented Oct 23, 2024

❌ 7 Tests Failed:

Tests completed: 80 · Failed: 7 · Passed: 73 · Skipped: 0
View the top 3 failed tests by shortest run time
tests.test_server test_api_conversation_generate
Stack Traces | 0.216s run time
conv = 'test-server-807350', client = <FlaskClient <Flask 'gptme.server.api'>>

    @pytest.mark.slow
    def test_api_conversation_generate(conv: str, client: FlaskClient):
        # Ask the assistant to generate a test response
        response = client.post(
            f"/api/conversations/{conv}",
            json={"role": "user", "content": "hello, just testing"},
        )
        assert response.status_code == 200
    
        response = client.post(
            f"/api/conversations/{conv}/generate",
            json={"model": get_model().model},
        )
>       assert response.status_code == 200
E       assert 500 == 200
E        +  where 500 = <WrapperTestResponse streamed [500 INTERNAL SERVER ERROR]>.status_code

tests/test_server.py:78: AssertionError
tests.test_cli test_subagent
Stack Traces | 2.13s run time
args = ['--name', 'test-8882-test_subagent', 'test the subagent tool by computing `fib(15)` with it, where `fib(1) = 1` and `fib(2) = 1`']
runner = <click.testing.CliRunner object at 0x7fd39bb92ad0>

    @pytest.mark.slow
    @pytest.mark.flaky(retries=2, delay=5)
    def test_subagent(args: list[str], runner: CliRunner):
        # f14: 377
        # f15: 610
        # f16: 987
        args.append(
            "test the subagent tool by computing `fib(15)` with it, where `fib(1) = 1` and `fib(2) = 1`"
        )
        print(f"running: gptme {' '.join(args)}")
        result = runner.invoke(gptme.cli.main, args)
        print(result.output)
    
        # apparently this is not obviously 610
        accepteds = ["377", "610"]
>       assert any([accepted in result.output for accepted in accepteds])
E       assert False
E        +  where False = any([False, False])

.../gptme/tests/test_cli.py:322: AssertionError
tests.test_cli test_generate_primes
Stack Traces | 2.29s run time
args = ['--name', 'test-27503-test_generate_primes', 'compute the first 10 prime numbers']
runner = <click.testing.CliRunner object at 0x7f3a0640fa00>

    @pytest.mark.slow
    def test_generate_primes(args: list[str], runner: CliRunner):
        args.append("compute the first 10 prime numbers")
        result = runner.invoke(gptme.cli.main, args)
        # check that the 9th and 10th prime is present
>       assert "23" in result.output
E       assert '23' in '[17:24:50] Using logdir                                                         \n           .../gptme/logs/test-27503-test_generate_primes          \n           Using workspace at /tmp/tmpq3_fmlrr                                  \nUser: compute the first 10 prime numbers\nAssistant: Here is how we can compute the first 10 prime numbers in Python:\nUser: compute the first 10 prime numbers\nAssistant: \nSkipped 1 hidden system messages, show with --show-hidden\n--- ^^^ past messages ^^^ ---\nUser: compute the first 10 prime numbers\nAssistant: Thinking...\r                                                                                \r'
E        +  where '[17:24:50] Using logdir                                                         \n           .../gptme/logs/test-27503-test_generate_primes          \n           Using workspace at /tmp/tmpq3_fmlrr                                  \nUser: compute the first 10 prime numbers\nAssistant: Here is how we can compute the first 10 prime numbers in Python:\nUser: compute the first 10 prime numbers\nAssistant: \nSkipped 1 hidden system messages, show with --show-hidden\n--- ^^^ past messages ^^^ ---\nUser: compute the first 10 prime numbers\nAssistant: Thinking...\r                                                                                \r' = <Result BadRequestError("Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'messages: text content blocks must be non-empty'}}")>.output

.../gptme/tests/test_cli.py:252: AssertionError


@ellipsis-dev (bot) left a comment

👍 Looks good to me! Incremental review on 0598b0d in 19 seconds

More details
  • Looked at 83 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 drafted comments based on config settings.
1. gptme/chat.py:215
  • Draft comment:
    The regex pattern for tool calls is duplicated in chat.py and base.py. Consider centralizing this pattern to avoid inconsistencies and improve maintainability.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The regex pattern for tool calls is duplicated in both chat.py and base.py. This should be centralized to avoid inconsistencies and improve maintainability.
2. gptme/llm.py:50
  • Draft comment:
    The parameters field in spec2tool is set to None. Ensure this is properly set to reflect the tool's parameters.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The spec2tool function in llm.py has a placeholder for setting parameters, which is currently set to None. This should be addressed to ensure proper functionality.
3. gptme/tools/base.py:24
  • Draft comment:
    The regex pattern for tool calls is duplicated in chat.py and base.py. Consider centralizing this pattern to avoid inconsistencies and improve maintainability.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The toolcall_re regex pattern is duplicated in both chat.py and base.py. This should be centralized to avoid inconsistencies and improve maintainability.
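
A minimal sketch of the centralization these draft comments suggest (module paths are taken from the diff; this is not the PR's code):

```python
# gptme/tools/base.py -- single definition of the pattern
import re

toolcall_re = re.compile(r"@(\w+): (\{.*?\})")

# gptme/chat.py (and any other caller) would then import it instead of redefining:
# from gptme.tools.base import toolcall_re
```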


@ellipsis-dev (bot) left a comment

👍 Looks good to me! Incremental review on 496bbfd in 18 seconds

More details
  • Looked at 22 lines of code in 1 file
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comment based on config settings.
1. README.md:318
  • Draft comment:
    Duplicate link reference for [docs-server]. Consider removing one of them to avoid confusion.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The link reference for [docs-server] is duplicated, which can cause confusion and maintenance issues.


@ellipsis-dev (bot) left a comment

👍 Looks good to me! Incremental review on 4b13443 in 40 seconds

More details
  • Looked at 85 lines of code in 1 file
  • Skipped 0 files when reviewing.
  • Skipped posting 4 drafted comments based on config settings.
1. gptme/llm.py:76
  • Draft comment:
    Ensure _client_to_provider() is defined or imported, as it is used multiple times but not shown in this snippet.
  • Reason this comment was not posted:
    Comment did not seem useful.
2. gptme/llm.py:54
  • Draft comment:
    Consider improving the construction of param_schema with more detailed descriptions for arguments, as noted in the TODO comment.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The spec2tool function has a TODO comment about constructing the parameter schema intelligently. This is a potential area for improvement, but not a critical issue for functionality.
3. gptme/llm.py:75
  • Draft comment:
    Clarify whether input_schema and parameters are equivalent, as noted in the TODO comment, to ensure correct schema usage.
  • Reason this comment was not posted:
    Comment did not seem useful.
4. gptme/llm.py:94
  • Draft comment:
    Consider providing more details in the ValueError message to indicate which provider was unsupported.
  • Reason this comment was not posted:
    Confidence changes required: 20%
    The spec2tool function raises a ValueError if the provider doesn't support the tools API. This is a good practice to handle unsupported cases, but the error message could be more informative.
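
On the input_schema/parameters question raised in draft comment 3: both providers accept the same JSON Schema for a tool's arguments, just under different keys. A hedged sketch with placeholder values, not the PR's code:

```python
# Illustrative values only.
schema = {
    "type": "object",
    "properties": {"path": {"type": "string"}},
    "required": ["path"],
}

# OpenAI nests the schema under function.parameters
openai_tool = {
    "type": "function",
    "function": {"name": "save", "description": "Save a file", "parameters": schema},
}

# Anthropic uses a flat tool spec with input_schema
anthropic_tool = {
    "name": "save",
    "description": "Save a file",
    "input_schema": schema,
}
```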


@ellipsis-dev (bot) left a comment

👍 Looks good to me! Incremental review on 28fa60a in 41 seconds

More details
  • Looked at 85 lines of code in 1 file
  • Skipped 0 files when reviewing.
  • Skipped posting 4 drafted comments based on config settings.
1. gptme/llm.py:76
  • Draft comment:
    Ensure _client_to_provider() is defined or imported, as it is used multiple times but not defined in this snippet.
  • Reason this comment was not posted:
    Comment did not seem useful.
2. gptme/llm.py:54
  • Draft comment:
    Address the TODO comment regarding constructing the parameter schema intelligently. This is crucial for ensuring the function's robustness and correctness.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment is pointing out a TODO that is already in the code, which the author is likely aware of. The comment does not provide any new information or actionable steps beyond what the TODO already indicates. It seems to be reiterating the importance of the TODO rather than suggesting a specific code change.
    The comment might be emphasizing the importance of the TODO, which could be useful if the author overlooked it. However, it doesn't add any new information or actionable advice beyond the existing TODO.
    The presence of the TODO already indicates that the author is aware of the need for improvement, making the comment redundant. The comment does not provide additional actionable insights.
    The comment should be deleted as it does not provide new information or actionable steps beyond the existing TODO. It is redundant and not necessary for a code change.
3. gptme/llm.py:75
  • Draft comment:
    Clarify whether input_schema and parameters are the same for different providers to ensure correct implementation.
  • Reason this comment was not posted:
    Comment did not seem useful.
4. gptme/llm.py:94
  • Draft comment:
    Include the provider name in the ValueError message for better debugging when a provider doesn't support the tools API.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The spec2tool function raises a ValueError if the provider doesn't support the tools API. This is a good practice to handle unexpected cases, but it might be helpful to include the provider name in the error message for better debugging.


@ellipsis-dev (bot) left a comment

👍 Looks good to me! Incremental review on 4a11c8b in 22 seconds

More details
  • Looked at 65 lines of code in 1 file
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. gptme/llm_openai.py:124
  • Draft comment:
    The cast to ChatCompletionChunk is unnecessary if the API guarantees the type. Consider removing it to avoid potential runtime errors if the type is incorrect.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The code uses a cast to ensure the type of chunk is ChatCompletionChunk. However, this cast is not necessary if the type is already guaranteed by the API. Additionally, the cast is not checked at runtime, so it could lead to errors if the type is incorrect.
2. gptme/llm_openai.py:140
  • Draft comment:
    The isinstance checks for ChoiceDeltaToolCall and ChoiceDeltaToolCallFunction are redundant if the API guarantees these types. Consider removing them to simplify the code. This applies to line 142 as well.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The code uses isinstance checks for ChoiceDeltaToolCall and ChoiceDeltaToolCallFunction. These checks are redundant if the types are already guaranteed by the API. Removing them can simplify the code.
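
For background on the chunk types these drafts mention, a minimal sketch of accumulating streamed tool-call deltas with the openai Python SDK. The model name, prompt, and tool spec are placeholders; this is not the PR's implementation.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

tools = [{
    "type": "function",
    "function": {
        "name": "save",
        "description": "Save content to a file",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}, "content": {"type": "string"}},
            "required": ["path", "content"],
        },
    },
}]

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Save 'hello' to hello.txt"}],
    tools=tools,
    stream=True,
)

name, args = "", ""
for chunk in stream:  # each chunk is a ChatCompletionChunk
    if not chunk.choices:
        continue
    for tc in chunk.choices[0].delta.tool_calls or []:  # ChoiceDeltaToolCall
        if tc.function is None:
            continue
        if tc.function.name:
            name = tc.function.name
        if tc.function.arguments:
            args += tc.function.arguments  # the JSON arguments arrive in fragments

print(name, args)
```

Whether explicit isinstance checks add value on top of the SDK's typed models is the judgement call the drafts refer to.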


ErikBjare changed the title from "feat: started working on support for OpenAI's tools API option" to "feat: implemented use of OpenAI's tools API" on Oct 29, 2024
jrmi (Contributor) commented Nov 22, 2024

Hey @ErikBjare, are you planning to continue this soon? If not, would you be okay with me picking it up (on top of your changes, obviously)? I believe this MR could significantly improve model performance, especially for smaller models, and prevent some random small bugs. 🙏

ErikBjare (Owner, Author) commented Nov 22, 2024

@jrmi I'm kinda working on other stuff right now. No immediate plans to pick this up.

Feel free to give it a shot!

ErikBjare (Owner, Author) commented:
Was merged in #300, with much appreciated help from @jrmi!

ErikBjare closed this on Dec 8, 2024