
feat: implemented use of OpenAI's tools API #219

Closed · wants to merge 4 commits

Conversation

ErikBjare (Owner) commented Oct 23, 2024

Made an attempt at #218; it was pretty easy to get this minimal version working.

  • get it working for save/append
  • get it working for patch
  • get it working for ipython
  • should we register ipython-functions too? leave it to the ipython tool
  • fix parameter schemas
  • clean up examples prompting
  • get it working for anthropic
  • test it with openrouter/ollama with Llama 1B/3B to see if behavior is improved

This would improve system prompt adherence, fixing issues like:


Important

Adds support for OpenAI's Tools API, integrating tool usage into chat flow with tool execution and response handling.

  • Behavior:
    • Adds support for OpenAI's Tools API in step() in chat.py and reply() in llm.py.
    • Parses tool call responses in step() using regex to extract tool name and arguments.
    • Integrates tool execution in chat flow, yielding tool execution results.
  • Functions:
    • Adds build_tools_dict() and spec2tool() in llm.py to construct tool specifications (see the sketch after this description).
    • Modifies reply(), _chat_complete(), and _stream() in llm.py to accept and handle tools.
  • Misc:
    • Updates chat() and stream() in llm_openai.py and llm_anthropic.py to pass tools to API calls.
    • Adds tools parameter to chat() and stream() functions in llm_openai.py and llm_anthropic.py.
    • Updates toolcall_re regex in tools/base.py to parse tool calls.

This description was created by Ellipsis for 4a11c8b. It will automatically update as commits are pushed.
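
For readers unfamiliar with the Tools API, here is a minimal sketch of the kind of dict a function like spec2tool() has to produce for OpenAI from a gptme tool spec. This is not the PR's actual code; the "save" tool and its parameter schema are illustrative placeholders (the PR leaves the real parameter schema construction as a TODO).

```python
# Minimal sketch, not the implementation from this PR.
def spec2tool_sketch(name: str, desc: str, param_schema: dict) -> dict:
    """Build an OpenAI-style tool spec from a tool's name, description, and schema."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": desc,
            "parameters": param_schema,  # JSON Schema for the tool's arguments
        },
    }

tools = [
    spec2tool_sketch(
        "save",
        "Save content to a file",
        {
            "type": "object",
            "properties": {"path": {"type": "string"}, "content": {"type": "string"}},
            "required": ["path", "content"],
        },
    )
]
# `tools` would then be passed to the provider client, e.g. via the `tools`
# argument added to the chat()/stream() functions described above.
```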

@ellipsis-dev (bot) left a comment

❌ Changes requested. Reviewed everything up to 6bb676b in 40 seconds

More details
  • Looked at 312 lines of code in 4 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 drafted comments based on config settings.
1. gptme/chat.py:206
  • Draft comment:
    The toolsapi variable is hardcoded to True. Consider making this configurable or based on a condition to allow flexibility in enabling/disabling the tools API.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment is about a change made in the diff, specifically the introduction of the toolsapi variable set to True. Making it configurable could be a valid suggestion for improving code flexibility. The comment is actionable and clear, suggesting a potential improvement in code quality.
    The comment assumes that configurability is necessary without knowing the broader context of the application. It might be that the tools API is always intended to be enabled, making the suggestion unnecessary.
    Even if the tools API is always intended to be enabled, making it configurable could still be beneficial for future-proofing or testing purposes.
    The comment is valid as it suggests an actionable improvement to make the toolsapi variable configurable, enhancing flexibility.
2. gptme/llm_anthropic.py:37
  • Draft comment:
    The tools parameter in the chat and stream functions is not type-annotated. Consider adding a type annotation for clarity and to ensure correct usage.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The tools parameter in the chat and stream functions is not type-annotated, which can lead to confusion about the expected input type.
3. gptme/llm_openai.py:73
  • Draft comment:
    The tools parameter in the chat and stream functions is not type-annotated. Consider adding a type annotation for clarity and to ensure correct usage.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The tools parameter in the chat and stream functions is not type-annotated, which can lead to confusion about the expected input type.


gptme/chat.py Outdated
if toolsapi:
    toolcall_re = re.compile(r"@(\w+): (\{.*?\})")
    # if matches exactly
    if match := toolcall_re.match(msg_response.content):
ellipsis-dev (Contributor) commented:

The regular expression toolcall_re.match(msg_response.content) only matches if the entire content is a tool call. Consider using toolcall_re.search(msg_response.content) to find tool calls within a larger response.

Suggested change
if match := toolcall_re.match(msg_response.content):
if match := toolcall_re.search(msg_response.content):
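
For context, a small standalone example of the match/search distinction, using the regex from the diff above (the assistant reply strings are made up):

```python
import re

toolcall_re = re.compile(r"@(\w+): (\{.*?\})")

# Hypothetical assistant reply where prose precedes the tool call
content = 'Sure, saving it now.\n@save: {"path": "hello.txt"}'

print(toolcall_re.match(content))   # None: match() only matches at the start of the string
print(toolcall_re.search(content))  # a Match object: search() finds the call anywhere

# A reply that starts with the tool call is matched by both
print(toolcall_re.match('@save: {"path": "hello.txt"}'))
```

As the author notes below, anchoring with match() is intentional here: only replies that begin with a tool call are treated as tool calls.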

ErikBjare (Owner, Author) replied:

@ellipsis-dev That's as intended

ellipsis-dev (Contributor) replied:

No changes were made to the regular expression usage in the code, as the reviewer confirmed that the current behavior is intended and correct. If there are any other concerns or areas that need attention, please let me know!


Responding to this comment by @ErikBjare.

gptme/llm.py Outdated
description = spec.desc

if spec.functions:
    logger.warn("functions not supported yet")
ellipsis-dev (Contributor) commented:

The spec2tool function logs a warning for unsupported spec.functions but does not handle them. Consider implementing support or explicitly ignoring them to avoid incomplete tool specifications.

Suggested change
logger.warn("functions not supported yet")
logger.warn("functions not supported yet, ignoring them")

codecov-commenter commented Oct 23, 2024

❌ 7 Tests Failed:

Tests completed: 80 · Failed: 7 · Passed: 73 · Skipped: 0
View the top 3 failed tests by shortest run time
tests.test_server test_api_conversation_generate
Stack Traces | 0.216s run time
conv = 'test-server-807350', client = <FlaskClient <Flask 'gptme.server.api'>>

    @pytest.mark.slow
    def test_api_conversation_generate(conv: str, client: FlaskClient):
        # Ask the assistant to generate a test response
        response = client.post(
            f"/api/conversations/{conv}",
            json={"role": "user", "content": "hello, just testing"},
        )
        assert response.status_code == 200
    
        response = client.post(
            f"/api/conversations/{conv}/generate",
            json={"model": get_model().model},
        )
>       assert response.status_code == 200
E       assert 500 == 200
E        +  where 500 = <WrapperTestResponse streamed [500 INTERNAL SERVER ERROR]>.status_code

tests/test_server.py:78: AssertionError
tests.test_cli test_subagent
Stack Traces | 2.13s run time
args = ['--name', 'test-8882-test_subagent', 'test the subagent tool by computing `fib(15)` with it, where `fib(1) = 1` and `fib(2) = 1`']
runner = <click.testing.CliRunner object at 0x7fd39bb92ad0>

    @pytest.mark.slow
    @pytest.mark.flaky(retries=2, delay=5)
    def test_subagent(args: list[str], runner: CliRunner):
        # f14: 377
        # f15: 610
        # f16: 987
        args.append(
            "test the subagent tool by computing `fib(15)` with it, where `fib(1) = 1` and `fib(2) = 1`"
        )
        print(f"running: gptme {' '.join(args)}")
        result = runner.invoke(gptme.cli.main, args)
        print(result.output)
    
        # apparently this is not obviously 610
        accepteds = ["377", "610"]
>       assert any([accepted in result.output for accepted in accepteds])
E       assert False
E        +  where False = any([False, False])

.../gptme/tests/test_cli.py:322: AssertionError
tests.test_cli test_generate_primes
Stack Traces | 2.29s run time
args = ['--name', 'test-27503-test_generate_primes', 'compute the first 10 prime numbers']
runner = <click.testing.CliRunner object at 0x7f3a0640fa00>

    @pytest.mark.slow
    def test_generate_primes(args: list[str], runner: CliRunner):
        args.append("compute the first 10 prime numbers")
        result = runner.invoke(gptme.cli.main, args)
        # check that the 9th and 10th prime is present
>       assert "23" in result.output
E       assert '23' in '[17:24:50] Using logdir                                                         \n           .../gptme/logs/test-27503-test_generate_primes          \n           Using workspace at /tmp/tmpq3_fmlrr                                  \nUser: compute the first 10 prime numbers\nAssistant: Here is how we can compute the first 10 prime numbers in Python:\nUser: compute the first 10 prime numbers\nAssistant: \nSkipped 1 hidden system messages, show with --show-hidden\n--- ^^^ past messages ^^^ ---\nUser: compute the first 10 prime numbers\nAssistant: Thinking...\r                                                                                \r'
E        +  where '[17:24:50] Using logdir                                                         \n           .../gptme/logs/test-27503-test_generate_primes          \n           Using workspace at /tmp/tmpq3_fmlrr                                  \nUser: compute the first 10 prime numbers\nAssistant: Here is how we can compute the first 10 prime numbers in Python:\nUser: compute the first 10 prime numbers\nAssistant: \nSkipped 1 hidden system messages, show with --show-hidden\n--- ^^^ past messages ^^^ ---\nUser: compute the first 10 prime numbers\nAssistant: Thinking...\r                                                                                \r' = <Result BadRequestError("Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'messages: text content blocks must be non-empty'}}")>.output

.../gptme/tests/test_cli.py:252: AssertionError


@ellipsis-dev (bot) left a comment

👍 Looks good to me! Incremental review on 0598b0d in 19 seconds

More details
  • Looked at 83 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 drafted comments based on config settings.
1. gptme/chat.py:215
  • Draft comment:
    The regex pattern for tool calls is duplicated in chat.py and base.py. Consider centralizing this pattern to avoid inconsistencies and improve maintainability.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The regex pattern for tool calls is duplicated in both chat.py and base.py. This should be centralized to avoid inconsistencies and improve maintainability.
2. gptme/llm.py:50
  • Draft comment:
    The parameters field in spec2tool is set to None. Ensure this is properly set to reflect the tool's parameters.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The spec2tool function in llm.py has a placeholder for setting parameters, which is currently set to None. This should be addressed to ensure proper functionality.
3. gptme/tools/base.py:24
  • Draft comment:
    The regex pattern for tool calls is duplicated in chat.py and base.py. Consider centralizing this pattern to avoid inconsistencies and improve maintainability.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The toolcall_re regex pattern is duplicated in both chat.py and base.py. This should be centralized to avoid inconsistencies and improve maintainability.
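
A minimal sketch of the centralization these draft comments suggest (module paths are taken from the diff; this is not the PR's code):

```python
# gptme/tools/base.py -- single definition of the pattern
import re

toolcall_re = re.compile(r"@(\w+): (\{.*?\})")

# gptme/chat.py (and any other caller) would then import it instead of redefining:
# from gptme.tools.base import toolcall_re
```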


@ellipsis-dev (bot) left a comment

👍 Looks good to me! Incremental review on 496bbfd in 18 seconds

More details
  • Looked at 22 lines of code in 1 file
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comment based on config settings.
1. README.md:318
  • Draft comment:
    Duplicate link reference for [docs-server]. Consider removing one of them to avoid confusion.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The link reference for [docs-server] is duplicated, which can cause confusion and maintenance issues.


@ellipsis-dev (bot) left a comment

👍 Looks good to me! Incremental review on 4b13443 in 40 seconds

More details
  • Looked at 85 lines of code in 1 file
  • Skipped 0 files when reviewing.
  • Skipped posting 4 drafted comments based on config settings.
1. gptme/llm.py:76
  • Draft comment:
    Ensure _client_to_provider() is defined or imported, as it is used multiple times but not shown in this snippet.
  • Reason this comment was not posted:
    Comment did not seem useful.
2. gptme/llm.py:54
  • Draft comment:
    Consider improving the construction of param_schema with more detailed descriptions for arguments, as noted in the TODO comment.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The spec2tool function has a TODO comment about constructing the parameter schema intelligently. This is a potential area for improvement, but not a critical issue for functionality.
3. gptme/llm.py:75
  • Draft comment:
    Clarify whether input_schema and parameters are equivalent, as noted in the TODO comment, to ensure correct schema usage.
  • Reason this comment was not posted:
    Comment did not seem useful.
4. gptme/llm.py:94
  • Draft comment:
    Consider providing more details in the ValueError message to indicate which provider was unsupported.
  • Reason this comment was not posted:
    Confidence changes required: 20%
    The spec2tool function raises a ValueError if the provider doesn't support the tools API. This is a good practice to handle unsupported cases, but the error message could be more informative.
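
On the input_schema/parameters question raised in draft comment 3: both providers accept the same JSON Schema for a tool's arguments, just under different keys. A hedged sketch with placeholder values, not the PR's code:

```python
# Illustrative values only.
schema = {
    "type": "object",
    "properties": {"path": {"type": "string"}},
    "required": ["path"],
}

# OpenAI nests the schema under function.parameters
openai_tool = {
    "type": "function",
    "function": {"name": "save", "description": "Save a file", "parameters": schema},
}

# Anthropic uses a flat tool spec with input_schema
anthropic_tool = {
    "name": "save",
    "description": "Save a file",
    "input_schema": schema,
}
```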


@ellipsis-dev (bot) left a comment

👍 Looks good to me! Incremental review on 28fa60a in 41 seconds

More details
  • Looked at 85 lines of code in 1 file
  • Skipped 0 files when reviewing.
  • Skipped posting 4 drafted comments based on config settings.
1. gptme/llm.py:76
  • Draft comment:
    Ensure _client_to_provider() is defined or imported, as it is used multiple times but not defined in this snippet.
  • Reason this comment was not posted:
    Comment did not seem useful.
2. gptme/llm.py:54
  • Draft comment:
    Address the TODO comment regarding constructing the parameter schema intelligently. This is crucial for ensuring the function's robustness and correctness.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment is pointing out a TODO that is already in the code, which the author is likely aware of. The comment does not provide any new information or actionable steps beyond what the TODO already indicates. It seems to be reiterating the importance of the TODO rather than suggesting a specific code change.
    The comment might be emphasizing the importance of the TODO, which could be useful if the author overlooked it. However, it doesn't add any new information or actionable advice beyond the existing TODO.
    The presence of the TODO already indicates that the author is aware of the need for improvement, making the comment redundant. The comment does not provide additional actionable insights.
    The comment should be deleted as it does not provide new information or actionable steps beyond the existing TODO. It is redundant and not necessary for a code change.
3. gptme/llm.py:75
  • Draft comment:
    Clarify whether input_schema and parameters are the same for different providers to ensure correct implementation.
  • Reason this comment was not posted:
    Comment did not seem useful.
4. gptme/llm.py:94
  • Draft comment:
    Include the provider name in the ValueError message for better debugging when a provider doesn't support the tools API.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The spec2tool function raises a ValueError if the provider doesn't support the tools API. This is a good practice to handle unexpected cases, but it might be helpful to include the provider name in the error message for better debugging.


@ellipsis-dev (bot) left a comment

👍 Looks good to me! Incremental review on 4a11c8b in 22 seconds

More details
  • Looked at 65 lines of code in 1 file
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. gptme/llm_openai.py:124
  • Draft comment:
    The cast to ChatCompletionChunk is unnecessary if the API guarantees the type. Consider removing it to avoid potential runtime errors if the type is incorrect.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The code uses a cast to ensure the type of chunk is ChatCompletionChunk. However, this cast is not necessary if the type is already guaranteed by the API. Additionally, the cast is not checked at runtime, so it could lead to errors if the type is incorrect.
2. gptme/llm_openai.py:140
  • Draft comment:
    The isinstance checks for ChoiceDeltaToolCall and ChoiceDeltaToolCallFunction are redundant if the API guarantees these types. Consider removing them to simplify the code. This applies to line 142 as well.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The code uses isinstance checks for ChoiceDeltaToolCall and ChoiceDeltaToolCallFunction. These checks are redundant if the types are already guaranteed by the API. Removing them can simplify the code.
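
For background on the chunk types these drafts mention, a minimal sketch of accumulating streamed tool-call deltas with the openai Python SDK. The model name, prompt, and tool spec are placeholders; this is not the PR's implementation.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

tools = [{
    "type": "function",
    "function": {
        "name": "save",
        "description": "Save content to a file",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}, "content": {"type": "string"}},
            "required": ["path", "content"],
        },
    },
}]

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Save 'hello' to hello.txt"}],
    tools=tools,
    stream=True,
)

name, args = "", ""
for chunk in stream:  # each chunk is a ChatCompletionChunk
    if not chunk.choices:
        continue
    for tc in chunk.choices[0].delta.tool_calls or []:  # ChoiceDeltaToolCall
        if tc.function is None:
            continue
        if tc.function.name:
            name = tc.function.name
        if tc.function.arguments:
            args += tc.function.arguments  # the JSON arguments arrive in fragments

print(name, args)
```

Whether explicit isinstance checks add value on top of the SDK's typed models is the judgement call the drafts refer to.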


ErikBjare changed the title from "feat: started working on support for OpenAI's tools API option" to "feat: implemented use of OpenAI's tools API" on Oct 29, 2024
jrmi (Contributor) commented Nov 22, 2024

Hey @ErikBjare, are you planning to continue this soon? If not, would you be okay with me picking it up (on top of your changes, obviously)? I believe this MR could significantly improve model performance, especially for smaller models, and prevent some random small bugs. 🙏

ErikBjare (Owner, Author) commented Nov 22, 2024

@jrmi I'm kinda working on other stuff right now. No immediate plans to pick this up.

Feel free to give it a shot!

ErikBjare (Owner, Author) commented:
Was merged in #300, with much appreciated help from @jrmi!

ErikBjare closed this on Dec 8, 2024