Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement anthropic-style computer tool #225

Merged
merged 16 commits into from
Nov 1, 2024
Merged

Conversation

ErikBjare
Copy link
Owner

@ErikBjare ErikBjare commented Oct 24, 2024

Fixes #216

gptme wrote all of it initially, I just gave it Anthropics reference implementation.

The allowlisted read-commands were super nice, made it look through the code really fast! Really nice work @brayo-pip!

Doesn't fully work yet, but damn close. Hit Anthropic daily rate limit when working on it, so pushing now.

  • get going
  • get it working in Docker like Anthropic
  • get it working locally
  • make sure docs are correct
  • only enable the computer tool by default inside of the computer-use context
  • figure out what is causing the delays
    • I seem to have fixed some of the delay by removing a speed limit argument that I accidentally added? Maybe?
    • Still delays when starting new terminal windows
    • I don't remember seeing these delays on my macOS machine
  • make the UI nice (nice enough, for now)
  • share the original conversation where gptme wrote most of it as an example

Screenshot

image


Important

Introduces a new computer tool for GUI automation using X11, with Docker support and updated documentation.

  • Behavior:
    • Introduces a new computer tool for GUI automation using X11 in gptme/tools/computer.py.
    • Adds routes /computer and /chat in api.py to serve new interfaces.
    • Updates start_x11.sh to initialize Xvfb, tint2, mutter, VNC, and noVNC for GUI automation.
  • Docker:
    • Adds Dockerfile.computer for building a Docker image with X11 support.
    • Updates Makefile with build-docker-computer target.
  • Documentation:
    • Updates docs/examples.rst, docs/server.rst, and docs/tools.rst with instructions for using the new computer tool.
  • Misc:
    • Adds desktop entry files for applications in image/.config/tint2/applications/.
    • Updates .dockerignore to include scripts/start_x11.sh.

This description was created by Ellipsis for fe9a633. It will automatically update as commits are pushed.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to c110124 in 1 minute and 11 seconds

More details
  • Looked at 1024 lines of code in 17 files
  • Skipped 1 files when reviewing.
  • Skipped posting 4 drafted comments based on config settings.
1. gptme/tools/computer.py:94
  • Draft comment:
    Consider adding exception handling for subprocess.Popen to manage potential errors gracefully.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment is relevant to the changes made in the diff, specifically the run_xdotool function. The current implementation does handle errors by checking the return code and raising an exception, but the comment suggests a more graceful approach. This could be considered a code quality improvement, as it might involve more specific exception handling or logging.
    The current error handling might be sufficient for the intended use case, and adding more exception handling could complicate the code unnecessarily. The comment does not specify what 'graceful' handling would entail.
    While the current error handling is functional, the suggestion for more graceful handling could improve the robustness of the code, especially in a production environment where more detailed error information might be useful.
    The comment is about a change made in the diff and suggests a potential code quality improvement. It should be kept as it provides a clear and actionable suggestion.
2. scripts/start_x11.sh:10
  • Draft comment:
    Consider verifying if the Xvfb process is alive in addition to checking the lock file to avoid false positives.
  • Reason this comment was not posted:
    Comment did not seem useful.
3. scripts/start_x11.sh:19
  • Draft comment:
    Consider making the timeout for wait_for_xvfb configurable to allow flexibility in different environments.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The wait_for_xvfb function uses a fixed timeout to wait for Xvfb to start. This could be improved by making the timeout configurable.
4. scripts/start_x11.sh:52
  • Draft comment:
    Ensure xdotool is installed or handle the case where it is missing to prevent script failures.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The start_x11.sh script uses xdotool to check for running processes, which may not be installed in all environments. This could lead to failures if xdotool is missing.

Workflow ID: wflow_IXuPzrPgFdbAI4PE


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

gptme/tools/computer.py Outdated Show resolved Hide resolved
gptme/tools/computer.py Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Oct 24, 2024

Codecov Report

Attention: Patch coverage is 33.62832% with 75 lines in your changes missing coverage. Please review.

Project coverage is 72.67%. Comparing base (a6b41aa) to head (fe9a633).
Report is 1 commits behind head on master.

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
gptme/tools/computer.py 29.59% 69 Missing ⚠️
gptme/server/api.py 66.66% 2 Missing ⚠️
gptme/tools/vision.py 0.00% 2 Missing ⚠️
gptme/tools/__init__.py 83.33% 1 Missing ⚠️
gptme/tools/screenshot.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #225      +/-   ##
==========================================
- Coverage   74.16%   72.67%   -1.50%     
==========================================
  Files          59       60       +1     
  Lines        3731     3842     +111     
==========================================
+ Hits         2767     2792      +25     
- Misses        964     1050      +86     
Flag Coverage Δ
anthropic/claude-3-haiku-20240307 71.65% <33.62%> (-1.41%) ⬇️
openai/gpt-4o-mini 71.34% <33.62%> (-1.35%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

ellipsis-dev[bot]

This comment was marked as outdated.

@0xbrayo
Copy link
Collaborator

0xbrayo commented Oct 24, 2024

Super excited for this. Especially if it can later be integrated with local models. Could be game-changing.

ellipsis-dev[bot]

This comment was marked as outdated.

ellipsis-dev[bot]

This comment was marked as outdated.

ellipsis-dev[bot]

This comment was marked as outdated.

ellipsis-dev[bot]

This comment was marked as outdated.

ellipsis-dev[bot]

This comment was marked as outdated.

ellipsis-dev[bot]

This comment was marked as outdated.

ellipsis-dev[bot]

This comment was marked as outdated.

ellipsis-dev[bot]

This comment was marked as outdated.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 2653012 in 28 seconds

More details
  • Looked at 19 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. scripts/Dockerfile.computer:61
  • Draft comment:
    The poetry install command is run twice, once on line 58 and again on line 61. This is redundant and can be optimized by combining them into a single command after copying the project files.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The Dockerfile has a redundant poetry install command after copying the entire project. This can be optimized by combining the two poetry install commands into one after copying the project files.

Workflow ID: wflow_AAC9AJUFRdScP8tk


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@ErikBjare ErikBjare changed the title feat: started working on anthropic-style computer tool feat: implement anthropic-style computer tool Oct 29, 2024
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Incremental review on 0dc3d40 in 50 seconds

More details
  • Looked at 497 lines of code in 13 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_negvnhzCgZJITmER


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

gptme/tools/computer.py Outdated Show resolved Hide resolved
ErikBjare and others added 3 commits November 1, 2024 11:46
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 6b44dd6 in 24 seconds

More details
  • Looked at 497 lines of code in 13 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 drafted comments based on config settings.
1. gptme/tools/computer.py:94
  • Draft comment:
    Using shell=True can be a security risk if the command string is constructed from external input. Consider using a list of arguments instead of a single string for the command.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The use of shell=True in subprocess.Popen can be a security risk if the command string is constructed from external input. In this case, it seems to be used with a fixed command, but it's still a good practice to avoid it when not necessary.
2. gptme/tools/computer.py:98
  • Draft comment:
    Consider handling exceptions more specifically to ensure that subsequent actions are not attempted if a critical command fails.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The run_xdotool function is used multiple times in the computer_action function. If any of these commands fail, the error is caught and printed, but the function continues execution. This might lead to unexpected behavior if subsequent commands depend on the success of previous ones.
3. scripts/Dockerfile.computer:98
  • Draft comment:
    Consider adding a health check for the VNC service on port 6080 to ensure it's running correctly.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The Dockerfile exposes ports 6080 and 8080, but the health check only checks port 8080. It might be beneficial to also check the status of the VNC service on port 6080 to ensure it's running correctly.

Workflow ID: wflow_eKurPjikSF9JTuYp


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

.dockerignore Outdated
@@ -10,6 +10,8 @@ gptme.toml

# Build scripts
scripts
!scripts/start_x11.sh
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
!scripts/start_x11.sh

@ErikBjare
Copy link
Owner Author

I'm getting weird delays, but it works!

image

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 48b994e in 16 seconds

More details
  • Looked at 71 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. scripts/computer_home/entrypoint.sh:8
  • Draft comment:
    Consider making the --tools option configurable instead of hardcoding computer,vision. This will allow more flexibility for different use cases.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The --tools option in entrypoint.sh is hardcoded to computer,vision, which might not be flexible for different use cases. It would be better to allow this to be configurable.
2. gptme/server/cli.py:29
  • Draft comment:
    Consider adding validation for the --tools option to ensure only supported tools are enabled. Provide feedback if an invalid tool is specified.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The --tools option is added to the CLI, but there is no validation or feedback if an invalid tool is specified. This could lead to confusion if a user makes a typo or uses an unsupported tool.

Workflow ID: wflow_h9cTu6UXxhHAXr42


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 73f093f in 27 seconds

More details
  • Looked at 403 lines of code in 8 files
  • Skipped 0 files when reviewing.
  • Skipped posting 4 drafted comments based on config settings.
1. gptme/server/static/main.js:227
  • Draft comment:
    The response from await req.json(); is not being used. Consider utilizing the response for further processing or error handling if needed.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    In gptme/server/static/main.js, the sendMessage method has a potential issue where the await req.json(); call is not used. The response is not being utilized, which might be necessary for further processing or error handling.
2. gptme/tools/__init__.py:69
  • Draft comment:
    Raising a ValueError for missing tools in the allowlist is a change from logging a warning. Ensure this change is intentional, as it could lead to crashes if the allowlist contains unavailable tools.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    In gptme/tools/__init__.py, the init_tools function raises a ValueError if a tool in the allowlist is not found. This is a change from the previous behavior of logging a warning. This change might be intentional, but it could also lead to unexpected crashes if the allowlist contains a tool that is not available.
3. gptme/tools/computer.py:190
  • Draft comment:
    The raise ValueError(f"Invalid action: {action}") is outside of any conditional block, which means it will always raise an error if none of the conditions are met. Ensure all valid actions are handled before this line.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    In gptme/tools/computer.py, the computer function has a potential issue where the raise ValueError(f"Invalid action: {action}") is outside of any conditional block. This means it will always raise an error if none of the conditions are met, which might not be the intended behavior if the function is expected to handle all valid actions.
4. gptme/tools/screenshot.py:21
  • Draft comment:
    The --overwrite option used with scrot is not valid and might cause the command to fail. Consider removing or replacing it with a valid option.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    In gptme/tools/screenshot.py, the _screenshot function uses scrot with the --overwrite option, which is not a valid option for scrot. This might cause the command to fail on Linux systems.

Workflow ID: wflow_QCdaJNj8BdOfHcyS


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 136aa2a in 29 seconds

More details
  • Looked at 27 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_ando1II8li6zgoUG


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on f5d9586 in 27 seconds

More details
  • Looked at 27 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. gptme/tools/__init__.py:68
  • Draft comment:
    Move the check for tool.name in tools_default_disabled before the tool.init call to avoid unnecessary initialization of tools that are disabled by default.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.

Workflow ID: wflow_UNcVdyHL7BLxu3VE


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Incremental review on 0b5e796 in 29 seconds

More details
  • Looked at 12 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_TZJ08dwhSxiL9Jqg


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

.dockerignore Show resolved Hide resolved
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on fe9a633 in 16 seconds

More details
  • Looked at 31 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. docs/server.rst:43
  • Draft comment:
    Ensure the updated Docker run command is correct and tested, as it removes the port 5000 mapping. This could affect access to the basic chat interface.
  • Reason this comment was not posted:
    Comment did not seem useful.

Workflow ID: wflow_bwhVDG0x0ZroBHaj


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@ErikBjare ErikBjare merged commit 175167e into master Nov 1, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Complete "computer use" support
3 participants