Add screenshot tool, integrate with vision #51

Closed
ErikBjare opened this issue Nov 28, 2023 · 1 comment · Fixed by #92

Comments

ErikBjare commented Nov 28, 2023

This could allow for running and testing GUI applications, and for more end-to-end (E2E) multimodal behavior.

Xvfb by itself might be a bad idea (although it is good for headless/CI runs). We could just take screenshots directly, which is also more platform-independent.

Not sure how to add input yet, but keyboard-focused input should be possible.
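A minimal sketch of the "take screenshots directly" idea, assuming we shell out to whatever screenshot CLI the platform already provides. The helper name `screenshot_command` and its parameters are hypothetical and not part of gptme; the tools it picks from (`screencapture` on macOS; `scrot`, `gnome-screenshot`, or ImageMagick's `import` on Linux) are only examples, and Windows is left unhandled here.

```python
import shutil
import sys
from typing import Callable, List, Optional


def screenshot_command(
    path: str,
    platform: str = sys.platform,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Hypothetical helper: build a full-screen screenshot command for `path`.

    `platform` and `which` are injectable so the selection logic can be
    exercised without a display or the tools being installed.
    """
    if platform == "darwin":
        # -x suppresses the shutter sound.
        return ["screencapture", "-x", path]

    if platform.startswith("linux"):
        # Ordered by preference; the first tool found on PATH wins.
        candidates = [
            ("scrot", ["scrot", path]),
            ("gnome-screenshot", ["gnome-screenshot", "-f", path]),
            ("import", ["import", "-window", "root", path]),
        ]
        for tool, cmd in candidates:
            if which(tool):
                return cmd
        raise RuntimeError("no supported screenshot tool found on PATH")

    raise NotImplementedError("unsupported platform: " + platform)
```

The returned list could then be passed to something like `subprocess.run(screenshot_command("/tmp/shot.png"), check=True)`, and in headless/CI runs the same command could be wrapped with `xvfb-run -a` so that it has a display to capture.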

Vision tracking issue: #50

ErikBjare changed the title from "Add Xvfb tool that can feed display into vision" to "Add screenshot tool that can feed into vision" on Aug 13, 2024
ErikBjare changed the title from "Add screenshot tool that can feed into vision" to "Add screenshot tool, integrate with vision" on Aug 13, 2024
ErikBjare commented

Finished the browser version of this, but it does not yet automatically include the resulting screenshot in the conversation: #52

github-project-automation bot moved this from In progress to Done in gptme roadmap on Oct 2, 2024