Add screenshot tool, integrate with vision #51

ErikBjare · 2023-11-28T15:32:49Z

This could allow for running/testing GUI applications, and more E2E multimodal behavior.

Xvfb by itself might be a bad idea (although it is good for headless/CI runs); the tool could just take screenshots directly, which is also more platform-independent.
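
A rough sketch of what "take screenshots directly" could look like, shelling out to a per-platform screenshot CLI. This is not the implementation that landed in the linked PRs. The function names `screenshot_command` and `take_screenshot` are hypothetical, and the choice of `screencapture` (macOS) and `scrot` (Linux/X11, assumed to be installed) is an assumption for illustration.

```python
import subprocess
import sys
from pathlib import Path


def screenshot_command(platform: str, path: str) -> list[str]:
    """Return a CLI command that saves a full-screen screenshot to `path`.

    Hypothetical helper: the platform-to-tool mapping is an assumption.
    """
    if platform == "darwin":
        # screencapture ships with macOS; -x suppresses the capture sound
        return ["screencapture", "-x", path]
    if platform.startswith("linux"):
        # scrot is a common X11 screenshot utility; assumed to be installed
        return ["scrot", path]
    raise NotImplementedError(f"no screenshot command known for {platform!r}")


def take_screenshot(path: str = "screenshot.png") -> Path:
    """Capture the screen with the current platform's command."""
    # check=True raises CalledProcessError if the tool exits non-zero
    subprocess.run(screenshot_command(sys.platform, path), check=True)
    return Path(path)
```

Keeping the command selection separate from the subprocess call means the platform mapping can be checked without a display, and in headless/CI runs the same code could still be pointed at an Xvfb display.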

Not sure how to add input, but keyboard-focused input should be possible.

Vision tracking issue: #50

ErikBjare · 2024-09-06T14:55:25Z

Finished the browser version of this, but have yet to automatically include the resulting screenshot in the conversation: #52

ErikBjare mentioned this issue Nov 28, 2023

Add vision support #50

Closed

ErikBjare mentioned this issue Aug 12, 2024

feat: started working on vision #91

Merged

5 tasks

ErikBjare added this to gptme roadmap Aug 13, 2024

ErikBjare changed the title ~~Add Xvfb tool that can feed display into vision~~ Add screenshot tool that can feed into vision Aug 13, 2024

ErikBjare changed the title ~~Add screenshot tool that can feed into vision~~ Add screenshot tool, integrate with vision Aug 13, 2024

ErikBjare mentioned this issue Aug 13, 2024

feat: added screenshot tool #92

Merged

ErikBjare closed this as completed in #92 Oct 2, 2024

github-project-automation bot moved this from In progress to Done in gptme roadmap Oct 2, 2024
