Add vision support #50

ErikBjare · 2023-11-28T15:31:48Z

Since the OpenAI API now has vision in beta, and we could use LLaVa locally.

Might be a lot of work, or might be super easy.

Question is, what would it be useful for?

Add screenshot tool, integrate with vision #51: Xvfb to understand display/output and make a E2E desktop agent
Add browser screenshot tool, integrate with vision #52: Screenshot with browser tool
- Can be used to take screenshots of developed webapps for visually-aided autodebugging
Have it review plot outputs for correctness and to inspect results
- Could be useful for data science, but reading a good plain text output might still be superior

ErikBjare · 2024-10-02T17:39:44Z

Now works after adding the vision tool: 597c66c

Also merged the screenshot tool: #92

ErikBjare · 2024-10-22T21:46:22Z

Anthropic just announced their computer use stuff: https://www.anthropic.com/news/3-5-models-and-computer-use

Basically same thing lol, although I should implement clicks and input (with pynput?), and let it run in Docker at 1280x800 resolution.

ErikBjare · 2024-10-29T17:54:42Z

I saw that @simonw had implemented support for not only images, but also video and audio in https://github.com/simonw/llm/releases/tag/0.17: https://x.com/simonw/status/1851280825662521370

That would be a good continuation on this issue

This was referenced Nov 28, 2023

Add browser screenshot tool, integrate with vision #52

Closed

Add screenshot tool, integrate with vision #51

Closed

ErikBjare added the enhancement New feature or request label Nov 29, 2023

ErikBjare added this to gptme roadmap Jan 20, 2024

ErikBjare added capabilities and removed enhancement New feature or request labels Jan 20, 2024

ErikBjare added the tool label Feb 21, 2024

ErikBjare mentioned this issue Aug 12, 2024

feat: started working on vision #91

Merged

5 tasks

ErikBjare closed this as completed in #91 Aug 13, 2024

github-project-automation bot moved this from In progress to Done in gptme roadmap Aug 13, 2024

ErikBjare reopened this Oct 2, 2024

ErikBjare closed this as completed Oct 2, 2024

ErikBjare mentioned this issue Oct 22, 2024

Complete "computer use" support #216

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add vision support #50

Add vision support #50

ErikBjare commented Nov 28, 2023 •

edited

Loading

ErikBjare commented Oct 2, 2024

ErikBjare commented Oct 22, 2024

ErikBjare commented Oct 29, 2024 •

edited

Loading

Add vision support #50

Add vision support #50

Comments

ErikBjare commented Nov 28, 2023 • edited Loading

ErikBjare commented Oct 2, 2024

ErikBjare commented Oct 22, 2024

ErikBjare commented Oct 29, 2024 • edited Loading

ErikBjare commented Nov 28, 2023 •

edited

Loading

ErikBjare commented Oct 29, 2024 •

edited

Loading