Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add vision support #50

Closed
ErikBjare opened this issue Nov 28, 2023 · 3 comments · Fixed by #91
Closed

Add vision support #50

ErikBjare opened this issue Nov 28, 2023 · 3 comments · Fixed by #91

Comments

@ErikBjare
Copy link
Owner

ErikBjare commented Nov 28, 2023

Since the OpenAI API now has vision in beta, and we could use LLaVa locally.

Might be a lot of work, or might be super easy.

Question is, what would it be useful for?

@ErikBjare ErikBjare added the enhancement New feature or request label Nov 29, 2023
@ErikBjare ErikBjare added capabilities and removed enhancement New feature or request labels Jan 20, 2024
@ErikBjare ErikBjare added the tool label Feb 21, 2024
@github-project-automation github-project-automation bot moved this from In progress to Done in gptme roadmap Aug 13, 2024
@ErikBjare ErikBjare reopened this Oct 2, 2024
@ErikBjare
Copy link
Owner Author

Now works after adding the vision tool: 597c66c

Also merged the screenshot tool: #92

@ErikBjare
Copy link
Owner Author

Anthropic just announced their computer use stuff: https://www.anthropic.com/news/3-5-models-and-computer-use

Basically same thing lol, although I should implement clicks and input (with pynput?), and let it run in Docker at 1280x800 resolution.

@ErikBjare
Copy link
Owner Author

ErikBjare commented Oct 29, 2024

I saw that @simonw had implemented support for not only images, but also video and audio in https://github.com/simonw/llm/releases/tag/0.17: https://x.com/simonw/status/1851280825662521370

That would be a good continuation on this issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

1 participant