-
-
Notifications
You must be signed in to change notification settings - Fork 191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add vision support #50
Comments
Anthropic just announced their computer use stuff: https://www.anthropic.com/news/3-5-models-and-computer-use Basically same thing lol, although I should implement clicks and input (with pynput?), and let it run in Docker at 1280x800 resolution. |
I saw that @simonw had implemented support for not only images, but also video and audio in https://github.com/simonw/llm/releases/tag/0.17: https://x.com/simonw/status/1851280825662521370 That would be a good continuation on this issue |
Since the OpenAI API now has vision in beta, and we could use LLaVa locally.
Might be a lot of work, or might be super easy.
Question is, what would it be useful for?
The text was updated successfully, but these errors were encountered: