Image to text support? #5
LocalAI supports multimodal chat completions with gpt-4-vision-preview. Can I try baibot with gpt-4-vision-preview instead of gpt-4?
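For context, here is a minimal sketch of the kind of multimodal request an OpenAI-compatible endpoint such as LocalAI accepts. It assumes LocalAI is listening on http://localhost:8080/v1, exposes a vision model registered as gpt-4-vision-preview, and that cat.jpg is some local image; all of these are placeholders, not values taken from this thread.

```python
# Minimal sketch: an OpenAI-compatible multimodal chat completion against LocalAI.
# Assumptions: LocalAI listens on http://localhost:8080/v1 and serves a vision
# model registered under the name "gpt-4-vision-preview"; cat.jpg is any local image.
import base64

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# Encode the image as a base64 data URL, which is how image parts are passed
# in OpenAI-style chat completion requests.
with open("cat.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

Supporting image-to-text in baibot would essentially mean building this kind of mixed text/image content array from the images users send, instead of dropping them.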
gpt-4-vision-preview does not appear to be supported by baibot -- only gpt-4 for the moment
This is a valid feature request. baibot currently ignores all images sent by you. It doesn't support feeding them to a model yet.
To address your previous comment:
You're pasting an excerpt from the code which defines the default configuration for models created on the …

The fact that …

Perhaps specifying a …

Regardless, baibot cannot send images to the model, so what you're trying to do cannot be done yet.

For completeness, it should be noted that for the actual OpenAI API (recommended to be used via the …), gpt-4-vision-preview is no longer available. If you try to use it, you get an error:
Here's the relevant part: …

Using gpt-4o instead is what's recommended.
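To make the deprecation point concrete, here is a rough sketch, not the actual error shown above: the retired gpt-4-vision-preview name is expected to be rejected by api.openai.com, while the identical request with gpt-4o should go through. Only the model name differs, and the script just catches the client's generic OpenAIError rather than asserting any specific message.

```python
# Rough sketch: the retired "gpt-4-vision-preview" name is expected to be rejected
# by the real OpenAI API, while the same request with "gpt-4o" should succeed.
# We only catch the client's generic OpenAIError instead of asserting an exact
# error body, since the exact message is whatever OpenAI returns.
from openai import OpenAI, OpenAIError

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [{"role": "user", "content": "Say hello."}]

for model in ("gpt-4-vision-preview", "gpt-4o"):
    try:
        reply = client.chat.completions.create(model=model, messages=messages)
        print(model, "->", reply.choices[0].message.content)
    except OpenAIError as err:
        print(model, "-> error:", err)
```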
Thanks @spantaleev. In preparation for this new feature request for baibot, I will open an issue with LocalAI to let them know that gpt-4-vision-preview is deprecated and that it should instead be named gpt-4o, in compliance with OpenAI API compatibility. That name should get mapped to the llava-1.6-mistral model that the stock Docker CUDA 12 LocalAI v2.20.1 image comes pre-installed with. The references to gpt-4-vision-preview in https://github.com/mudler/LocalAI/blob/master/aio/gpu-8g/vision.yaml, https://github.com/mudler/LocalAI/blob/master/aio/cpu/vision.yaml and https://github.com/mudler/LocalAI/blob/master/aio/intel/vision.yaml need to be changed to gpt-4o, as you point out.
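For illustration only, the change being proposed in those vision.yaml files boils down to renaming the advertised model. In the sketch below only the name: field is the point; the other values are placeholders rather than the real contents of the aio configs.

```yaml
# Sketch of the proposed rename in LocalAI's aio vision.yaml files.
# Only the `name:` line matters here; the remaining values are placeholders,
# not copied from the actual configs.
name: gpt-4o                      # previously: gpt-4-vision-preview
parameters:
  model: llava-1.6-mistral.gguf   # placeholder for the weights the aio image ships with
```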
I opened this LocalAI issue: mudler/LocalAI#3596
@spantaleev

I see text to image as a supported feature. How about image to text? There are quite a few capable multimodal self-hosted models these days, such as moondream2 and minicpm2.6, that are supported in ollama and similar.

Is that functionality implicitly supported?