Vision models support (WIP) #457
Conversation
I've been having good success with it so far, except that unlike the vision experience on chat.openai.com, it doesn't seem to have persistence when sending images. So if I ask it to describe a photo with, say, a car in it, the response is exactly as expected, but then a follow-on question like 'what colour is the car' fails, with the bot not knowing what I'm referencing. I have the bot in a few group chats, and although the trigger is working so the bot doesn't respond to everything, in order for it to respond to images in the chat I've set `IGNORE_GROUP_VISION=false`. It still honours the trigger for standard text queries, but it responds to every image sent in the chat without a trigger. Amazing work getting it to this state so quickly, thank you :)
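For context, the group behaviour described above is driven by the bot's environment configuration. A minimal `.env` sketch, assuming the existing trigger option keeps its name `GROUP_TRIGGER_KEYWORD` (`IGNORE_GROUP_VISION` is the flag quoted above):

```
# Respond in group chats only to text messages that start with this keyword
GROUP_TRIGGER_KEYWORD=@bot
# When false, the bot also reacts to images posted in the group,
# regardless of the trigger keyword (the behaviour reported above)
IGNORE_GROUP_VISION=false
```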
@Alpha162 could you try again? I tried fixing both issues with the previous commits. Thanks for reporting!
Knocked both issues out of the park, no issues 👍
I'm getting this error, how do I fix it?

`2023-11-12 18:02:12,913 - root - ERROR - OpenAIHelper.interpret_image() got multiple values for argument 'prompt'`
The only way I can think of that this error could happen is if the code calling that function has been changed somehow. How did you get there? Are you using an unmodified version of this branch?
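For anyone hitting the same thing: that `TypeError` is Python's standard complaint when a call site supplies the same parameter both positionally and by keyword, which is exactly what happens when a caller written against an older function signature is combined with a newer version of the function. A minimal sketch (the signature here is hypothetical, for illustration only):

```python
class OpenAIHelper:
    # Hypothetical signature, for illustration only -- not the branch's actual code
    def interpret_image(self, chat_id, image, prompt=None):
        return f"chat {chat_id}: interpreting {len(image)} bytes, prompt={prompt!r}"


helper = OpenAIHelper()
image_bytes = b"\x89PNG fake image data"

# Fine: 'prompt' is supplied exactly once, by keyword
print(helper.interpret_image(1234, image_bytes, prompt="describe this"))

# Reproduces the reported error: the extra positional argument already
# fills the 'prompt' slot, and the keyword then supplies it a second time.
try:
    helper.interpret_image(1234, image_bytes, "describe this", prompt="describe this")
except TypeError as err:
    print(err)  # interpret_image() got multiple values for argument 'prompt'
```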
I tried to combine both the TTS and vision code, and it's working; the only error is that one, when I upload an image.
If that's the case, try using the develop branch in my fork; it has everything integrated and works for me.
Would you be able to add this? https://platform.openai.com/docs/assistants/how-it-works
Adding the Assistants API would certainly be a good feature. Still, that would be for another PR. For now I am waiting for @n3d1117 to handle all new PRs first.
What do you mean by that?
I have an issue when providing a custom instruction with the image (in one message). There are no issues when sending an image without a prompt/description. What I was trying to do is create HTML code based on the layout pictured.
Luckily you mentioned the construction of HTML code, and I was able to reproduce the issue; I guess it is related to functions. I will try to fix it and get back here.
@SkySlider The error should be fixed now. The culprit was that if the response from ChatGPT is bigger than …
All good now, appreciate it!
Thanks, I already merged the changes here. The only change made is that interpreting images now does not use the user-configured model, but the only model that can currently do this.
Hi @gilcu3, I took some time to test this and I'm really liking it so far, thanks!
Hi, I really don't remember seeing that parameter before. I guess if it is mentioned in the response, we could do the token counting easily. I will test and see if that's the case. Things that are not yet supported, but could very well be: streaming the response, and adding the image itself to the conversation history (this seems to be what OpenAI considers appropriate).
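Assuming the parameter being discussed is the response's `usage` field: non-streamed chat completions do report token counts directly, so no manual counting is needed in that case. A minimal sketch with the v1 `openai` Python client (model and message are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # the vision model available at the time of this PR
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=300,
)

# Non-streamed responses carry the token accounting directly,
# so no local tiktoken-based counting is required.
usage = response.usage
print(usage.prompt_tokens, usage.completion_tokens, usage.total_tokens)
```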
FWIW this pull is working beautifully for me. I really like that the image comment is part of the prompt. Good work!
@n3d1117 after the last two commits, I am no longer doing the token count by myself, therefore the default is now …

One thing that we may discuss later is the following: currently the image is not added to the history. This makes it possible to use other models that do support functions, and to use the vision model just for interpreting a single image. But then no follow-up questions about the images are possible, which is probably a nice feature of the vision model. The other variant is to use the model specified by the user for everything, and simply let the user know if it tries to do something the current model cannot handle.
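For concreteness, "adding the image itself to the history" would mean storing history entries in the multi-part content format that the vision-capable chat completions models accept; a sketch of one such entry (URL and text are placeholders):

```python
# A user turn carrying both text and an image. Keeping entries like this
# in the history is what enables follow-up questions about the image, but
# non-vision models (the ones that support functions) cannot consume it.
history_entry = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What colour is the car in this photo?"},
        {
            "type": "image_url",
            "image_url": {"url": "https://example.com/car.jpg"},
        },
    ],
}
```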
Awesome @gilcu3, thanks! Re: the image in the history, as far as I understand there are three paths: …
What do you think would be best? I'm slightly in favor of either 3 or a configurable option to choose between 2 and 3.
The current setup seems to work quite well, since it preserves the context of the image interpretation. This allows follow-ups to be handled by other models and plugins. In some cases I cut and pasted the same image with a different prompt as a comment, if for some reason I wasn't happy with the original interpretation. A more general approach that could work for the vision and other models: …

The topics support would probably require a lot of work to implement, though.
Hello! What about something like solution 3, but also with a "preference" for the model? Currently only one model supports vision, but maybe in the future there will be more, so something to set the "preferred vision model" in the .env file would be necessary. But I see another issue: "once an image is received", it switches to the other model. So does that mean it loses the previous history after the model switch? Or is this solved by "adding it to the history" like you said? Thanks :)
Interesting, I had not heard about topic support. But yeah, it is probably out of the scope of this PR; still good to keep in mind.
I can implement that, I only need the option name and an explanation to put in the README. For me the hardest part is how to explain these options to the users, and also which one should be the default (I am clearly inclined toward the current option :))
Yes, unless we implement options 1 or 3. The problem is that the image itself is not currently added to the history, as it cannot be used by non-vision models (which are the ones that do support functions).
Hi @gilcu3, what about …? While we're at it, should we maybe make the vision model configurable, in case OpenAI adds more in the future? i.e. something like …
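For illustration, making the model configurable could be as simple as one more environment variable with a sensible default. The option name below is hypothetical, since the final name was settled in the commits rather than in this thread:

```python
import os

# Hypothetical option name, for illustration only; fall back to the single
# vision-capable model available at the time unless the user overrides it.
vision_model = os.environ.get("VISION_MODEL", "gpt-4-vision-preview")
```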
@n3d1117 it was a bit harder than I expected, but I think it is done. One thing though: we don't really have a good way to do a summary when there is an image in the history, and ChatGPT is probably not doing the best job... So we could remove the image just for that case, or leave it as it is, hoping it will be possible in the future :) Feel free to test it and let me know if it needs any fixes.
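One possible shape for the "remove the image just for that case" option: strip the image parts from the history before asking for a summary, so any model can handle it. A sketch only, with illustrative names:

```python
def strip_images(history):
    """Return a copy of the conversation history with image parts removed,
    so it can be summarised by a model without vision support."""
    cleaned = []
    for message in history:
        content = message["content"]
        if isinstance(content, list):  # multi-part, vision-style content
            texts = [p["text"] for p in content if p.get("type") == "text"]
            content = "\n".join(texts)
        cleaned.append({"role": message["role"], "content": content})
    return cleaned
```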
Anything left to do here, or are we just waiting for @n3d1117 to have a second to review this?
Looks great to me, thanks again @gilcu3, and sorry for the long wait!
This PR also depends on #453. It adds support for the current vision model from OpenAI. Feel free to try it and let me know if anything breaks.