
Vision models support (WIP) #457

Merged Dec 10, 2023 (12 commits from gilcu3:vision-support into n3d1117:main)

Conversation

gilcu3 (Contributor) commented Nov 8, 2023

This PR also depends on #453. It adds support for the current vision model from OpenAI. Feel free to try it and let me know if anything breaks.

Alpha162 commented Nov 9, 2023

I've been having good success with it so far, except that unlike the vision experience on chat.openai.com, it doesn't seem to have persistence when sending images. If I ask it to describe a photo with, say, a car in it, the response is exactly as expected, but a follow-on question like "what colour is the car" fails, with the bot not knowing what I'm referencing.

I have the bot in a few group chats, and although the trigger is working (so the bot doesn't respond to everything), in order for it to respond to images in the chat I've set IGNORE_GROUP_VISION=false. It still honours the trigger for standard text queries, but it responds to every image sent in the chat without a trigger.

Amazing work getting it to this state so quickly, thank you :)

gilcu3 (Contributor, Author) commented Nov 9, 2023

@Alpha162 could you try again? I tried fixing both issues with the previous commits. Thanks for reporting

Alpha162 commented

> @Alpha162 could you try again? I tried fixing both issues with the previous commits. Thanks for reporting

Knocked both issues out of the park, no issues 👍

rokipet commented Nov 12, 2023

I'm getting this error, how do I fix it?

```
2023-11-12 18:02:12,913 - root - ERROR - OpenAIHelper.interpret_image() got multiple values for argument 'prompt'
Traceback (most recent call last):
  File "C:\Users\Administrator\Downloads\Bot Updated\chatgpt-telegram-bot-086f8447376b3faa27631bfe13b654fd54223757\bot\telegram_bot.py", line 514, in _execute
    interpretation, tokens = await self.openai.interpret_image(chat_id, temp_file_png, prompt=prompt)
```

gilcu3 (Contributor, Author) commented Nov 12, 2023

> I'm getting this error, how do I fix it? 2023-11-12 18:02:12,913 - root - ERROR - OpenAIHelper.interpret_image() got multiple values for argument 'prompt'
> Traceback (most recent call last):
>   File "C:\Users\Administrator\Downloads\Bot Updated\chatgpt-telegram-bot-086f8447376b3faa27631bfe13b654fd54223757\bot\telegram_bot.py", line 514, in _execute
>     interpretation, tokens = await self.openai.interpret_image(chat_id, temp_file_png, prompt=prompt)

The only way I can think of this error happening is if the code calling that function has been changed somehow. How did you get there? Are you using an unmodified version of this branch?
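
For context, that particular `TypeError` arises whenever `prompt` is supplied both positionally and as a keyword. A minimal reproduction with a hypothetical signature (not this branch's actual code):

```python
def interpret_image(chat_id, fileobj, prompt=None):
    """Hypothetical stand-in for OpenAIHelper.interpret_image, illustration only."""
    ...

# A modified caller that passes an extra positional argument assigns `prompt`
# twice (once positionally, once by keyword), and Python raises exactly:
#   TypeError: interpret_image() got multiple values for argument 'prompt'
interpret_image("chat-1", "photo.png", "describe this", prompt="describe this")
```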

rokipet commented Nov 13, 2023

I tried to combine both codes, TTS and vision, and it's working; the only error is that one, when I upload an image.

gilcu3 (Contributor, Author) commented Nov 13, 2023

> I tried to combine both codes, TTS and vision, and it's working; the only error is that one, when I upload an image.

If that's the case, try using the develop branch in my fork; it has everything integrated and works for me.

rokipet commented Nov 14, 2023

Would you be able to add this?

https://platform.openai.com/docs/assistants/how-it-works
And add / all the kind of model right away?

gilcu3 (Contributor, Author) commented Nov 14, 2023

> Would you be able to add this?
>
> https://platform.openai.com/docs/assistants/how-it-works

Adding the Assistants API is certainly a good feature. Still, that would be for another PR. For now I am waiting for @n3d1117 to handle all the new PRs first.

> And add / all the kind of model right away?

What do you mean by that?

SkySlider commented Nov 14, 2023

I have an issue when providing a custom instruction with the image (in one message):

```
- root - ERROR - Can't parse entities: can't find end of the entity starting at byte offset 1045
Traceback (most recent call last):
  File "/root/chatgpt-telegram-bot/bot/telegram_bot.py", line 532, in _execute
    await update.effective_message.reply_text(
  File "/root/chatgpt-telegram-bot/venv/lib/python3.11/site-packages/telegram/_message.py", line 1074, in reply_text
    return await self.get_bot().send_message(
  File "/root/chatgpt-telegram-bot/venv/lib/python3.11/site-packages/telegram/ext/_extbot.py", line 2633, in send_message
    return await super().send_message(
  File "/root/chatgpt-telegram-bot/venv/lib/python3.11/site-packages/telegram/_bot.py", line 381, in decorator
    result = await func(self, *args, **kwargs)  # skipcq: PYL-E1102
  File "/root/chatgpt-telegram-bot/venv/lib/python3.11/site-packages/telegram/_bot.py", line 807, in send_message
    return await self._send_message(
  File "/root/chatgpt-telegram-bot/venv/lib/python3.11/site-packages/telegram/ext/_extbot.py", line 507, in _send_message
    result = await super()._send_message(
  File "/root/chatgpt-telegram-bot/venv/lib/python3.11/site-packages/telegram/_bot.py", line 559, in _send_message
    result = await self._post(
  File "/root/chatgpt-telegram-bot/venv/lib/python3.11/site-packages/telegram/_bot.py", line 469, in _post
    return await self._do_post(
  File "/root/chatgpt-telegram-bot/venv/lib/python3.11/site-packages/telegram/ext/_extbot.py", line 325, in _do_post
    return await super()._do_post(
  File "/root/chatgpt-telegram-bot/venv/lib/python3.11/site-packages/telegram/_bot.py", line 497, in _do_post
    return await request.post(
  File "/root/chatgpt-telegram-bot/venv/lib/python3.11/site-packages/telegram/request/_baserequest.py", line 168, in post
    result = await self._request_wrapper(
  File "/root/chatgpt-telegram-bot/venv/lib/python3.11/site-packages/telegram/request/_baserequest.py", line 328, in _request_wrapper
    raise BadRequest(message)
telegram.error.BadRequest: Can't parse entities: can't find end of the entity starting at byte offset 1045
```

There are no issues when sending an image without a prompt/description. What I was trying to do was to create HTML code based on the pictured layout.

gilcu3 (Contributor, Author) commented Nov 14, 2023

> There are no issues when sending an image without a prompt/description. What I was trying to do was to create HTML code based on the pictured layout.

Luckily you mentioned generating HTML code, and I was able to reproduce the issue; I guess it is related to functions. I will try to fix it and get back here.

gilcu3 (Contributor, Author) commented Nov 14, 2023

@SkySlider the error should be fixed now. The culprit was that if the response from ChatGPT is bigger than vision_max_tokens, the message gets cut off, possibly in the middle of a Markdown entity, in which case Telegram fails to parse it. I solved it by following the bot's behavior elsewhere: if a message fails to send, try sending it again without formatting.
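
For reference, a minimal sketch of that fallback pattern with python-telegram-bot (simplified names, not the PR's exact code):

```python
from telegram import Message
from telegram.constants import ParseMode
from telegram.error import BadRequest


async def reply_with_fallback(message: Message, text: str) -> None:
    try:
        # First attempt: send the response with Markdown formatting.
        await message.reply_text(text, parse_mode=ParseMode.MARKDOWN)
    except BadRequest:
        # A response cut off at vision_max_tokens can leave a Markdown
        # entity unclosed ("can't find end of the entity..."), so retry
        # the same text as plain, unformatted text.
        await message.reply_text(text)
```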

SkySlider commented

> @SkySlider the error should be fixed now. The culprit was that if the response from ChatGPT is bigger than vision_max_tokens, the message gets cut off, possibly in the middle of a Markdown entity, in which case Telegram fails to parse it. I solved it by following the bot's behavior elsewhere: if a message fails to send, try sending it again without formatting.

All good now, appreciate it!

n3d1117 (Owner) commented Nov 16, 2023

This looks great, thanks @gilcu3! I will be testing #453, #456, #457 and #462 as soon as possible!

n3d1117 (Owner) commented Nov 18, 2023

Hi @gilcu3, #453 and #456 have been merged! 🎉
This one and #462 require some conflicts to be resolved in order to align them with the main branch.
I will take a look at them tomorrow (unless you feel like resolving them first). Thanks again!

gilcu3 (Contributor, Author) commented Nov 18, 2023

> Hi @gilcu3, #453 and #456 have been merged! 🎉 This one and #462 require some conflicts to be resolved in order to align them with the main branch. I will take a look at them tomorrow (unless you feel like resolving them first). Thanks again!

Thanks, I already merged the changes here. The only change made is that interpreting images no longer uses the user-configured model, but the only model that can currently do this.

n3d1117 (Owner) commented Nov 19, 2023

Hi @gilcu3, I took some time to test this and I'm really liking it so far, thanks!
One question: could we support the auto option for image-understanding fidelity and set it as the default? I see you only added low and high; I'm guessing that's due to difficulties in counting tokens with the auto option?
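
For reference, the fidelity setting is the `detail` field on the image part of a vision request; OpenAI documents low, high, and auto as the accepted values. A sketch of the payload (`image_url` is a placeholder):

```python
# Sketch of a gpt-4-vision-preview message payload; image_url stands in
# for a real URL or a base64 data URI.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image"},
            {
                "type": "image_url",
                # "detail" controls fidelity: "low", "high", or "auto"
                # (with "auto" the API picks low or high by itself).
                "image_url": {"url": image_url, "detail": "auto"},
            },
        ],
    }
]
```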

gilcu3 (Contributor, Author) commented Nov 19, 2023

> Hi @gilcu3, I took some time to test this and I'm really liking it so far, thanks! One question: could we support the auto option for image-understanding fidelity and set it as the default? I see you only added low and high; I'm guessing that's due to difficulties in counting tokens with the auto option?

Hi, I really don't remember seeing that parameter before. I guess if it is mentioned in the response, we could do the token counting easily. I will test and see if that's the case.
PS: Checked; I don't see which detail parameter was used in the response. One thing I did notice, though, is that the number of tokens is in the response object, so I think we could do something much better. And the same could be done for the rest of the bot. I think this was added recently.

Things that are not yet supported, but could very well be: streaming the response, and adding the image itself to the conversation history (this seems to be what OpenAI considers appropriate).
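
A sketch of reading the token count straight from the response object, assuming the v1 OpenAI Python client:

```python
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment


async def interpret(messages: list) -> tuple[str, int]:
    response = await client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=messages,
        max_tokens=300,
    )
    # The API reports exact usage, so no local tiktoken-style estimate
    # (and no knowledge of which detail level was applied) is needed.
    return response.choices[0].message.content, response.usage.total_tokens
```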

k3it (Contributor) commented Nov 19, 2023

FWIW, this pull is working beautifully for me. I really like that the image comment is part of the prompt. Good work!

gilcu3 (Contributor, Author) commented Nov 19, 2023

@n3d1117 after the last two commits I am no longer doing the token count myself, so the default is now auto.

One thing that we may discuss later is the following: currently the image is not added to the history. This makes it possible to use other models that do support functions, and to use the vision model just for interpreting a single image. But then no follow-up questions about the images are possible, which is probably a nice feature of the vision model. The other variant is to use the model specified by the user for everything, and simply let the user know if it tries to do something the current model cannot handle.
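
To make the two variants concrete, a sketch with hypothetical names (`interpret_image` and `history` are illustrative, not the PR's actual structures):

```python
async def handle_image(history, chat_id, image_url, prompt, keep_image):
    if not keep_image:
        # Variant A (current): run the vision model once and store only its
        # text output, so any model (including ones with function support)
        # can handle follow-up questions.
        interpretation, _ = await interpret_image(chat_id, image_url, prompt=prompt)
        history.append({"role": "assistant", "content": interpretation})
    else:
        # Variant B: keep the image itself as a multi-part user message.
        # Follow-ups can then reference it, but every later turn must use
        # the vision model, which does not support functions.
        history.append({
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        })
```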

n3d1117 (Owner) commented Nov 20, 2023

Awesome @gilcu3, thanks!

Re: the image in the history, as far as I understand there are three paths:

  1. Only allow image processing and follow up questions if the vision model is specified in the config (e.g. OPENAI_MODEL=gpt-4-vision-preview). The downside is that this model will also be used for everything else
  2. (current) Keep using the model defined by the user, but allow one-time image processing using vision model. Follow up questions about the image will not work
  3. Once an image has been received, add it to the history and from then on keep using vision model, until conversation expires/resets

What do you think would be best? I'm slightly in favor of either 3 or a configurable option to choose between 2 and 3.
Wondering if @k3it and @AlexHTW have any input on this?

k3it (Contributor) commented Nov 20, 2023

The current setup seems to work quite well, since it preserves the context of the image interpretation. This allows follow-ups to be handled by other models and plugins. In some cases I cut and pasted the same image with a different prompt as a comment, if for some reason I wasn't happy with the original interpretation.

A more general approach that could work for the vision and other models:

  • create a chat group with the bot
  • enable Telegram group topics
  • keep separate context within each topic (including possibly image history)
  • a command or keyword to generate a response as a new topic; this would create a brand-new topic tab and a new context

The topics support would probably require a lot of work to implement, though.
Just my $0.02 :)

gianlucaalfa (Contributor) commented Nov 21, 2023

> 3. Once an image has been received, add it to the history and from then on keep using vision model, until conversation expires/resets

Hello! What about something like solution 3, but also with a "preference" for the model? Currently only one model supports vision, but maybe in the future there will be more, so something is needed to set the "preferred vision model" in the .env file.

But I see another issue. "Once an image is received", it switches to the other model. Does that mean it loses the previous history after the model switch? Or is this solved by "add it to the history" like you said?

Thanks :)

gilcu3 (Contributor, Author) commented Nov 21, 2023

@k3it:

> The topics support would probably require a lot of work to implement, though. Just my $0.02 :)

Interesting, I had not heard about topics support. But yeah, it is probably out of the scope of this PR; still good to keep in mind.

@n3d1117:

> What do you think would be best? I'm slightly in favor of either 3 or a configurable option to choose between 2 and 3.

I can implement that; I only need the option name and an explanation to put in the README. For me the hardest part is how to explain these options to the users, and also which one should be the default (clearly I am inclined toward the current option :) )

Jipok commented Nov 23, 2023

[image attachment]
Do I understand correctly that if the bot responds to a message without a picture, it does not "see" the previously sent images?

gilcu3 (Contributor, Author) commented Nov 23, 2023

> Do I understand correctly that if the bot responds to a message without a picture, it does not "see" the previously sent images?

Yes, unless we implement option 1 or 3. The problem is that the image itself is not currently added to the history, as it cannot be used by non-vision models (which are the ones that do support functions).

n3d1117 (Owner) commented Nov 24, 2023

> I can implement that; I only need the option name and an explanation to put in the README. For me the hardest part is how to explain these options to the users, and also which one should be the default (clearly I am inclined toward the current option :) )

Hi @gilcu3, what about ENABLE_VISION_FOLLOWUP_QUESTIONS to switch between options 2 and 3? My personal opinion is that it should be true by default 😃 but feel free to implement it your way.

While we're at it, should we maybe make the vision model configurable, in case OpenAI adds more in the future? I.e. something like VISION_MODEL=gpt-4-vision-preview instead of hardcoding it, as @gianlucaalfa was suggesting.
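
Put together, the settings under discussion might look like this in the bot's `.env` (both names are proposals from this thread, not merged configuration):

```
# Proposed in this thread; names not final
ENABLE_VISION_FOLLOWUP_QUESTIONS=true
VISION_MODEL=gpt-4-vision-preview
```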

n3d1117 mentioned this pull request on Nov 24, 2023
gilcu3 (Contributor, Author) commented Nov 25, 2023

@n3d1117 it was a bit harder than I expected, but I think it is done. One thing, though: we don't really have a good way to do a summary when there is an image in the history, and ChatGPT is probably not doing the best job... So we could remove the image just for that case, or leave it as it is, hoping it will be possible in the future :) Feel free to test it and let me know if it needs any fixes.

iamjackg commented Dec 7, 2023

Anything left to do here, or are we just waiting for @n3d1117 to have a second to review this?

n3d1117 (Owner) commented Dec 10, 2023

Looks great to me, thanks again @gilcu3 and sorry for the long wait!

n3d1117 merged commit 05a7b5b into n3d1117:main on Dec 10, 2023
gilcu3 deleted the vision-support branch on Dec 11, 2023