ImageToText & AnswerToImage #2444
Comments
As I was starting to familiarize myself with images/multimodal support, I ran into this issue. Are these features still desirable?
@anakin87 Yes, it's still something we want to do. We'll keep this issue updated should we change anything.
As you can see in this Space, Transformers models for image captioning are available nowadays. @ZanSara, if you can provide more details about the architecture/design of this node, it shouldn't be too difficult to develop it. Bonus point: there are also several Transformers models for OCR.
Hello @anakin87! So the idea behind these two nodes was fairly basic.
Hey @ZanSara! Speaking of the
Hello @anakin87! That's a really good question... but maybe we don't have to choose? 😁 Do you think we can make it work for both? I'd imagine it can, but let me know if you face issues or if you don't like the idea. If we need to select just one, however, I'd lean towards
Implementation related to …; @TuanaCelik played with it recently.
Closing as superseded by the changes introduced in Haystack 2.x |
[Part of #2418]
What
ImageToText would be a node that takes a list of paths to images and captions them. The captions are then stored as Documents, with the path to the image in their metadata. The captions will be processed as regular documents, so no radical changes are expected in the core of the framework.
Why
ImageToText could be a nice test of how Haystack can take images as input in indexing pipelines, and it could help open the path for image support in general.
These changes should be added as separate PRs.
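To make the ImageToText proposal above more concrete, here is a minimal sketch of the node's contract: image paths in, one Document per caption out, with the image path kept in the metadata. It does not use Haystack's real API. `Document` is a simplified stand-in, and the `ImageToText` class, its `captioner` parameter, the `run` signature, the `{"documents": [...]}` return shape, and the `image_path` metadata key are all hypothetical. The captioner is any function from an image path to a string. In practice it might wrap one of the Transformers image-captioning models mentioned in the thread (for example, an `image-to-text` pipeline), but that wiring is an assumption and is not shown here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Document:
    """Simplified stand-in for a Haystack Document (hypothetical, not the real class)."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


class ImageToText:
    """Hypothetical node: captions each image and wraps the caption in a Document."""

    def __init__(self, captioner: Callable[[str], str]):
        # captioner: any callable mapping an image path to a caption string
        # (assumption: e.g. a thin wrapper around an image-captioning model).
        self.captioner = captioner

    def run(self, file_paths: List[str]) -> Dict[str, List[Document]]:
        documents = []
        for path in file_paths:
            caption = self.captioner(path)
            # Keep the image path in the metadata so each caption can be traced back to its image.
            documents.append(Document(content=caption, meta={"image_path": path}))
        return {"documents": documents}


# Usage with a dummy captioner that only echoes the file path.
node = ImageToText(captioner=lambda path: f"a picture stored at {path}")
result = node.run(["cat.jpg", "dog.png"])
for doc in result["documents"]:
    print(doc.content, doc.meta)
```

Because the captions come out as ordinary Documents, the rest of an indexing pipeline (preprocessing, embedding, writing to a document store) could treat them like any other text. This matches the issue's point that no radical changes to the framework's core should be needed.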