[Feature Request] OCR search text in images #296

lethefrost · 2024-07-12T01:21:50Z

It would be especially helpful when you have a lot of screenshots, diagrams, photos of slides, etc., embedded in documents or stored as standalone image files. Text in images can contain a large amount of information, but it's not easy to retrieve with traditional file management. It would be greatly appreciated if you could consider making it searchable.

MohamedBassem · 2024-07-13T12:25:26Z

hmmm, OCR is a cool idea indeed. My only concern is finding a good OCR tool that would work with different languages.

lethefrost · 2024-07-13T17:10:08Z

> hmmm, OCR is a cool idea indeed. My only concern is finding a good OCR tool that would work with different languages.

This might be helpful: we could let each user configure a list of the languages likely to occur in their hoard. These are usually the languages they know, so the list wouldn't be too long (for most people, probably 1-3). It seems that Tesseract.js supports recognizing multiple languages at the same time when you concatenate the language codes with +.
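As a rough sketch of the per-user language list idea above: the helper below turns a configured list into the "+"-joined string form. The function name `buildOcrLangString` and the `"eng"` fallback are assumptions made for illustration, not part of Hoarder or Tesseract.js. The actual Tesseract.js call is left as a comment because the exact API depends on the library version.

```typescript
// Hypothetical helper (not part of Hoarder or Tesseract.js): builds the
// "+"-joined language string from a user-configured list of language codes.
function buildOcrLangString(configured: string[], fallback: string = "eng"): string {
  const codes: string[] = [];
  for (const raw of configured) {
    const code = raw.trim();
    // Skip empty entries and duplicates, keeping the user's original order.
    if (code.length > 0 && codes.indexOf(code) === -1) {
      codes.push(code);
    }
  }
  // Assumption: fall back to a single default language when nothing is configured.
  return codes.length > 0 ? codes.join("+") : fallback;
}

// Possible usage with Tesseract.js (version-dependent, so shown as a comment only):
//   import { createWorker } from "tesseract.js";
//   const worker = await createWorker(buildOcrLangString(["eng", "chi_sim"]));
```

Keeping the list short matters in practice, since each extra language presumably adds model data to load and more candidates for the recognizer to consider.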

MohamedBassem · 2024-07-27T21:27:36Z

Tesseract.js looks cool indeed. We can probably add it to the roadmap at some point.

akshara-tg · 2024-09-01T05:13:46Z

Without OCR (which allows for searching text within images), hoarding images becomes somewhat pointless.

Arcturuss · 2024-10-13T11:39:04Z

+1 for OCR in images.
Personally, I wanted to make a "meme catalog" in Hoarder. A few thoughts about that:

In addition to OCR, semantic search is needed, similar to what is suggested in [Feature request] Selfhosted semantic search #441, but for images, like Immich does.
Maybe add an option to enable OCR only for individually hoarded images and not for images from webpages.

MohamedBassem · 2024-10-13T11:40:37Z

@Arcturuss OCR for uploaded images is something on our roadmap and I'm definitely planning to do it pretty soon.

MohamedBassem · 2024-10-21T10:51:33Z

OCR is now implemented and will be available in the next release.

lethefrost · 2024-10-21T19:33:12Z

> OCR is now implemented and will be available in the next release.

Thank you! It's great to hear that! I appreciate it a lot.

drycounty · 2024-11-16T02:24:48Z

Can you tell me how this is implemented? Do I need to specify any of the ENV variables for it to work? Can't seem to get it to work from photos of pages of text.

MohamedBassem · 2024-11-17T02:03:03Z

@drycounty it's enabled by default. Currently, we don't expose the extracted text; we only index it for search. Try searching for the content of the page and see if it shows up.
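To illustrate what "indexed but not exposed" can look like, here is a minimal in-memory sketch. It is not Hoarder's actual implementation: the `ImageSearchIndex` class, its field names, and the substring matching are all assumptions chosen to keep the example self-contained.

```typescript
// Hypothetical document shape: ocrText is stored for matching only.
interface IndexedImage {
  id: string;
  title: string;
  ocrText: string;
}

// What a search returns: note that the extracted text is deliberately absent.
interface SearchHit {
  id: string;
  title: string;
}

class ImageSearchIndex {
  private docs: IndexedImage[] = [];

  add(doc: IndexedImage): void {
    this.docs.push(doc);
  }

  // Case-insensitive substring match over the title and the hidden OCR text.
  search(query: string): SearchHit[] {
    const q = query.trim().toLowerCase();
    if (q.length === 0) return [];
    return this.docs
      .filter(
        (d) =>
          d.title.toLowerCase().indexOf(q) !== -1 ||
          d.ocrText.toLowerCase().indexOf(q) !== -1
      )
      .map((d) => ({ id: d.id, title: d.title }));
  }
}
```

With a setup like this, the only way to check that OCR worked is to search for words that appear in the photographed page and see whether the image comes back as a hit.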

MohamedBassem added the feature request label Jul 13, 2024

MohamedBassem added this to Hoarder's Roadmap Oct 5, 2024

MohamedBassem closed this as completed in 019b5d2 Oct 20, 2024

github-project-automation bot moved this from Backlog to Done in Hoarder's Roadmap Oct 20, 2024

kamtschatka pushed a commit to kamtschatka/hoarder-app that referenced this issue Nov 2, 2024

feature: Add OCR support for images. Fixes hoarder-app#296

5d285e2
