Don't use DPI as a way to refer to word size in documentation #1846

albertoandreottiATgmail · 2018-08-16T19:27:55Z

Hi,

here,
https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality#rescaling

you recommend using 300 DPI images. That doesn't make any sense, images don't have a DPI until you print them.
You should give your recommendation in terms of, for example, the minimum number of pixels for the height of a word. I can have an image where the letter 'a' is 20 pixels high, or 200 pixels high. The two images will give different results in terms of performance.
Independently of that, I can print both images at 300 dpi.
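
A minimal sketch of the arithmetic behind this point: the printed size of a glyph is its pixel height divided by the print resolution, so any pixel height can be printed at 300 dpi and only the physical size changes. The function name `printed_height_mm` is made up for this illustration.

```python
MM_PER_INCH = 25.4


def printed_height_mm(pixel_height: int, print_dpi: float) -> float:
    """Physical height of a glyph when its pixels are printed at print_dpi."""
    if print_dpi <= 0:
        raise ValueError("print_dpi must be positive")
    return pixel_height / print_dpi * MM_PER_INCH


# The two example glyph heights from the comment, both printed at 300 dpi.
for px in (20, 200):
    print(f"{px:>3} px at 300 dpi -> {printed_height_mm(px, 300):.2f} mm")
```

Both images print fine at 300 dpi; the 200-pixel glyph simply comes out ten times taller on paper than the 20-pixel one.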

Am I missing something?

Alberto.

H-Bluhm · 2018-08-17T07:30:00Z

I would assume that dpi and ppi are used interchangeably here.
Since, as you laid out, the technical meaning of dpi does not make a lot of sense in this case, I think ppi is what was meant.

Shreeshrii · 2018-08-17T09:32:02Z

The documentation wiki can be edited by users. Please modify or correct it as required.

amitdo · 2018-08-17T17:29:38Z

https://github.com/tesseract-ocr/tesseract/wiki/FAQ-Old#is-there-a-minimum-text-size-it-wont-read-screen-text

jbreiden · 2018-09-06T22:57:46Z

> That doesn't make any sense, images don't have a DPI until you print them.

JPEG & PNG both support resolution metadata. Please use it.
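
As a rough illustration of what that metadata looks like, the sketch below reads the resolution from a PNG using only the Python standard library. PNG stores it in the `pHYs` chunk as pixels per unit, with unit `1` meaning metres. The helper names `make_png` and `read_png_dpi` are invented for this example, and the tiny generated image exists only to give the reader something to parse; this is not how Tesseract itself reads the value.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
METERS_PER_INCH = 0.0254


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def make_png(width: int, height: int, dpi=None) -> bytes:
    """Build a small all-white 8-bit grayscale PNG, optionally with pHYs."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline starts with filter byte 0, followed by the pixel values.
    raw = b"".join(b"\x00" + b"\xff" * width for _ in range(height))
    parts = [PNG_SIGNATURE, _chunk(b"IHDR", ihdr)]
    if dpi is not None:
        ppm = round(dpi / METERS_PER_INCH)  # pixels per metre
        parts.append(_chunk(b"pHYs", struct.pack(">IIB", ppm, ppm, 1)))
    parts.append(_chunk(b"IDAT", zlib.compress(raw)))
    parts.append(_chunk(b"IEND", b""))
    return b"".join(parts)


def read_png_dpi(data: bytes):
    """Return (x_dpi, y_dpi) from pHYs, or None if absent or not in metres."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, kind = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if kind == b"pHYs" and len(body) == 9:
            x_ppm, y_ppm, unit = struct.unpack(">IIB", body)
            if unit != 1:  # unit 0 only defines the pixel aspect ratio
                return None
            return (round(x_ppm * METERS_PER_INCH),
                    round(y_ppm * METERS_PER_INCH))
        if kind in (b"IDAT", b"IEND"):  # pHYs must come before image data
            return None
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return None


print(read_png_dpi(make_png(4, 4, dpi=300)))
print(read_png_dpi(make_png(4, 4)))
```

The second call shows the case the original complaint is about: a file with no `pHYs` chunk carries no resolution at all, so any DPI has to be assumed by the consumer.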

Charlie313 · 2018-09-06T23:01:02Z

How do I just remove myself completely from all of this?

zdenop · 2018-09-28T21:05:38Z

@albertoandreottiATgmail: I do not know what your aim is, but you are approaching this from the wrong end.
If you are digitizing a paper document, you will get different image quality (and therefore different OCR quality) depending on whether you scan at 300 dpi or 70 dpi, even though the size of the letters on the paper is the same. That is why the suggestion is given in terms of image dpi: you cannot change the size of the printed letters.
Anyway, please use the tesseract user forum for discussion.
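
The scanning argument can be sketched numerically: for a letter of fixed physical size, the number of pixels it occupies scales linearly with the scan resolution. The 2 mm letter height and the function name `scanned_height_px` are assumptions chosen for illustration only.

```python
MM_PER_INCH = 25.4


def scanned_height_px(glyph_height_mm: float, scan_dpi: float) -> float:
    """Pixel height of a glyph of fixed physical size scanned at scan_dpi."""
    if glyph_height_mm <= 0 or scan_dpi <= 0:
        raise ValueError("glyph height and dpi must be positive")
    return glyph_height_mm / MM_PER_INCH * scan_dpi


glyph_mm = 2.0  # assumed height of a lowercase letter on the page
for dpi in (70, 300):
    print(f"{dpi:>3} dpi -> {scanned_height_px(glyph_mm, dpi):.1f} px")
```

Under these assumptions the same letter covers roughly 5.5 pixels at 70 dpi and roughly 23.6 pixels at 300 dpi, which is the difference in detail the scan resolution recommendation is meant to capture.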

stweil · 2018-09-29T04:35:06Z

@albertoandreottiATgmail and @zdenop, you are talking about different things. Yes, of course it makes a difference whether scanning is done at a high or a low resolution. But that is only a relative value. Scanning a large poster at 70 dpi will give the same picture as scanning a small printout of the poster at 300 dpi. A human won't see any difference when viewing the resulting image file on a screen and will be able to read the text in both cases. So I'd expect that it also does not make a difference for Tesseract. Currently it does! An image which was converted from 300 dpi to 600 dpi gives a different (typically better) result with Tesseract, although no information was added and the quality of the image does not improve through such a conversion. Other OCR software does not need or use the resolution information from the input image, as far as I know.
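
The "no information was added" claim can be illustrated with the simplest kind of resampling. The sketch below assumes the 300 to 600 dpi conversion is plain pixel replication (nearest neighbour); real tools usually interpolate instead, and the function names are made up for this example. It shows that the doubled image can be reduced back to the original exactly.

```python
def upscale_nearest(pixels, factor):
    """Enlarge a 2D list of pixel values by repeating each pixel."""
    out = []
    for row in pixels:
        # Repeat every value horizontally, then repeat the row vertically.
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def downscale_by_sampling(pixels, factor):
    """Keep every factor-th pixel in both directions."""
    return [row[::factor] for row in pixels[::factor]]


original = [
    [0, 255, 0],
    [255, 0, 255],
]
doubled = upscale_nearest(original, 2)

print(len(doubled), "x", len(doubled[0]))
print(downscale_by_sampling(doubled, 2) == original)
```

Because the round trip is lossless, the enlarged image contains nothing the original did not; any change in OCR output comes from how the recognizer reacts to glyph size in pixels.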

amitdo · 2018-09-29T06:47:07Z

The explanation the OP expects is already present on another wiki page.
See my link above.

Also see Ray's remark in #756 (comment)

zdenop closed this as completed Sep 28, 2018

amitdo added the image resolution label May 25, 2021
