More detailed OCR in PDF24 than in NAPS2 with the same Tesseract version #124

NextTherapist · 2023-03-29T07:18:21Z

Hi,

I found differences between the OCR results of PDF24 and NAPS2. Both are the newest versions and both bundle Tesseract 5.2.0. I ran OCR on the same black-and-white PDF file, which has a light grey gridded background (the background of a scanned form).

PDF24 detects small or pixelated characters better; in some places NAPS2 detects nothing at all. Both need the same amount of time.

Something must differ in how the two programs integrate Tesseract, since both say they use Tesseract 5.2.0. NAPS2 (7.0b7) is set to "best" quality; PDF24 has no quality setting.

Perhaps there is something that could be improved in the NAPS2 integration of Tesseract?

cyanfish · 2023-03-29T16:02:32Z

I'll need a sample file with the issue to do a comparison.

cyanfish · 2023-04-08T23:09:27Z

I did a comparison and couldn't identify a significant quality difference. Feel free to attach a sample file that is significantly better in PDF24 and I'll have a look.

NextTherapist · 2023-05-01T12:20:50Z

Sorry, it was a document with content subject to data protection. I will have to find another example that I can publish.

cyanfish closed this as completed Apr 8, 2023

cyanfish added the support label Apr 24, 2023
