More detailed OCR in PDF24 than in NAPS2 with the same Tesseract version #124

Closed
NextTherapist opened this issue Mar 29, 2023 · 3 comments

@NextTherapist
Hi,

I found differences between the OCR results of PDF24 and NAPS2. Both are the newest versions and both bundle Tesseract 5.2.0. I ran OCR in each on the same black-and-white PDF file, which has a light grey gridded background (the background of a scanned form).

PDF24 recognizes small or pixelated characters better, and in some places NAPS2 recognizes nothing at all. Both take the same amount of time.

Something must differ in how the two programs integrate Tesseract, since both report using Tesseract 5.2.0. NAPS2 (7.0b7) is set to "best" quality; PDF24 has no quality setting.

Perhaps something could be improved in NAPS2's Tesseract integration?

@cyanfish
Owner

I'll need a sample file with the issue to do a comparison.

@cyanfish
Owner

cyanfish commented Apr 8, 2023

I did a comparison and couldn't identify a significant quality difference. Feel free to attach a sample file that is significantly better in PDF24 and I'll have a look.

@NextTherapist
Author

Sorry, the document contains data-protected content. I will have to find another example that I can publish.
