tesseract 4 --oem 0 baseline error with rotated pages #2086
Comments
Hello, I recently installed Tesseract 4.0 (tesseract-ocr-w64-setup-v4.0.0.20181030.exe) on my Win7 system. To check the page orientation I used the old OCR method, i.e. --oem 0, since it is much faster than LSTM, together with hocr output. Using the textangle information I rotated the tiff files if necessary, then ran LSTM OCR and produced an overlaid PDF. With the new Tesseract version I never get any textangle information if the page is rotated by 180 degrees, and no text is recognized. Unfortunately the same holds for LSTM, where no text is recognized either. Are there any changes or errors? How can I get the text orientation for a page rotated by 180 degrees? |
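For readers who want to reproduce the orientation check described above, here is a minimal sketch of how the textangle value could be pulled out of an hOCR file. It assumes that textangle appears as a property inside the title attribute of the line elements; the function name and the sample fragment are hypothetical and are not taken from the files attached to this issue.

```python
import re
from collections import Counter
from typing import Optional

# Matches a property such as "textangle 180" inside an hOCR title attribute.
TEXTANGLE_RE = re.compile(r"\btextangle\s+(-?\d+(?:\.\d+)?)")


def dominant_textangle(hocr: str) -> Optional[float]:
    """Return the most frequent textangle found in the hOCR text.

    Returns None when no textangle property is present at all, which is
    the situation reported in this issue for pages rotated by 180 degrees.
    """
    angles = [float(value) for value in TEXTANGLE_RE.findall(hocr)]
    if not angles:
        return None
    return Counter(angles).most_common(1)[0][0]


# Hypothetical hOCR fragment, for illustration only.
SAMPLE_HOCR = """
<span class='ocr_line' title="bbox 10 20 300 40; textangle 180; x_size 20">a</span>
<span class='ocr_line' title="bbox 10 50 300 70; textangle 180; x_size 20">b</span>
<span class='ocr_line' title="bbox 10 80 300 100; baseline 0 -5; x_size 20">c</span>
"""

print(dominant_textangle(SAMPLE_HOCR))
```

A caller would rotate the image by the returned angle before the LSTM pass and treat None as "no rotation information available" rather than as "upright".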
Please provide:
|
tesseract.exe "image.tif" "image.tif_ocr" --oem 0 -l deu+eng hocr
By the way: using the psm option is useless, because a rotation of e.g. 10 degrees (from scanning) is recognized as 0 degrees. |
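As a point of comparison with the hocr approach, the orientation-and-script-detection mode (--psm 0) reports page orientation in 90-degree steps only, which fits the observation that a 10-degree skew is reported as 0. The sketch below parses that kind of report. It assumes the report consists of "Key: value" lines including a "Rotate" line, and that the OSD data file is installed; the sample text is hypothetical and was not produced from the image in this issue.

```python
from typing import Dict, Optional


def parse_osd(osd_text: str) -> Dict[str, str]:
    """Split an OSD report made of 'Key: value' lines into a dictionary."""
    fields = {}
    for line in osd_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            fields[key.strip()] = value.strip()
    return fields


def rotation_needed(osd_text: str) -> Optional[int]:
    """Return the value of the 'Rotate' field in degrees, or None if it is missing."""
    value = parse_osd(osd_text).get("Rotate")
    return int(value) if value is not None else None


# Hypothetical OSD report, for illustration only.
SAMPLE_OSD = """Page number: 0
Orientation in degrees: 180
Rotate: 180
Orientation confidence: 7.5
Script: Latin
Script confidence: 2.1
"""

print(rotation_needed(SAMPLE_OSD))
```

Because this mode only distinguishes multiples of 90 degrees, small scanning skew still has to be handled separately.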
What is the output in the terminal? Can you provide the image? |
In the meantime I uninstalled the version I was using (https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w64-setup-v4.0.0.20181030.exe) and went back to the last beta (https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w64-setup-v4.0.0-beta.4.20180912.exe), which does the job. Unfortunately I no longer have the terminal output, but I do have the hocr files from both Tesseract versions for a page rotated by 180 degrees (see attachment: the one with textangle 180 is from the beta and the one without is from the "stable" release; I also added the tif file as a jpeg because the tiff is too large for your server and was rejected). Is this sufficient? The output on the shell was inconspicuous.
On Wed, 28 Nov 2018 at 19:10, Amit D. wrote:
What is the output in the terminal?
Can you provide the image?
|
Can you provide an image for testing? |
Well, you can take any image that is rotated by 180 degrees, since it happens for any document with rotated pages. The "wrong" hocr file looks like this:
What I expected was:
This seems to be independent of --oem 0 or --oem 1. New information: this version, https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w64-setup-v4.0.0-rc3.20181014.exe, is also unable to recognise pages rotated by 180 degrees. Up to the last beta, everything works well. |
You need to add |