English traineddata file does not contain the '±' character? #48
Comments
The best/fast models were uploaded 5 years ago. AFAIK, no one is working on updating them.
Thanks for the information and the fast reply. Do you know of any workaround that would let me OCR this character? Many thanks ahead of time ^^
The official |
Thanks a lot. I will try this and report back here on whether it works for us going forward.
After further testing, both lat.traineddata (https://tesseract-ocr.github.io/tessdoc/Data-Files) and your own model appear to struggle with this character in my example. Many thanks!
Thanks for the link. I have tried this on my end with the Latin.traineddata model but I'm still not having much luck with the test file and internal files on my end for getting this character. |
Environment
Tesseract Version: 5.00
Downloaded from: https://github.com/UB-Mannheim/tesseract/wiki
Platform: Windows 10 64bit
I am trying to OCR using the English dictionary file found here:
https://tesseract-ocr.github.io/tessdoc/Data-Files
I notice the character is not included here either:
https://github.com/tesseract-ocr/langdata_lstm/blob/main/eng/eng.unicharset
Are there any plans to add it? Are there any language files that can successfully OCR this character?
Many thanks to whoever can assist here. I am attaching the file I used to test this behavior for this character here: (https://github.com/tesseract-ocr/langdata_lstm/files/9870674/Special.Symbols.pdf)
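For anyone who wants to repeat the unicharset check described above, here is a minimal Python sketch. It assumes the usual unicharset layout, where the first line holds the entry count and every later line begins with the symbol followed by space-separated properties. The helper names and the sample contents are hypothetical and only illustrate the idea; the property columns in the sample are abbreviated and not copied from the real eng.unicharset.

```python
from typing import Set


def unicharset_symbols(text: str) -> Set[str]:
    """Collect the symbols listed in the contents of a unicharset file.

    Assumption: line 1 is the number of entries, and each following line
    starts with the symbol, followed by space-separated properties.
    """
    symbols = set()
    for line in text.splitlines()[1:]:  # skip the count line
        symbol = line.split(" ", 1)[0]
        if symbol:
            symbols.add(symbol)
    return symbols


def has_symbol(text: str, symbol: str) -> bool:
    """Return True if `symbol` has its own entry in the unicharset text."""
    return symbol in unicharset_symbols(text)


# Hypothetical, abbreviated sample contents used only for illustration.
SAMPLE = "\n".join([
    "3",
    "NULL 0 Common 0",
    "A 5 Latin 1",
    "+ 0 Common 2",
])

print(has_symbol(SAMPLE, "A"))
print(has_symbol(SAMPLE, "±"))
```

To check a real file, read it with `open(path, encoding="utf-8").read()` and pass the result to `has_symbol`, where `path` points at a locally downloaded copy of the unicharset.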