Add Indic numerals and missing punctuation to Arabic #131
Comments
Please see tesseract-ocr/tesseract#2263 (comment)
Is this fixed? I've tried the latest version and it didn't detect any Indic numerals.
@wewark You have to use the Arabic.traineddata file. It recognizes Arabic and English letters, as well as both Arabic-Indic and Arabic numerals.
@ShroukMansour I use ara.traineddata and the text is not recognized accurately; the numbers are not accurate either. Is there a solution for this?
Previously: #71 and tesseract-ocr/tessdata_best#11 (also contains a pertinent discussion on how well the different traineddata deal with these characters).
• Indic numerals: (٠ ١ ٢ ٣ ٤ ٥ ٦ ٧ ٨ ٩)
• Punctuation: (؛ ، ﴿ ﴾)
• Also, a ligature very commonly found in Arabic texts: ﷺ
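To make the request unambiguous, the characters listed above can be pinned down by Unicode code point. The following is only a rough sketch using Python's standard `unicodedata` module; the names `REQUESTED`, `describe`, and `missing_from` are hypothetical helpers introduced here for illustration and are not part of Tesseract or any traineddata tooling.

```python
import unicodedata

# Arabic-Indic digits occupy U+0660..U+0669.
ARABIC_INDIC_DIGITS = "".join(chr(cp) for cp in range(0x0660, 0x066A))

# Assumed set of characters requested in this issue, written as escapes
# so they survive copy/paste: Arabic semicolon, Arabic comma, the two
# ornate parentheses, and the U+FDFA ligature.
REQUESTED = ARABIC_INDIC_DIGITS + "\u061B\u060C\uFD3E\uFD3F\uFDFA"


def describe(chars=REQUESTED):
    """Return (code point, Unicode name) pairs for each character."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in chars]


def missing_from(text, chars=REQUESTED):
    """Return the requested characters that never occur in `text`.

    `text` could be, for example, OCR output for a page that is known
    to contain all of the requested characters.
    """
    return [c for c in chars if c not in text]


if __name__ == "__main__":
    for code_point, name in describe():
        print(code_point, name)
```

Running `missing_from` on OCR output for a test page known to contain every one of these characters gives a quick list of the ones a given traineddata file never produces.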
If I can do this myself please simply point me the way.
CC @Shreeshrii