Tesseract is able to recognize Arabic words but not Arabic numerals in a scanned image using Python #2955
Comments
Duplicate of many other issues about Arabic numerals. The issue is related to the data that was used for training Arabic, not to the Tesseract program/library itself.
@hadilaff, please use our forum for asking questions about Tesseract's usage.
Which forum?
Can you share your project as a zip? I am working on the same thing but am having several issues.
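If the limitation really is in the `ara` training data, one thing that may be worth trying is loading a second language model alongside it. Tesseract accepts several languages joined with `+`, and pytesseract passes the `lang` string through unchanged. The sketch below assumes the `eng` traineddata file is installed; the helper names `join_languages` and `ocr_with_languages` are hypothetical, and nothing in this thread confirms that the combination recovers the digits in the reporter's image.

```python
def join_languages(languages):
    """Build a Tesseract language string such as 'ara+eng'."""
    return "+".join(languages)


def ocr_with_languages(path, languages=("ara", "eng")):
    """Run OCR on an image file with several language models loaded together.

    Assumption: every listed traineddata file is installed for Tesseract.
    """
    # Imported here so the helper above stays usable without OCR dependencies.
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang=join_languages(languages))
```

Calling `ocr_with_languages(filename)` would then replace the single-language call; the order of the languages can affect the result, so both `ara+eng` and `eng+ara` may be worth comparing.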
Hello All,
I'm using 'tesseract v5.0.0-alpha.20190708' with 'leptonica-1.78.0' on Windows 10 Pro to extract Arabic text with numerals from a scanned image (attached).
After running the following Python code:
text = pytesseract.image_to_string(Image.open(filename), lang='ara')
I can see that Tesseract recognizes the Arabic words but not the Arabic numerals. I will attach a screenshot of the Tesseract output too.
Please help: what needs to be done so that it recognizes Arabic numerals as well?
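Because "Arabic numerals" can mean either Western digits (0-9) or Arabic-Indic digits (٠-٩), it can help to check which kind, if any, actually shows up in the OCR output before changing anything else. A minimal sketch using only the standard library; the helper name `count_digit_kinds` is hypothetical and not part of pytesseract.

```python
def count_digit_kinds(text):
    """Count digits in OCR output, grouped by script.

    western:               '0'-'9'
    arabic_indic:          U+0660-U+0669
    extended_arabic_indic: U+06F0-U+06F9 (used in Persian and Urdu)
    """
    counts = {"western": 0, "arabic_indic": 0, "extended_arabic_indic": 0}
    for ch in text:
        if "0" <= ch <= "9":
            counts["western"] += 1
        elif "\u0660" <= ch <= "\u0669":
            counts["arabic_indic"] += 1
        elif "\u06f0" <= ch <= "\u06f9":
            counts["extended_arabic_indic"] += 1
    return counts
```

Running `count_digit_kinds(text)` on the string returned by `image_to_string` shows whether the digits are missing entirely or are being emitted in a different script than expected.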
Please find the scanned image attached here.
A screenshot of the Tesseract output produced by the code above is attached as well.
Thank you!