Tesseract is able to recognize Arabic words but not Arabic numerals from a scanned image using Python #2955

Closed
sawankumar94 opened this issue Apr 23, 2020 · 6 comments

Comments

@sawankumar94

Hello All,

I'm using tesseract v5.0.0-alpha.20190708 with leptonica-1.78.0 on Windows 10 Pro to extract Arabic text with numerals from a scanned image (attached).

I ran the following Python code:
text = pytesseract.image_to_string(Image.open(filename), lang='ara')

Tesseract recognizes the Arabic words but not the Arabic numerals. I have also attached a screenshot of the Tesseract output.

Please help me: what needs to be done so that it recognizes Arabic numerals too?

The scanned image is attached here:
Scanned_Image

The screenshot of the Tesseract output produced by the above code is attached here:
Tesseract_Output

Thank you!

@amitdo
Collaborator

amitdo commented Apr 24, 2020

Duplicate of many other issues.

arabic numerals
arabic numbers

The issue is related to the data that was used for training the Arabic model, not to the Tesseract program/library itself.

See tesseract-ocr/langdata#71, tesseract-ocr/langdata#72
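
For anyone trying to confirm that digits really are what is missing from the output, here is a minimal sketch of a check on the returned string. It only inspects text and does not change what Tesseract recognizes. The helper names `digit_report` and `to_western_digits` are hypothetical, and it is an assumption that the "Arabic numerals" in the scan are either Western digits (0-9) or Arabic-Indic digits (U+0660 to U+0669); the thread does not say which.

```python
# Arabic-Indic digits (U+0660..U+0669) and Extended Arabic-Indic digits (U+06F0..U+06F9).
ARABIC_INDIC = "".join(chr(c) for c in range(0x0660, 0x066A))
EXTENDED_ARABIC_INDIC = "".join(chr(c) for c in range(0x06F0, 0x06FA))

# Translation table mapping both digit ranges to ASCII digits.
_TO_WESTERN = {ord(d): str(i) for i, d in enumerate(ARABIC_INDIC)}
_TO_WESTERN.update({ord(d): str(i) for i, d in enumerate(EXTENDED_ARABIC_INDIC)})


def digit_report(ocr_text: str) -> dict:
    """Count which kinds of digits appear in OCR output (hypothetical helper)."""
    report = {"western": 0, "arabic_indic": 0, "extended_arabic_indic": 0}
    for ch in ocr_text:
        if ch in "0123456789":
            report["western"] += 1
        elif ch in ARABIC_INDIC:
            report["arabic_indic"] += 1
        elif ch in EXTENDED_ARABIC_INDIC:
            report["extended_arabic_indic"] += 1
    return report


def to_western_digits(ocr_text: str) -> str:
    """Replace Arabic-Indic digits with ASCII digits, leaving other text untouched."""
    return ocr_text.translate(_TO_WESTERN)
```

Passing the `text` value from the original post to `digit_report(text)` shows whether any digits came back at all. If every count is zero while the words look correct, that is consistent with the training-data explanation above rather than with a problem in the calling code.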

@hadilaff

hadilaff commented Jul 3, 2021

Hello, can you tell me how you managed to read the data in Arabic, please?

@amitdo
Collaborator

amitdo commented Jul 4, 2021

@hadilaff, please use our forum for asking questions about Tesseract's usage.

@hadilaff

hadilaff commented Jul 4, 2021

Which forum?

@amitdo
Collaborator

amitdo commented Jul 4, 2021

@Frescoboy18

Can you share your project as a zip? I am working on the same thing but having several issues.
