Tesseract recognizes Arabic words but not Arabic numerals in a scanned image using Python #2955

sawankumar94 · 2020-04-23T10:14:31Z

Hello All,

I'm using tesseract v5.0.0-alpha.20190708 with leptonica-1.78.0 on Windows 10 Pro to extract Arabic text with numerals from a scanned image (attached).

After running the following Python code:
text = pytesseract.image_to_string(Image.open(filename), lang='ara')

I can see that Tesseract recognizes the Arabic words but not the Arabic numerals. A screenshot of the Tesseract output is attached as well.
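To make the symptom concrete, here is a minimal sketch of the call above with the imports it needs, plus a small helper that counts what kinds of characters came back. The names `ocr_arabic` and `summarize_output` are hypothetical and not part of pytesseract. The sketch assumes that pytesseract, Pillow, and the `ara` traineddata are installed, and that "Arabic numerals" here means the Eastern Arabic digits U+0660 to U+0669.

```python
import unicodedata


def ocr_arabic(image_path):
    """Run Tesseract's Arabic model on one image and return the text."""
    # Imported lazily so the counting helper below works without the OCR stack.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(image_path), lang="ara")


def summarize_output(text):
    """Count Arabic letters, Eastern Arabic digits, and ASCII digits in text."""
    counts = {"arabic_letters": 0, "eastern_digits": 0, "ascii_digits": 0}
    for ch in text:
        if "\u0660" <= ch <= "\u0669":  # Eastern Arabic (Arabic-Indic) digits
            counts["eastern_digits"] += 1
        elif "0" <= ch <= "9":  # Western (ASCII) digits
            counts["ascii_digits"] += 1
        elif ch.isalpha() and "ARABIC" in unicodedata.name(ch, ""):
            counts["arabic_letters"] += 1
    return counts


# Hypothetical usage; "scan.png" is a placeholder path:
# print(summarize_output(ocr_arabic("scan.png")))
```

If the scan contains numerals but `eastern_digits` comes back as zero while `arabic_letters` is large, that matches the behavior described here. The helper only inspects the output; it does not improve recognition.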

Please help me: what needs to be done so that it recognizes Arabic numerals too?

Please find the scanned image attached here.

Please find attached the screenshot of the Tesseract output produced by the above code.

Thank you!

amitdo · 2020-04-24T00:44:54Z

Duplicate of many other issues.

arabic numerals
arabic numbers

The issue is related to the data that was used for training the Arabic model, not to the Tesseract program/library itself.

See tesseract-ocr/langdata#71, tesseract-ocr/langdata#72

hadilaff · 2021-07-03T15:59:51Z

Hello, can you tell me how you managed to read the Arabic data, please?

amitdo · 2021-07-04T10:12:37Z

@hadilaff, please use our forum for asking questions about Tesseract's usage.

hadilaff · 2021-07-04T21:50:32Z

Which forum?

amitdo · 2021-07-04T22:17:33Z

https://groups.google.com/g/tesseract-ocr

Frescoboy18 · 2023-01-19T14:00:46Z

Can you share your project as a zip? I am working on the same thing but having several issues.

amitdo closed this as completed Apr 24, 2020

amitdo added eastern arabic numerals traineddata labels Mar 18, 2021

florisre mentioned this issue Apr 24, 2023

Issue with Farsi OCR UB-Mannheim/zotero-ocr#49

Closed
