Support for numerals in Arabic Script - 1
Shreeshrii committed Feb 25, 2019
1 parent cd6143c commit e9afcc0
Showing 9 changed files with 5,391 additions and 3,946 deletions.
26 changes: 26 additions & 0 deletions Arabic-TOC-ara-Amiri.txt
@@ -0,0 +1,26 @@
الجفا ................................................... ‎٨٧٢‏
‏غرام مُميت ............................................... ‎٨٧٢٣‏
‏الفؤادالكسير ...... ........................................ ‎٨٧٤‏
‏عَقيقَ في عقيق في عقيقي ....................................... ‎٨٧٥‏
‏الباب الحادي عشر: مُتفرزقات .................................... ‎٨٧٧‏
‏الشاي ................................................... ‎٨٧٩‏
‏مَديح الشاي ............................................... ‎٨٨٠٩‏
‏مشك الشاي ............................................... ‎٨٨١‏
‏ليلة الشاي ................................................ ‎٨٨٢‏
‏رجال السر ...... ......................................... ‎٨٨٣‏
‏في فضل الاجتماع ...... ..................................... ‎٨٨٤‏
‏شججة ..... .............................................. ‎٨٨٥‏
‏لله دَؤبني رَوَاحة ............................................ ‎٨٨٦‏
‏خطة عَبْيِية ..... .......................................... ‎٨٨٧‏
‏مزايا الزمان ...... ......................................... ‎٨٨٨‏
‏عَشراء ................................................... ‎٨٨٩‏
‏قطع علاقة في عتاب ......................................... ‎٨٩١‏
‏مُعاتبة ...... ............................................. ‎٨٩٢‏
‏السمكة ...... ............................................ ‎٨٩٤‏
‏نظرة ...... .............................................. ‎٨٩٥‏
‏القطار ...... ............................................. ‎٨٩٦‏
‏المعالي .................................................. ‎٨٩٨٩‏
‏المصادر والمراجع ............................................ ‎٩٠٠‏
‏الفهرس ...... ............................................. ‎٩٠٢‏
‎٩١١‏
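This commit is about recognizing numerals in Arabic script, and the trailing page numbers in this OCR output are an easy place to check them. Below is a minimal sketch, assuming the output file is read line by line as UTF-8 text. `trailing_page_number` and `non_increasing` are hypothetical helpers, not part of Tesseract. The sketch relies on Python 3's `re` digit class and `int()` both accepting any Unicode decimal digit, so Arabic-Indic digits (U+0660 to U+0669) need no lookup table.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical helpers for inspecting OCR output; not part of Tesseract.

# LEFT-TO-RIGHT MARK and RIGHT-TO-LEFT MARK surround the page numbers here.
# They are deleted first so a mark inside a number cannot split it in two.
_BIDI_MARKS = {0x200E: None, 0x200F: None}


def trailing_page_number(line: str) -> Optional[int]:
    """Return the last run of decimal digits on a TOC line, or None.

    re's digit class and int() accept any Unicode decimal digit,
    so Arabic-Indic '٨٧٢' is parsed as 872.
    """
    runs = re.findall(r"\d+", line.translate(_BIDI_MARKS))
    return int(runs[-1]) if runs else None


def non_increasing(lines: Iterable[str]) -> Iterator[Tuple[int, int, int]]:
    """Yield (line_index, previous, current) wherever a page number fails to increase."""
    prev = None
    for i, line in enumerate(lines):
        n = trailing_page_number(line)
        if n is None:
            continue  # heading or blank line without a page number
        if prev is not None and n <= prev:
            yield i, prev, n
        prev = n
```

On the first three lines of this file the check reports `(2, 8723, 874)`. The drop from 8723 to 874 points at the previous entry, `٨٧٢٣`, which appears to have picked up an extra digit; the same pattern follows `٨٨٠٩` and `٨٩٨٩` further down.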

21 changes: 21 additions & 0 deletions Arabic-TOC-numbers-ara-Amiri.txt
@@ -0,0 +1,21 @@
كلمة المترجم

تقديم الكتاب
الباب الأول. التحليل التوافقي 1
1. مقدمة. . . . . .... ...... .. 1
2 البداالأساسي للعد : :: :.. .:.:: 2
1.3 التباديل ......... 4
1.4 التوافيق ...... ...... 6
15 معاملات كثيرات الحدود ...... 11
"1.6 عدد الحلول الصحيحة للمعادلات : : : : : : : : ::: :: :::: 15
ملخص الفصل ...... 19
مسائل . . . . . . .. ..... ...... ..... .. 19
تمارين نظرية .......... 23
اختبارات ذاتية في المسائل والتمارين . . . . . . . . . . . .. . . . 26
الباب الثاني. مسلمات الاحتمالات 29
1 مقتمة. . . ... ..... ......... 29
2.2 فراغ العينة و الحوادث ..... ...... ...... 29
2.3 مسلمات لاحتملات . . . . . .. ... ..... ...... 35
2.4 بعض المبرهنات البسيطة ............ . 38
5 فراغات العينة بتائج متكافة الفرص . . . : : . . :. . .. .. : . 44

Binary file added Arabic-TOC-numbers.png
Binary file added Arabic-TOC.png
70 changes: 21 additions & 49 deletions README.md
@@ -1,65 +1,37 @@
# tessdata_arabic

## 2019-02-19 Finetuned from `script/Arabic` - ara-1.traineddata
Finetuned traineddata files adding support for
numerals and punctuation in Arabic script

PlusMinus finetuning used these fonts:
```
'Amiri Bold' \
'Amiri' \
'Arab' \
'Scheherazade Bold' \
'Scheherazade' \
'Traditional Arabic' \
```

Traineddata Info
```
combine_tessdata -d ara-1.traineddata
PlusMinus finetuning was done starting from
`tessdata_best/script/Arabic.traineddata`
with tesstrain.sh, using the fonts listed above and training text,
for approximately 4000 iterations.

Version string:4.0.0-313-gfc47
0:config:size=405, offset=192
17:lstm:size=7511187, offset=597
18:lstm-punc-dawg:size=98, offset=7511784
19:lstm-word-dawg:size=2018514, offset=7511882
20:lstm-number-dawg:size=3658, offset=9530396
21:lstm-unicharset:size=7794, offset=9534054
22:lstm-recoder:size=1012, offset=9541848
23:version:size=15, offset=9542860
ara-Amiri.traineddata Info
```
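Each `combine_tessdata -d` listing prints one line per component in the form `index:name:size=N, offset=M`. In every listing in this README, each component starts exactly where the previous one ends (offset + size = next offset). Below is a minimal sketch for checking that, assuming the listing has been captured as text. `parse_listing` and `gaps` are hypothetical names, and the regex covers only the line format shown here.

```python
import re
from typing import List, NamedTuple, Tuple


class Component(NamedTuple):
    index: int
    name: str
    size: int
    offset: int


# Matches lines such as "17:lstm:size=7511187, offset=597"
_LINE = re.compile(r"^(\d+):([\w-]+):size=(\d+), offset=(\d+)$")


def parse_listing(text: str) -> List[Component]:
    """Collect component lines; 'Version string:' and other lines are skipped."""
    comps = []
    for raw in text.splitlines():
        m = _LINE.match(raw.strip())
        if m:
            idx, name, size, offset = m.groups()
            comps.append(Component(int(idx), name, int(size), int(offset)))
    return comps


def gaps(comps: List[Component]) -> List[Tuple[str, int, int]]:
    """Return (name, expected_offset, actual_offset) for non-contiguous components."""
    problems = []
    for prev, cur in zip(comps, comps[1:]):
        expected = prev.offset + prev.size
        if cur.offset != expected:
            problems.append((cur.name, expected, cur.offset))
    return problems


# Listing for ara-1.traineddata, copied from this README
LISTING = """\
Version string:4.0.0-313-gfc47
0:config:size=405, offset=192
17:lstm:size=7511187, offset=597
18:lstm-punc-dawg:size=98, offset=7511784
19:lstm-word-dawg:size=2018514, offset=7511882
20:lstm-number-dawg:size=3658, offset=9530396
21:lstm-unicharset:size=7794, offset=9534054
22:lstm-recoder:size=1012, offset=9541848
23:version:size=15, offset=9542860
"""

if __name__ == "__main__":
    comps = parse_listing(LISTING)
    print(gaps(comps))              # []
    last = comps[-1]
    print(last.offset + last.size)  # 9542875, end of the last component
```

Unmatched lines are skipped so the captured output can be pasted whole, command line and version string included. The Impact and PlusMinus listings for ara-Scheherazade are contiguous in the same way.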
combine_tessdata -d ara-Amiri.traineddata
## Finetuned traineddata files for Arabic using Scheherazade font

Test files for https://github.com/tesseract-ocr/tesseract/issues/2132

### Finetuned for Impact - ara-Scheherazade_Impact_400.traineddata
```
combine_tessdata -d ara-Scheherazade_Impact_400.traineddata

Version string:4.00.00alpha:ara:synth20170629:[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx512O1c1]
0:config:size=545, offset=192
17:lstm:size=11582395, offset=737
18:lstm-punc-dawg:size=1986, offset=11583132
19:lstm-word-dawg:size=999442, offset=11585118
20:lstm-number-dawg:size=13250, offset=12584560
21:lstm-unicharset:size=5061, offset=12597810
22:lstm-recoder:size=769, offset=12602871
23:version:size=80, offset=12603640
ara-Scheherazade.traineddata Info
```
combine_tessdata -d ara-Scheherazade.traineddata
### Finetuned for PlusMinus - ara-Scheherazade_PlusMinus_4000.traineddata
```
combine_tessdata -d ara-Scheherazade_PlusMinus_4000.traineddata

Version string:4.0.0-118-gd44b5
0:config:size=405, offset=192
17:lstm:size=11619331, offset=597
18:lstm-punc-dawg:size=98, offset=11619928
19:lstm-word-dawg:size=1644290, offset=11620026
20:lstm-number-dawg:size=2898, offset=13264316
21:lstm-unicharset:size=6460, offset=13267214
22:lstm-recoder:size=850, offset=13273674
23:version:size=16, offset=13274524
ara-Scheherazade.traineddata Info

Fonts used for plus-minus training

```
'Amiri' \
'Sakkal Majalla' \
'Scheherazade' \
'Traditional Arabic' \
```
```
```