Commit
add langtests for Devanagari and Sanskrit
1 parent e4b9cff
commit be443a5
Showing 18 changed files with 159 additions and 81 deletions.
@@ -0,0 +1,18 @@
#!/bin/bash
#
mkdir -p ~/lang-files
rm -rf ~/lang-files/san-*
for testset in vedic fontsamples oldstyle shreelipi alphabetsamples
do
    cd ~/lang-files
    mkdir -p ./san-$testset
    cp ~/lang-deva-downloads/imagessan/$testset/*.* ./san-$testset/
    cd ./san-$testset/
    rename s/-gt.txt/.txt/ *.txt
    ls -1 *.png >pages
    sed -i -e 's/.png//g' pages
done

mkdir -p ~/lang-stopwords
cd ~/lang-stopwords
cp ~/lang-deva-downloads/imagessan/stopwords.txt ./san.stopwords.txt
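The setup loop above can be sketched more defensively. The variant below is an illustration, not part of the commit: `prepare_testsets` and its two directory arguments are hypothetical names. It renames the `-gt.txt` files with plain `mv` instead of the `rename` utility (whose syntax differs between distributions), and it strips `.png` only as a trailing suffix, since the unescaped `.` in the original patterns matches any character. Unlike the original, it clears each `san-<testset>` directory individually rather than running `rm -rf` on a glob.

```shell
#!/bin/bash
# Sketch only: prepare_testsets is a hypothetical helper, not part of the commit.
set -eu

# usage: prepare_testsets SRC_ROOT DEST_ROOT TESTSET...
prepare_testsets() {
    local src_root=$1 dest_root=$2
    shift 2
    local testset dest f
    mkdir -p "$dest_root"
    for testset in "$@"; do
        dest="$dest_root/san-$testset"
        rm -rf "$dest"
        mkdir -p "$dest"
        cp "$src_root/$testset"/*.* "$dest/"
        # Rename foo-gt.txt to foo.txt without relying on `rename`.
        for f in "$dest"/*-gt.txt; do
            [ -e "$f" ] || continue
            mv "$f" "${f%-gt.txt}.txt"
        done
        # Write one page name per line, dropping only a trailing .png.
        : > "$dest/pages"
        for f in "$dest"/*.png; do
            [ -e "$f" ] || continue
            basename "$f" .png >> "$dest/pages"
        done
    done
}

# Example call mirroring the paths in the script above (left commented out):
# prepare_testsets ~/lang-deva-downloads/imagessan ~/lang-files \
#     vedic fontsamples oldstyle shreelipi alphabetsamples
```

Quoting every path keeps the helper usable with directories that contain spaces, which the unquoted original would split.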
@@ -0,0 +1,18 @@
#!/bin/bash
# run langtests/runlangtests.sh with the root data dir, testname, tessdata-dir, language code and image extension

cd ~/tesseract

langtests/runlangtests.sh ~/lang-files 4_fast_Devanagari ../tessdata_fast/script Devanagari png
langtests/runlangtests.sh ~/lang-files 4_best_int_Devanagari ../tessdata/script Devanagari png
langtests/runlangtests.sh ~/lang-files 4_best_Devanagari ../tessdata_best/script Devanagari png
langtests/runlangtests.sh ~/lang-files 4_fast_san ../tessdata_fast san png
langtests/runlangtests.sh ~/lang-files 4_best_int_san ../tessdata san png
langtests/runlangtests.sh ~/lang-files 4_best_san ../tessdata_best san png

langtests/runlangtests.sh ~/lang-files 4_plus40k_san ../tesstutorial-deva san png

#/home/ubuntu/tesstutorial-deva/san.traineddata at n iterations

### It takes a while to run.
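The seven invocations above differ only in test name, tessdata directory and language code, so they could be driven from a small table. The sketch below is illustrative and not part of the commit: `run_all` and the `DRY_RUN` switch are hypothetical names, and it assumes it is run from the `~/tesseract` checkout, as the original script does. By default it only prints the commands it would run.

```shell
#!/bin/bash
# Sketch only: run_all and DRY_RUN are hypothetical, not part of the commit.

# usage: run_all DATA_ROOT IMAGE_EXT
run_all() {
    local data_root=$1 ext=$2
    local testname tessdata lang
    # The table is read on fd 3 so the test script keeps its own stdin.
    while read -r testname tessdata lang <&3; do
        [ -n "$testname" ] || continue
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo langtests/runlangtests.sh "$data_root" "$testname" "$tessdata" "$lang" "$ext"
        else
            langtests/runlangtests.sh "$data_root" "$testname" "$tessdata" "$lang" "$ext"
        fi
    done 3<<'EOF'
4_fast_Devanagari ../tessdata_fast/script Devanagari
4_best_int_Devanagari ../tessdata/script Devanagari
4_best_Devanagari ../tessdata_best/script Devanagari
4_fast_san ../tessdata_fast san
4_best_int_san ../tessdata san
4_best_san ../tessdata_best san
4_plus40k_san ../tesstutorial-deva san
EOF
}
```

Running `DRY_RUN=0 run_all ~/lang-files png` would execute the same commands as the script; adding a model then means adding one row to the table.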
@@ -0,0 +1,8 @@
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_best_Devanagari san-alphabetsamples 2013 56.17% 1323 12.27% 1323 12.27 606.28s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_best_Devanagari san-fontsamples 388 94.82% 87 86.38% 87 86.38 570.17s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_best_Devanagari san-oldstyle 2796 59.93% 523 39.61% 523 39.61 447.73s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_best_Devanagari san-shreelipi 830 94.01% 311 81.40% 311 81.40 1137.51s
@@ -0,0 +1,8 @@
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_best_int_Devanagari san-alphabetsamples 2010 56.24% 1321 12.40% 1321 12.40 556.26s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_best_int_Devanagari san-fontsamples 396 94.72% 89 86.07% 89 86.07 524.07s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_best_int_Devanagari san-oldstyle 2812 59.70% 523 39.61% 523 39.61 416.57s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_best_int_Devanagari san-shreelipi 829 94.01% 314 81.22% 314 81.22 1087.02s
@@ -1,2 +1,2 @@
 RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
-4_best_int_frk frk-ligatures 244 92.78% 109 79.63% 80 73.15 89.80s
+4_best_int_frk frk-ligatures 244 92.78% 109 79.63% 80 73.15 367.73s
@@ -0,0 +1,8 @@
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_best_int_san san-alphabetsamples 2342 49.01% 1353 10.28% 1353 10.28 281.60s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_best_int_san san-fontsamples 474 93.68% 126 80.28% 126 80.28 281.05s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_best_int_san san-oldstyle 3121 55.27% 602 30.48% 602 30.48 206.20s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_best_int_san san-shreelipi 1163 91.60% 417 75.06% 417 75.06 606.80s
@@ -0,0 +1,8 @@
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_best_san san-alphabetsamples 2335 49.16% 1348 10.61% 1348 10.61 300.24s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_best_san san-fontsamples 473 93.69% 126 80.28% 126 80.28 267.05s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_best_san san-oldstyle 3121 55.27% 598 30.95% 598 30.95 205.28s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_best_san san-shreelipi 1168 91.56% 414 75.24% 414 75.24 610.52s
@@ -0,0 +1,8 @@
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_fast_Devanagari san-alphabetsamples 2017 56.09% 1317 12.67% 1317 12.67 400.38s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_fast_Devanagari san-fontsamples 433 94.22% 108 83.10% 108 83.10 287.48s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_fast_Devanagari san-oldstyle 2883 58.68% 543 37.30% 543 37.30 289.85s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_fast_Devanagari san-shreelipi 750 94.58% 279 83.31% 279 83.31 813.19s
@@ -0,0 +1,8 @@
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_fast_san san-alphabetsamples 2342 49.01% 1353 10.28% 1353 10.28 276.73s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_fast_san san-fontsamples 474 93.68% 126 80.28% 126 80.28 278.34s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_fast_san san-oldstyle 3121 55.27% 602 30.48% 602 30.48 222.35s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_fast_san san-shreelipi 1163 91.60% 417 75.06% 417 75.06 626.40s
@@ -0,0 +1,8 @@
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_plus10k_san san-alphabetsamples 1725 62.44% 1112 26.26% 1112 26.26 160.48s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_plus10k_san san-fontsamples 349 95.34% 73 88.58% 73 88.58 138.09s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_plus10k_san san-oldstyle 2818 59.62% 548 36.72% 548 36.72 120.83s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_plus10k_san san-shreelipi 746 94.61% 279 83.31% 279 83.31 292.70s
@@ -0,0 +1,8 @@
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_plus20k_san san-alphabetsamples 1441 68.63% 841 44.23% 841 44.23 156.57s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_plus20k_san san-fontsamples 356 95.25% 75 88.26% 75 88.26 135.13s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_plus20k_san san-oldstyle 2862 58.99% 555 35.91% 555 35.91 118.21s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_plus20k_san san-shreelipi 726 94.76% 267 84.03% 267 84.03 295.68s
@@ -0,0 +1,8 @@
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_plus30k_san san-alphabetsamples 1656 63.95% 937 37.86% 937 37.86 615.62s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_plus30k_san san-fontsamples 429 94.28% 89 86.07% 89 86.07 617.42s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_plus30k_san san-oldstyle 2885 58.66% 561 35.22% 561 35.22 432.58s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_plus30k_san san-shreelipi 447 96.77% 123 92.64% 123 92.64 1081.29s
@@ -0,0 +1,8 @@
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_plus40k_san san-alphabetsamples 1380 69.95% 775 48.61% 775 48.61 1198.16s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_plus40k_san san-fontsamples 401 94.65% 79 87.64% 79 87.64 1275.08s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_plus40k_san san-oldstyle 2860 59.01% 534 38.34% 534 38.34 977.65s
RELEASE TestSet CharErrors Accuracy WordErrors Accuracy NonStopWErrors Accuracy TimeTaken
4_plus40k_san san-shreelipi 441 96.81% 113 93.24% 113 93.24 2301.53s
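Each report in this commit repeats its header before every row, which makes the files awkward to compare side by side. A minimal sketch for collapsing them into one table follows. `summarize_reports` is a hypothetical helper, not part of the commit, and it assumes the nine whitespace-separated columns shown in the hunks above (release, test set, character errors and accuracy, word errors and accuracy, non-stopword errors and accuracy, time taken).

```shell
#!/bin/bash
# Sketch only: summarize_reports is a hypothetical helper, not part of the commit.

# usage: summarize_reports [REPORT_FILE...]   (reads stdin when no files are given)
summarize_reports() {
    awk '
        BEGIN { OFS = "\t"; print "Release", "TestSet", "CharAcc", "WordAcc" }
        $1 == "RELEASE" { next }            # drop the header repeated before each row
        NF >= 9 { print $1, $2, $4, $6 }    # keep release, test set and both accuracies
    ' "$@"
}
```

Pass the report files as arguments, or pipe rows on standard input. The output is tab-separated, so it can be fed to `sort` or `column -t` for a per-test-set comparison across models.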
be443a5
Should this be moved to https://github.com/tesseract-ocr/test or https://github.com/tesseract-ocr/tesseract/tree/master/unittest?
be443a5
These are similar to the unlvtests, which are for English.
They don't belong under unittest.
If you are moving unlvtests under the test repo, then you can move these too.