compute ctc target failed #2395
Is this file taken from the tessdata_best repo?
Run your command with |
Yes, I am using the best tessdata. |
lstmtraining --traineddata data/tamtrain/tamtrain.traineddata --old_traineddata tesseract/tessdata/tam.traineddata --continue_from data/tam/tam.lstm --net_spec '[Lfx256 O1c111]' --model_output data/checkpoints --debug_level -1 --learning_rate 20e-4 --train_listfile data/list.train --eval_listfile data/list.eval --max_iterations 3000 |
How did you create the box files and lstmf files?
Loaded file data/checkpoints_checkpoint, unpacking...
Successfully restored trainer from data/checkpoints_checkpoint
Loaded 54/54 pages (1-54) of document data/ground-truth/out8.lstmf
Loaded 20/20 pages (1-20) of document data/ground-truth/tam.Impact_Condensed.exp0.lstmf
Loaded 8/8 pages (1-8) of document data/ground-truth/out5.lstmf
Loaded 28/28 pages (1-28) of document data/ground-truth/out2.lstmf
Loaded 58/58 pages (1-58) of document data/ground-truth/out6.lstmf
Loaded 57/57 pages (1-57) of document data/ground-truth/out3.lstmf
Loaded 55/55 pages (1-55) of document data/ground-truth/out9.lstmf
Loaded 56/56 pages (1-56) of document data/ground-truth/out4.lstmf
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Encoding of string failed! Failure bytes: 23 20 2d ffffffe0 ffffffae
ffffffa4 ffffffe0 ffffffaf ffffff81 ffffffe0 ffffffae ffffffa3 ffffffe0
ffffffaf ffffff88 20 ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae ffffff9f
ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0 ffffffae
ffffffbf ffffffe0 ffffffae ffffffaf ffffffe0 ffffffae ffffffb2 ffffffe0
ffffffae ffffffbf ffffffe0 ffffffae ffffffb2 ffffffe0 ffffffaf ffffff8d 20
ffffffe0 ffffffae ffffff89 ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffaf
ffffff8d ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffae ffffffb5 ffffffe0
ffffffae ffffffbe ffffffe0 ffffffae ffffffb1 ffffffe0 ffffffaf ffffff81 20
ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffae ffffffbf ffffffe0 ffffffae
ffffffb0 ffffffe0 ffffffaf ffffff81 ffffffe0 ffffffae ffffffa4 ffffffe0
ffffffaf ffffff8d ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffae ffffffaa
ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae
ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0
ffffffaf ffffff81 ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffaf ffffff8d
ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffaf
ffffff81 20 ffffffe0 ffffffae ffffffae ffffffe0 ffffffaf ffffff8a ffffffe0
ffffffae ffffffa4 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffffa4 20
ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae ffffff95 ffffffe0 ffffffaf
ffffff8d ffffffe0 ffffffae ffffff95 ffffffe0 ffffffae ffffff99 ffffffe0
ffffffaf ffffff8d ffffffe0 ffffffae ffffff95 ffffffe0 ffffffae ffffffb3
ffffffe0 ffffffaf ffffff8d 20 32 36 20 2d 20 ffffffe0 ffffffae ffffffaa
ffffffe0 ffffffae ffffff95 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae
ffffff95 ffffffe0 ffffffae ffffffae ffffffe0 ffffffaf ffffff8d 20 35
Can't encode transcription: 'வயது : 01.01. 2019 ல் # -துணை பட்டியலில்
உள்ளவாறு திருத்தப்பட்டுள்ளது மொத்த பக்கங்கள் 26 - பக்கம் 5' in language ''
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Encoding of string failed! Failure bytes: 23 ffffffc2 ffffffa3 34 30 31 30
20 31 36 20 ffffffe0 ffffffae ffffffb5 ffffffe0 ffffffaf ffffff80 ffffffe0
ffffffae ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f
ffffffe0 ffffffaf ffffff81 20 ffffffe0 ffffffae ffffff8e ffffffe0 ffffffae
ffffffa3 ffffffe0 ffffffaf ffffff8d 2e 20 33 36 20 31 30 31 30 20 31 36
Can't encode transcription: '- 7010 16 வீட்டு எண். % #£4010 16 வீட்டு எண்.
36 1010 16' in language ''
Compute CTC targets failed!
Encoding of string failed! Failure bytes: ffffffe0 ffffffaf ffffff8c
ffffffe0 ffffffae ffffffb0 ffffffe0 ffffffae ffffffbf 20 2d
Can't encode transcription: 'பெயர்: கீதா - பெயர். கௌரி -' in language ''
Encoding of string failed! Failure bytes: 23 30 30 34 30 20 31 38 20
ffffffe0 ffffffae ffffffb5 ffffffe0 ffffffaf ffffff80 ffffffe0 ffffffae
ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0
ffffffaf ffffff81 20 ffffffe0 ffffffae ffffff8e ffffffe0 ffffffae ffffffa3
ffffffe0 ffffffaf ffffff8d 2e 20 31 39 20 23 31 30 34 30 20 31 38 20
ffffffe0 ffffffae ffffffb5 ffffffe0 ffffffaf ffffff80 ffffffe0 ffffffae
ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0
ffffffaf ffffff81 20 ffffffe0 ffffffae ffffff8e ffffffe0 ffffffae ffffffa3
ffffffe0 ffffffaf ffffff8d 2e 20 31 39 20 31 30 34 30 20 31 38
Can't encode transcription: 'வீட்டு எண். 18 #40
<#40> 18 வீட்டு எண். 19
#1040 <#1040> 18 வீட்டு
எண். 19 1040 18' in language ''
Encoding of string failed! Failure bytes: 5c ffffffe0 ffffffaf ffffffa8 20
7c 20 7c 20 ffffffe0 ffffffae ffffffb5 ffffffe0 ffffffae ffffffaf ffffffe0
ffffffae ffffffa4 ffffffe0 ffffffaf ffffff81 3a 20 32 32 20 ffffffe0
ffffffae ffffffaa ffffffe0 ffffffae ffffffbe ffffffe0 ffffffae ffffffb2
ffffffe0 ffffffae ffffffbf ffffffe0 ffffffae ffffffa9 ffffffe0 ffffffae
ffffffae ffffffe0 ffffffaf ffffff8d 20 3a ffffffe0 ffffffae ffffffaa
ffffffe0 ffffffaf ffffff86 ffffffe0 ffffffae ffffffa3 ffffffe0 ffffffaf
ffffff8d 20 26 ffffffe0 ffffffae ffffffb5 20 7c 20 7c 20 ffffffe0 ffffffae
ffffffb5 ffffffe0 ffffffae ffffffaf ffffffe0 ffffffae ffffffa4 ffffffe0
ffffffaf ffffff81 3a 20 32 35 20 ffffffe0 ffffffae ffffffaa ffffffe0
ffffffae ffffffbe ffffffe0 ffffffae ffffffb2 ffffffe0 ffffffae ffffffbf
ffffffe0 ffffffae ffffffa9 ffffffe0 ffffffae ffffffae ffffffe0 ffffffaf
ffffff8d 20 3a ffffffe0 ffffffae ffffffaa ffffffe0 ffffffaf ffffff86
ffffffe0 ffffffae ffffffa3 ffffffe0 ffffffaf ffffff8d 20 32 2f 32 31 31 25
30 31
Can't encode transcription: 'வயது: 37 பாலினம் :ஆண் ஃலிஸ்\௨ | | வயது: 22
பாலினம் :பெண் &வ | | வயது: 25 பாலினம் :பெண் 2/211%01' in language ''
Encoding of string failed! Failure bytes: 23 30 34 30 20 31 35 20 ffffffe0
ffffffae ffffffb5 ffffffe0 ffffffaf ffffff80 ffffffe0 ffffffae ffffff9f
ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0 ffffffaf
ffffff81 20 ffffffe0 ffffffae ffffff8e ffffffe0 ffffffae ffffffa3 ffffffe0
ffffffaf ffffff8d 2e 20 31 37 20 23 30 30 34 30 20 31 36 20 ffffffe0
ffffffae ffffffb5 ffffffe0 ffffffaf ffffff80 ffffffe0 ffffffae ffffff9f
ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0 ffffffaf
ffffff81 20 ffffffe0 ffffffae ffffff8e ffffffe0 ffffffae ffffffa3 ffffffe0
ffffffaf ffffff8d 2e 20 31 33 38 20 37 30 34 30 20 31 35
Can't encode transcription: 'வீட்டு எண். 2 £#40
<#40> 15 வீட்டு எண். 17
#40 <#40> 16 வீட்டு எண்.
138 7040 15' in language ''
Encoding of string failed! Failure bytes: 23 36 34 30 20 3d 20 7c 20 7c 20
ffffffe0 ffffffae ffffffb5 ffffffe0 ffffffaf ffffff80 ffffffe0 ffffffae
ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0
ffffffaf ffffff81 20 ffffffe0 ffffffae ffffff8e ffffffe0 ffffffae ffffffa3
ffffffe0 ffffffaf ffffff8d 2e 20 31 34 2d 26 20 31 ffffffc2 ffffffa3 31 30
34 30 20 31 32
Can't encode transcription: 'குப்புசாமிநாயக்கர் - £1640 | | | வீட்டு எண்.
14-& #640 <#640> = | |
வீட்டு எண். 14-& 1£1040 12' in language ''
Compute CTC targets failed!
Compute CTC targets failed!
Encoding of string failed! Failure bytes: 23 20 2d ffffffe0 ffffffae
ffffffa4 ffffffe0 ffffffaf ffffff81 ffffffe0 ffffffae ffffffa3 ffffffe0
ffffffaf ffffff88 20 ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae ffffff9f
ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0 ffffffae
ffffffbf ffffffe0 ffffffae ffffffaf ffffffe0 ffffffae ffffffb2 ffffffe0
ffffffae ffffffbf ffffffe0 ffffffae ffffffb2 ffffffe0 ffffffaf ffffff8d 20
ffffffe0 ffffffae ffffff89 ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffaf
ffffff8d ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffae ffffffb5 ffffffe0
ffffffae ffffffbe ffffffe0 ffffffae ffffffb1 ffffffe0 ffffffaf ffffff81 20
ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffae ffffffbf ffffffe0 ffffffae
ffffffb0 ffffffe0 ffffffaf ffffff81 ffffffe0 ffffffae ffffffa4 ffffffe0
ffffffaf ffffff8d ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffae ffffffaa
ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae
ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0
ffffffaf ffffff81 ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffaf ffffff8d
ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffaf
ffffff81 20 ffffffe0 ffffffae ffffffae ffffffe0 ffffffaf ffffff8a ffffffe0
ffffffae ffffffa4 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffffa4 20
ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae ffffff95 ffffffe0 ffffffaf
ffffff8d ffffffe0 ffffffae ffffff95 ffffffe0 ffffffae ffffff99 ffffffe0
ffffffaf ffffff8d ffffffe0 ffffffae ffffff95 ffffffe0 ffffffae ffffffb3
ffffffe0 ffffffaf ffffff8d 20 32 36 20 2d 20 ffffffe0 ffffffae ffffffaa
ffffffe0 ffffffae ffffff95 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae
ffffff95 ffffffe0 ffffffae ffffffae ffffffe0 ffffffaf ffffff8d 20 36
Can't encode transcription: 'வயது : 01.01. 2019 ல் # -துணை பட்டியலில்
உள்ளவாறு திருத்தப்பட்டுள்ளது மொத்த பக்கங்கள் 26 - பக்கம் 6' in language ''
Compute CTC targets failed!
Encoding of string failed! Failure bytes: 23 20 2d ffffffe0 ffffffae
ffffffa4 ffffffe0 ffffffaf ffffff81 ffffffe0 ffffffae ffffffa3 ffffffe0
ffffffaf ffffff88 20 ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae ffffff9f
ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0 ffffffae
ffffffbf ffffffe0 ffffffae ffffffaf ffffffe0 ffffffae ffffffb2 ffffffe0
ffffffae ffffffbf ffffffe0 ffffffae ffffffb2 ffffffe0 ffffffaf ffffff8d 20
ffffffe0 ffffffae ffffff89 ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffaf
ffffff8d ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffae ffffffb5 ffffffe0
ffffffae ffffffbe ffffffe0 ffffffae ffffffb1 ffffffe0 ffffffaf ffffff81 20
ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffae ffffffbf ffffffe0 ffffffae
ffffffb0 ffffffe0 ffffffaf ffffff81 ffffffe0 ffffffae ffffffa4 ffffffe0
ffffffaf ffffff8d ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffae ffffffaa
ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae
ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0
ffffffaf ffffff81 ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffaf ffffff8d
ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffaf
ffffff81 20 ffffffe0 ffffffae ffffffae ffffffe0 ffffffaf ffffff8a ffffffe0
ffffffae ffffffa4 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffffa4 20
ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae ffffff95 ffffffe0 ffffffaf
ffffff8d ffffffe0 ffffffae ffffff95 ffffffe0 ffffffae ffffff99 ffffffe0
ffffffaf ffffff8d ffffffe0 ffffffae ffffff95 ffffffe0 ffffffae ffffffb3
ffffffe0 ffffffaf ffffff8d 20 32 36 20 2d 20 ffffffe0 ffffffae ffffffaa
ffffffe0 ffffffae ffffff95 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae
ffffff95 ffffffe0 ffffffae ffffffae ffffffe0 ffffffaf ffffff8d 20 39
Can't encode transcription: 'வயது : 01.01. 2019 ல் # -துணை பட்டியலில்
உள்ளவாறு திருத்தப்பட்டுள்ளது மொத்த பக்கங்கள் 26 - பக்கம் 9' in language ''
Compute CTC targets failed!
Compute CTC targets failed!
Encoding of string failed! Failure bytes: 23 31 30 34 30 20 31 38 20
ffffffe0 ffffffae ffffffb5 ffffffe0 ffffffaf ffffff80 ffffffe0 ffffffae
ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0
ffffffaf ffffff81 20 ffffffe0 ffffffae ffffff8e ffffffe0 ffffffae ffffffa3
ffffffe0 ffffffaf ffffff8d 2e 20 33 34 20 31 30 34 30 20 31 36
Can't encode transcription: 'வீட்டு எண். 34 2040 16 வீட்டு எண். 34 #1040
<#1040> 18 வீட்டு எண். 34
1040 16' in language ''
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
This is the output after adding the debug level flag.
out2.txt
<https://github.com/tesseract-ocr/tesseract/files/3102202/out2.txt>
out3.txt
<https://github.com/tesseract-ocr/tesseract/files/3102203/out3.txt>
out4.txt
<https://github.com/tesseract-ocr/tesseract/files/3102204/out4.txt>
out5.txt
<https://github.com/tesseract-ocr/tesseract/files/3102205/out5.txt>
out6.txt
<https://github.com/tesseract-ocr/tesseract/files/3102206/out6.txt>
out7.txt
<https://github.com/tesseract-ocr/tesseract/files/3102207/out7.txt>
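For anyone reading these logs: the "Failure bytes" appear to be the raw UTF-8 bytes (printed sign-extended to 32 bits) of the part of the transcription that could not be encoded with the unicharset. A minimal sketch, not from this thread, for turning such a dump back into readable text in Python 3 (the dump string here is a shortened, illustrative excerpt):

```python
# Decode an lstmtraining "Failure bytes" dump back into readable text.
# Tokens like "ffffffe0" are sign-extended bytes, so mask each one to 8 bits.
dump = "23 20 2d ffffffe0 ffffffae ffffffa4"  # paste the full dump here
data = bytes(int(tok, 16) & 0xFF for tok in dump.split())
print(data.decode("utf-8", errors="replace"))
```

For the first dump above, the decoded text starts with '#', which suggests '#' is one of the characters the unicharset cannot encode.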
|
I created the box files using the lstmbox command and the lstmf files using lstm.train. |
What about "Loaded 20/20 pages (1-20) of document data/ground-truth/tam.Impact_Condensed.exp0.lstmf"? The Impact_Condensed font does not support Tamil, does it?
The problem is related to your input files. Please share the training text, or an image and box pair.
|
Ma'am, for --eval_listfile I did not know what to give as input, so I manually created one Impact_Condensed font file and listed it in the eval listfile. These are my files. |
You have a large number of training files; use one of them for eval (e.g. ocr2). I will test further with all the files you sent and get back. |
No, ma'am, "Compute CTC targets failed!" is not related to the Impact_Condensed eval file. |
The zip file has the OCRed text for the images. The ground truth needs to be the correct transcription for the images. |
But I am not using a text file in the training process. |
Training uses box/tiff pairs for creating the lstmf files. If you give the wrong text for an image, then all training will be wrong. Your box files also hold the incorrect text.
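For context, an .lstmf file is normally produced by running Tesseract on each tif/box pair with the lstm.train config, e.g. `tesseract data/ground-truth/out2.tif data/ground-truth/out2 lstm.train` (paths illustrative). The ground truth stored in the .lstmf is taken from the box file, which is why wrong box text makes the training data wrong. |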
I tested by using the lstmtraining \
|
This zip file has box files for your images. However, some errors may be because of incorrect layout analysis, and more training will not fix those. You need to use some other method (OpenCV, uzn, etc.) to mark areas and then recognize them separately. |
Anyway, in all my testing, I didn't get the error. |
Can we directly use the wordstr box file for training? |
The WordStr box file can be used for training AFTER you review and correct the text for each line. Currently it has been generated using the existing Tamil traineddata, so it will have all the errors that you see in recognition. For training you need to correct that text so that it matches the image. Test with one file, using debug_level -1, to make sure it looks OK. Then apply to all images.
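For reference, in a WordStr box file each text line is a single entry: a WordStr record with the line's bounding box (left bottom right top page) and the full line text after '#', followed by an end-of-line record that begins with a literal tab character. A minimal illustration with made-up coordinates and text:

```
WordStr 50 100 900 160 0 #வயது : 22 பாலினம் :ஆண்
	 900 100 901 160 0
```

Reviewing the file means correcting the text after each '#' so that it matches the line image exactly. |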
Ma'am, thank you for your help, but I have one problem. lstmtraining --traineddata data/tamiltest/tamiltest.traineddata --old_traineddata tesseract/tessdata/tam.traineddata --continue_from data/tam/tam.lstm --perfect_sample_delay 0 --target_error_rate 0.01 --model_output data/checkpoints --debug_level -1 --train_listfile data/list.train --eval_listfile data/list.eval --max_iterations 10000 Finished! Error rate = 98.136 |
How do I reduce the error rate? |
If you are running with ` --debug_level -1` you will have details of every
iteration. Usually the error rate will keep going down.
It seems to me that you are training with about 500 lines of text.
Are you getting any errors during training? Run for `--max_iterations 200`
and look at the console log.
|
lstmtraining --traineddata data/tamiltest/tamiltest.traineddata --old_traineddata data/tam/tam.traineddata --continue_from data/tam/tam.lstm --perfect_sample_delay 0 --target_error_rate 0.01 --model_output data/checkpoints --debug_level -1 --train_listfile data/list.train --eval_listfile data/list.eval --max_iterations 200 At iteration 200/200/200, Mean rms=6.777%, delta=86.539%, char train=165.383%, word train=99.594%, skip ratio=0%, New worst char error = 165.383 wrote checkpoint. Finished! Error rate = 100 |
I didn't get any error during training. |
I first extracted a text file from the image using the "Tamil" tessdata and then corrected the values in the text file. Then, using the text file, I created the box file and tif file with the help of text2image. Then I used the "tam" tessdata for the other training purposes (like unicharset, lstm training). Is this the cause of the high error rate? |
Your images have English in them. If you want that to be recognized, it needs to be in your unicharset.
The tam.traineddata has a limited unicharset; by using that, a larger number of characters have to be added.
Try using Tamil.traineddata for further training and see if that is better.
I am not sure why you are not getting debug messages on screen.
|
Code range changed from 99 to 145!
tam.unicharset has 99 unichars; your text has 145.
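This mismatch can be checked directly. A rough sketch, not from this thread: it reads the first space-separated field of each unicharset line as the unichar and flags training-text code points that appear in no unichar (file names are illustrative, and passing this per-code-point check still does not guarantee that every Tamil cluster is encodable):

```python
# Flag training-text characters that the unicharset cannot possibly encode.
with open("tam.unicharset", encoding="utf-8") as f:
    lines = f.read().splitlines()[1:]           # the first line is the count
unichars = {line.split(" ")[0] for line in lines if line}
covered = set("".join(unichars))                # every code point in any unichar

with open("tam.training_text", encoding="utf-8") as f:
    text = f.read()

missing = sorted({ch for ch in text if not ch.isspace() and ch not in covered})
print("uncovered characters:", missing)
```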
|
--debug_interval -1
It is `interval`, not `level`.
-1 is minus one
|
Yes, thank you. I have changed it now. 😊 |
--old_traineddata data/tam/Tamil.traineddata --continue_from data/tam/tam.lstm
Both need to be in sync:
Tamil.traineddata
Tamil.lstm
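For reference, the matching .lstm can be extracted from the same traineddata with combine_tessdata, e.g. `combine_tessdata -e data/tam/Tamil.traineddata data/tam/Tamil.lstm` (paths illustrative), so that --old_traineddata and --continue_from always come from the same model.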
|
Sorry, I used the fast tessdata for "Tamil.traineddata". I am now using the best tessdata. |
lstmtraining --traineddata data/tamiltest/tamiltest.traineddata --old_traineddata tesseract/tessdata/Tamil.traineddata --continue_from data/Tamil/Tamil.lstm --perfect_sample_delay 0 --target_error_rate 0.01 --model_output data/checkpoints --debug_interval -1 --train_listfile data/list.train --max_iterations 200 Finished! Error rate = 100. In this also, the error rate is 100. |
Try with a command similar to what I used; see #2395 (comment). |
Same error |
What does "same error" mean? 200 iterations was just to test what was going wrong. Now you can train for more iterations. For impact-style fine-tuning, try 400-600 iterations. For plus-type fine-tuning, try 3000-3600. |
Is this caused by the x_size parameter not being the same in data generation and training? |
Pull request #3251 improves the error message for "Compute CTC target failed" and now shows the |
@Shreeshrii I am always getting this error when I'm trying to train for Arabic. I am adding my own data to the "training_text" file, and it consists of a lot of Arabic numbers and dates. |
I had a similar issue ("Compute CTC targets failed!") when I generated two lstmf files from a different type of .box file. |
Hi, I'm trying lstmtraining for Tamil text, and I'm facing the compute CTC error. |
That is, if I want to train Tesseract on text that it cannot see in the image, will it throw this error? |
@karan00713, @kiberchert, recent software versions report the line image which caused the message. I suggest visually inspecting such images to check whether they are reasonable (not more than a single line, not rotated) and comparing whether the line image and the line transcription match. |
This error occurs on my PC when text2image creates empty tif files. The .lstmf files created from those empty tif files trigger the error.
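A quick way to catch such pages before training is to scan for all-white images. A minimal sketch assuming Pillow is installed (the glob pattern is illustrative, and only the first page of a multi-page tif is checked):

```python
import glob
from PIL import Image, ImageChops

# Flag tif images that are entirely white; .lstmf files built from them are useless.
for path in sorted(glob.glob("data/ground-truth/*.tif")):
    img = Image.open(path).convert("L")           # grayscale, first page only
    if ImageChops.invert(img).getbbox() is None:  # no non-white pixels at all
        print("blank image:", path)
```
|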
Current Behavior:
surasystem@surasystem:~$ lstmtraining --traineddata data/tamtrain/tamtrain.traineddata --old_traineddata tesseract/tessdata/tam.traineddata --continue_from data/tam/tam.lstm --net_spec '[Lfx256 O1c111]' --model_output data/checkpoints --learning_rate 20e-4 --train_listfile data/list.train --eval_listfile data/list.eval --max_iterations 3000
Loaded file data/tam/tam.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Code range changed from 99 to 104!
Num (Extended) outputs,weights in Series:
1,36,0,1:1, 0
Num (Extended) outputs,weights in Series:
C3,3:9, 0
Ft16:16, 160
Total weights = 160
[C3,3Ft16]:16, 160
Mp3,3:16, 0
Lfys48:48, 12480
Lfx96:96, 55680
Lrx96:96, 74112
Lfx192:192, 221952
Fc104:104, 20072
Total weights = 384456
Previous null char=2 mapped to 103
Continuing from data/tam/tam.lstm
Loaded 54/54 pages (1-54) of document data/ground-truth/out8.lstmf
Loaded 57/57 pages (1-57) of document data/ground-truth/tam.TAMu_Kadambri.exp0.lstmf
Loaded 20/20 pages (1-20) of document data/ground-truth/tam.Impact_Condensed.exp0.lstmf
Loaded 8/8 pages (1-8) of document data/ground-truth/out5.lstmf
Loaded 28/28 pages (1-28) of document data/ground-truth/out2.lstmf
Loaded 57/57 pages (1-57) of document data/ground-truth/out3.lstmf
Loaded 56/56 pages (1-56) of document data/ground-truth/out4.lstmf
Loaded 55/55 pages (1-55) of document data/ground-truth/out9.lstmf
Loaded 58/58 pages (1-58) of document data/ground-truth/out6.lstmf
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
What do I need to do to overcome this issue?