compute ctc target failed #2395
Is this file taken from the tessdata_best repo?
Run your command with |
Yes, I am using the best tessdata. |
lstmtraining --traineddata data/tamtrain/tamtrain.traineddata --old_traineddata tesseract/tessdata/tam.traineddata --continue_from data/tam/tam.lstm --net_spec '[Lfx256 O1c111]' --model_output data/checkpoints --debug_level -1 --learning_rate 20e-4 --train_listfile data/list.train --eval_listfile data/list.eval --max_iterations 3000 |
How did you create the box files and lstmf files?
Loaded file data/checkpoints_checkpoint, unpacking...
Successfully restored trainer from data/checkpoints_checkpoint
Loaded 54/54 pages (1-54) of document data/ground-truth/out8.lstmf
Loaded 20/20 pages (1-20) of document data/ground-truth/tam.Impact_Condensed.exp0.lstmf
Loaded 8/8 pages (1-8) of document data/ground-truth/out5.lstmf
Loaded 28/28 pages (1-28) of document data/ground-truth/out2.lstmf
Loaded 58/58 pages (1-58) of document data/ground-truth/out6.lstmf
Loaded 57/57 pages (1-57) of document data/ground-truth/out3.lstmf
Loaded 55/55 pages (1-55) of document data/ground-truth/out9.lstmf
Loaded 56/56 pages (1-56) of document data/ground-truth/out4.lstmf
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Encoding of string failed! Failure bytes: 23 20 2d ffffffe0 ffffffae
ffffffa4 ffffffe0 ffffffaf ffffff81 ffffffe0 ffffffae ffffffa3 ffffffe0
ffffffaf ffffff88 20 ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae ffffff9f
ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0 ffffffae
ffffffbf ffffffe0 ffffffae ffffffaf ffffffe0 ffffffae ffffffb2 ffffffe0
ffffffae ffffffbf ffffffe0 ffffffae ffffffb2 ffffffe0 ffffffaf ffffff8d 20
ffffffe0 ffffffae ffffff89 ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffaf
ffffff8d ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffae ffffffb5 ffffffe0
ffffffae ffffffbe ffffffe0 ffffffae ffffffb1 ffffffe0 ffffffaf ffffff81 20
ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffae ffffffbf ffffffe0 ffffffae
ffffffb0 ffffffe0 ffffffaf ffffff81 ffffffe0 ffffffae ffffffa4 ffffffe0
ffffffaf ffffff8d ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffae ffffffaa
ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae
ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0
ffffffaf ffffff81 ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffaf ffffff8d
ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffaf
ffffff81 20 ffffffe0 ffffffae ffffffae ffffffe0 ffffffaf ffffff8a ffffffe0
ffffffae ffffffa4 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffffa4 20
ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae ffffff95 ffffffe0 ffffffaf
ffffff8d ffffffe0 ffffffae ffffff95 ffffffe0 ffffffae ffffff99 ffffffe0
ffffffaf ffffff8d ffffffe0 ffffffae ffffff95 ffffffe0 ffffffae ffffffb3
ffffffe0 ffffffaf ffffff8d 20 32 36 20 2d 20 ffffffe0 ffffffae ffffffaa
ffffffe0 ffffffae ffffff95 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae
ffffff95 ffffffe0 ffffffae ffffffae ffffffe0 ffffffaf ffffff8d 20 35
Can't encode transcription: 'வயது : 01.01. 2019 ல் # -துணை பட்டியலில்
உள்ளவாறு திருத்தப்பட்டுள்ளது மொத்த பக்கங்கள் 26 - பக்கம் 5' in language ''
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Encoding of string failed! Failure bytes: 23 ffffffc2 ffffffa3 34 30 31 30
20 31 36 20 ffffffe0 ffffffae ffffffb5 ffffffe0 ffffffaf ffffff80 ffffffe0
ffffffae ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f
ffffffe0 ffffffaf ffffff81 20 ffffffe0 ffffffae ffffff8e ffffffe0 ffffffae
ffffffa3 ffffffe0 ffffffaf ffffff8d 2e 20 33 36 20 31 30 31 30 20 31 36
Can't encode transcription: '- 7010 16 வீட்டு எண். % #£4010 16 வீட்டு எண்.
36 1010 16' in language ''
Compute CTC targets failed!
Encoding of string failed! Failure bytes: ffffffe0 ffffffaf ffffff8c
ffffffe0 ffffffae ffffffb0 ffffffe0 ffffffae ffffffbf 20 2d
Can't encode transcription: 'பெயர்: கீதா - பெயர். கௌரி -' in language ''
Encoding of string failed! Failure bytes: 23 30 30 34 30 20 31 38 20
ffffffe0 ffffffae ffffffb5 ffffffe0 ffffffaf ffffff80 ffffffe0 ffffffae
ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0
ffffffaf ffffff81 20 ffffffe0 ffffffae ffffff8e ffffffe0 ffffffae ffffffa3
ffffffe0 ffffffaf ffffff8d 2e 20 31 39 20 23 31 30 34 30 20 31 38 20
ffffffe0 ffffffae ffffffb5 ffffffe0 ffffffaf ffffff80 ffffffe0 ffffffae
ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0
ffffffaf ffffff81 20 ffffffe0 ffffffae ffffff8e ffffffe0 ffffffae ffffffa3
ffffffe0 ffffffaf ffffff8d 2e 20 31 39 20 31 30 34 30 20 31 38
Can't encode transcription: 'வீட்டு எண். 18 #40
<#40> 18 வீட்டு எண். 19
#1040 <#1040> 18 வீட்டு
எண். 19 1040 18' in language ''
Encoding of string failed! Failure bytes: 5c ffffffe0 ffffffaf ffffffa8 20
7c 20 7c 20 ffffffe0 ffffffae ffffffb5 ffffffe0 ffffffae ffffffaf ffffffe0
ffffffae ffffffa4 ffffffe0 ffffffaf ffffff81 3a 20 32 32 20 ffffffe0
ffffffae ffffffaa ffffffe0 ffffffae ffffffbe ffffffe0 ffffffae ffffffb2
ffffffe0 ffffffae ffffffbf ffffffe0 ffffffae ffffffa9 ffffffe0 ffffffae
ffffffae ffffffe0 ffffffaf ffffff8d 20 3a ffffffe0 ffffffae ffffffaa
ffffffe0 ffffffaf ffffff86 ffffffe0 ffffffae ffffffa3 ffffffe0 ffffffaf
ffffff8d 20 26 ffffffe0 ffffffae ffffffb5 20 7c 20 7c 20 ffffffe0 ffffffae
ffffffb5 ffffffe0 ffffffae ffffffaf ffffffe0 ffffffae ffffffa4 ffffffe0
ffffffaf ffffff81 3a 20 32 35 20 ffffffe0 ffffffae ffffffaa ffffffe0
ffffffae ffffffbe ffffffe0 ffffffae ffffffb2 ffffffe0 ffffffae ffffffbf
ffffffe0 ffffffae ffffffa9 ffffffe0 ffffffae ffffffae ffffffe0 ffffffaf
ffffff8d 20 3a ffffffe0 ffffffae ffffffaa ffffffe0 ffffffaf ffffff86
ffffffe0 ffffffae ffffffa3 ffffffe0 ffffffaf ffffff8d 20 32 2f 32 31 31 25
30 31
Can't encode transcription: 'வயது: 37 பாலினம் :ஆண் ஃலிஸ்\௨ | | வயது: 22
பாலினம் :பெண் &வ | | வயது: 25 பாலினம் :பெண் 2/211%01' in language ''
Encoding of string failed! Failure bytes: 23 30 34 30 20 31 35 20 ffffffe0
ffffffae ffffffb5 ffffffe0 ffffffaf ffffff80 ffffffe0 ffffffae ffffff9f
ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0 ffffffaf
ffffff81 20 ffffffe0 ffffffae ffffff8e ffffffe0 ffffffae ffffffa3 ffffffe0
ffffffaf ffffff8d 2e 20 31 37 20 23 30 30 34 30 20 31 36 20 ffffffe0
ffffffae ffffffb5 ffffffe0 ffffffaf ffffff80 ffffffe0 ffffffae ffffff9f
ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0 ffffffaf
ffffff81 20 ffffffe0 ffffffae ffffff8e ffffffe0 ffffffae ffffffa3 ffffffe0
ffffffaf ffffff8d 2e 20 31 33 38 20 37 30 34 30 20 31 35
Can't encode transcription: 'வீட்டு எண். 2 £#40
<#40> 15 வீட்டு எண். 17
#40 <#40> 16 வீட்டு எண்.
138 7040 15' in language ''
Encoding of string failed! Failure bytes: 23 36 34 30 20 3d 20 7c 20 7c 20
ffffffe0 ffffffae ffffffb5 ffffffe0 ffffffaf ffffff80 ffffffe0 ffffffae
ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0
ffffffaf ffffff81 20 ffffffe0 ffffffae ffffff8e ffffffe0 ffffffae ffffffa3
ffffffe0 ffffffaf ffffff8d 2e 20 31 34 2d 26 20 31 ffffffc2 ffffffa3 31 30
34 30 20 31 32
Can't encode transcription: 'குப்புசாமிநாயக்கர் - £1640 | | | வீட்டு எண்.
14-& #640 <#640> = | |
வீட்டு எண். 14-& 1£1040 12' in language ''
Compute CTC targets failed!
Compute CTC targets failed!
Encoding of string failed! Failure bytes: 23 20 2d ffffffe0 ffffffae
ffffffa4 ffffffe0 ffffffaf ffffff81 ffffffe0 ffffffae ffffffa3 ffffffe0
ffffffaf ffffff88 20 ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae ffffff9f
ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0 ffffffae
ffffffbf ffffffe0 ffffffae ffffffaf ffffffe0 ffffffae ffffffb2 ffffffe0
ffffffae ffffffbf ffffffe0 ffffffae ffffffb2 ffffffe0 ffffffaf ffffff8d 20
ffffffe0 ffffffae ffffff89 ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffaf
ffffff8d ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffae ffffffb5 ffffffe0
ffffffae ffffffbe ffffffe0 ffffffae ffffffb1 ffffffe0 ffffffaf ffffff81 20
ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffae ffffffbf ffffffe0 ffffffae
ffffffb0 ffffffe0 ffffffaf ffffff81 ffffffe0 ffffffae ffffffa4 ffffffe0
ffffffaf ffffff8d ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffae ffffffaa
ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae
ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0
ffffffaf ffffff81 ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffaf ffffff8d
ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffaf
ffffff81 20 ffffffe0 ffffffae ffffffae ffffffe0 ffffffaf ffffff8a ffffffe0
ffffffae ffffffa4 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffffa4 20
ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae ffffff95 ffffffe0 ffffffaf
ffffff8d ffffffe0 ffffffae ffffff95 ffffffe0 ffffffae ffffff99 ffffffe0
ffffffaf ffffff8d ffffffe0 ffffffae ffffff95 ffffffe0 ffffffae ffffffb3
ffffffe0 ffffffaf ffffff8d 20 32 36 20 2d 20 ffffffe0 ffffffae ffffffaa
ffffffe0 ffffffae ffffff95 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae
ffffff95 ffffffe0 ffffffae ffffffae ffffffe0 ffffffaf ffffff8d 20 36
Can't encode transcription: 'வயது : 01.01. 2019 ல் # -துணை பட்டியலில்
உள்ளவாறு திருத்தப்பட்டுள்ளது மொத்த பக்கங்கள் 26 - பக்கம் 6' in language ''
Compute CTC targets failed!
Encoding of string failed! Failure bytes: 23 20 2d ffffffe0 ffffffae
ffffffa4 ffffffe0 ffffffaf ffffff81 ffffffe0 ffffffae ffffffa3 ffffffe0
ffffffaf ffffff88 20 ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae ffffff9f
ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0 ffffffae
ffffffbf ffffffe0 ffffffae ffffffaf ffffffe0 ffffffae ffffffb2 ffffffe0
ffffffae ffffffbf ffffffe0 ffffffae ffffffb2 ffffffe0 ffffffaf ffffff8d 20
ffffffe0 ffffffae ffffff89 ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffaf
ffffff8d ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffae ffffffb5 ffffffe0
ffffffae ffffffbe ffffffe0 ffffffae ffffffb1 ffffffe0 ffffffaf ffffff81 20
ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffae ffffffbf ffffffe0 ffffffae
ffffffb0 ffffffe0 ffffffaf ffffff81 ffffffe0 ffffffae ffffffa4 ffffffe0
ffffffaf ffffff8d ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffae ffffffaa
ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae
ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0
ffffffaf ffffff81 ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffaf ffffff8d
ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffaf
ffffff81 20 ffffffe0 ffffffae ffffffae ffffffe0 ffffffaf ffffff8a ffffffe0
ffffffae ffffffa4 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffffa4 20
ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae ffffff95 ffffffe0 ffffffaf
ffffff8d ffffffe0 ffffffae ffffff95 ffffffe0 ffffffae ffffff99 ffffffe0
ffffffaf ffffff8d ffffffe0 ffffffae ffffff95 ffffffe0 ffffffae ffffffb3
ffffffe0 ffffffaf ffffff8d 20 32 36 20 2d 20 ffffffe0 ffffffae ffffffaa
ffffffe0 ffffffae ffffff95 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae
ffffff95 ffffffe0 ffffffae ffffffae ffffffe0 ffffffaf ffffff8d 20 39
Can't encode transcription: 'வயது : 01.01. 2019 ல் # -துணை பட்டியலில்
உள்ளவாறு திருத்தப்பட்டுள்ளது மொத்த பக்கங்கள் 26 - பக்கம் 9' in language ''
Compute CTC targets failed!
Compute CTC targets failed!
Encoding of string failed! Failure bytes: 23 31 30 34 30 20 31 38 20
ffffffe0 ffffffae ffffffb5 ffffffe0 ffffffaf ffffff80 ffffffe0 ffffffae
ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0
ffffffaf ffffff81 20 ffffffe0 ffffffae ffffff8e ffffffe0 ffffffae ffffffa3
ffffffe0 ffffffaf ffffff8d 2e 20 33 34 20 31 30 34 30 20 31 36
Can't encode transcription: 'வீட்டு எண். 34 2040 16 வீட்டு எண். 34 #1040
<#1040> 18 வீட்டு எண். 34
1040 16' in language ''
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
This is the output after adding the debug level flag.
out2.txt
<https://github.com/tesseract-ocr/tesseract/files/3102202/out2.txt>
out3.txt
<https://github.com/tesseract-ocr/tesseract/files/3102203/out3.txt>
out4.txt
<https://github.com/tesseract-ocr/tesseract/files/3102204/out4.txt>
out5.txt
<https://github.com/tesseract-ocr/tesseract/files/3102205/out5.txt>
out6.txt
<https://github.com/tesseract-ocr/tesseract/files/3102206/out6.txt>
out7.txt
<https://github.com/tesseract-ocr/tesseract/files/3102207/out7.txt>
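For anyone reading these logs: the "Failure bytes" appear to be the raw UTF-8 bytes (printed sign-extended to 32 bits) of the part of the transcription that could not be encoded with the unicharset. A minimal sketch, not from this thread, for turning such a dump back into readable text in Python 3 (the dump string here is a shortened, illustrative excerpt):

```python
# Decode an lstmtraining "Failure bytes" dump back into readable text.
# Tokens like "ffffffe0" are sign-extended bytes, so mask each one to 8 bits.
dump = "23 20 2d ffffffe0 ffffffae ffffffa4"  # paste the full dump here
data = bytes(int(tok, 16) & 0xFF for tok in dump.split())
print(data.decode("utf-8", errors="replace"))
```

For the first dump above, the decoded text starts with '#', which suggests '#' is one of the characters the unicharset cannot encode.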
|
I created the box files using the lstmbox command and the lstmf files using lstm.train. |
What about "Loaded 20/20 pages (1-20) of document data/ground-truth/tam.Impact_Condensed.exp0.lstmf"? The Impact_Condensed font does not support Tamil, does it?
The problem is related to your input files. Please share the training text, or an image and box pair.
|
Ma'am, for --eval_listfile I did not know what to give as input, so I manually created one Impact_Condensed font file and listed it in the eval listfile. These are my files. |
You have a large number of training files; use one of them for eval (e.g. ocr2). I will test further with all the files you sent and get back. |
No, ma'am, "Compute CTC targets failed!" is not related to the Impact_Condensed eval file. |
The zip file has the OCRed text for the images. The ground truth needs to be the correct transcription for the images. |
But I am not using a text file in the training process. |
Training uses box/tiff pairs for creating the lstmf files. If you give the wrong text for an image, then all training will be wrong. Your box files also hold the incorrect text.
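For context, an .lstmf file is normally produced by running Tesseract on each tif/box pair with the lstm.train config, e.g. `tesseract data/ground-truth/out2.tif data/ground-truth/out2 lstm.train` (paths illustrative). The ground truth stored in the .lstmf is taken from the box file, which is why wrong box text makes the training data wrong. |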
I tested by using the lstmtraining \
|
This zip file has box files for your images. However, some errors may be because of incorrect layout analysis, and more training will not fix those. You need to use some other method (OpenCV, uzn, etc.) to mark areas and then recognize them separately. |
Anyway, in all my testing, I didn't get the error. |
Can we directly use the wordstr box file for training? |
The WordStr box file can be used for training AFTER you review and correct the text for each line. Currently it has been generated using the existing Tamil traineddata, so it will have all the errors that you see in recognition. For training you need to correct that text so that it matches the image. Test with one file, using debug_level -1, to make sure it looks OK. Then apply to all images.
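For reference, in a WordStr box file each text line is a single entry: a WordStr record with the line's bounding box (left bottom right top page) and the full line text after '#', followed by an end-of-line record that begins with a literal tab character. A minimal illustration with made-up coordinates and text:

```
WordStr 50 100 900 160 0 #வயது : 22 பாலினம் :ஆண்
	 900 100 901 160 0
```

Reviewing the file means correcting the text after each '#' so that it matches the line image exactly. |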
Ma'am, thank you for your help, but I have one problem. lstmtraining --traineddata data/tamiltest/tamiltest.traineddata --old_traineddata tesseract/tessdata/tam.traineddata --continue_from data/tam/tam.lstm --perfect_sample_delay 0 --target_error_rate 0.01 --model_output data/checkpoints --debug_level -1 --train_listfile data/list.train --eval_listfile data/list.eval --max_iterations 10000 Finished! Error rate = 98.136 |
How do I reduce the error rate? |
If you are running with ` --debug_level -1` you will have details of every
iteration. Usually the error rate will keep going down.
It seems to me that you are training with about 500 lines of text.
Are you getting any errors during training? Run for `--max_iterations 200`
and look at the console log.
|
lstmtraining --traineddata data/tamiltest/tamiltest.traineddata --old_traineddata data/tam/tam.traineddata --continue_from data/tam/tam.lstm --perfect_sample_delay 0 --target_error_rate 0.01 --model_output data/checkpoints --debug_level -1 --train_listfile data/list.train --eval_listfile data/list.eval --max_iterations 200 At iteration 200/200/200, Mean rms=6.777%, delta=86.539%, char train=165.383%, word train=99.594%, skip ratio=0%, New worst char error = 165.383 wrote checkpoint. Finished! Error rate = 100 |
I didn't get any error during training. |
I first extracted a text file from the image using the "Tamil" tessdata and then corrected the values in the text file. Then, using the text file, I created the box file and tif file with the help of text2image. Then I used the "tam" tessdata for the other training purposes (like unicharset, lstm training). Is this the cause of the high error rate? |
Your images have English in them. If you want that to be recognized, it needs to be in your unicharset.
The tam.traineddata has a limited unicharset; by using that, a larger number of characters have to be added.
Try using Tamil.traineddata for further training and see if that is better.
I am not sure why you are not getting debug messages on screen.
|
Code range changed from 99 to 145!
tam.unicharset has 99 unichars; your text has 145.
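This mismatch can be checked directly. A rough sketch, not from this thread: it reads the first space-separated field of each unicharset line as the unichar and flags training-text code points that appear in no unichar (file names are illustrative, and passing this per-code-point check still does not guarantee that every Tamil cluster is encodable):

```python
# Flag training-text characters that the unicharset cannot possibly encode.
with open("tam.unicharset", encoding="utf-8") as f:
    lines = f.read().splitlines()[1:]           # the first line is the count
unichars = {line.split(" ")[0] for line in lines if line}
covered = set("".join(unichars))                # every code point in any unichar

with open("tam.training_text", encoding="utf-8") as f:
    text = f.read()

missing = sorted({ch for ch in text if not ch.isspace() and ch not in covered})
print("uncovered characters:", missing)
```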
|
--debug_interval -1
It is `interval`, not `level`.
-1 is minus one
|
Yes, thank you. I have changed it now. 😊 |
--old_traineddata data/tam/Tamil.traineddata --continue_from data/tam/tam.lstm
Both need to be in sync:
Tamil.traineddata
Tamil.lstm
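For reference, the matching .lstm can be extracted from the same traineddata with combine_tessdata, e.g. `combine_tessdata -e data/tam/Tamil.traineddata data/tam/Tamil.lstm` (paths illustrative), so that --old_traineddata and --continue_from always come from the same model.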
|
Sorry, I used the fast tessdata for "Tamil.traineddata". I am now using the best tessdata. |
lstmtraining --traineddata data/tamiltest/tamiltest.traineddata --old_traineddata tesseract/tessdata/Tamil.traineddata --continue_from data/Tamil/Tamil.lstm --perfect_sample_delay 0 --target_error_rate 0.01 --model_output data/checkpoints --debug_interval -1 --train_listfile data/list.train --max_iterations 200 Finished! Error rate = 100. In this also, the error rate is 100. |
Try with a command similar to what I used; see #2395 (comment). |
Same error |
What does "same error" mean? 200 iterations was just to test what was going wrong. Now you can train for more iterations. For impact-style fine-tuning, try 400-600 iterations. For plus-type fine-tuning, try 3000-3600. |
Is this caused by the x_size parameter not being the same in data generation and training? |
Pull request #3251 improves the error message for "Compute CTC target failed" and now shows the |
@Shreeshrii I am always getting this error when I'm trying to train for Arabic. I am adding my own data to the "training_text" file, and it consists of a lot of Arabic numbers and dates. |
I had a similar issue ("Compute CTC targets failed!") when I generated two lstmf files from a different type of .box file. |
Hi, I'm trying lstmtraining for Tamil text, and I'm facing the compute CTC error. |
That is, if I want to train Tesseract on text that it cannot see in the image, will it throw this error? |
@karan00713, @kiberchert, recent software versions report the line image which caused the message. I suggest visually inspecting such images to check whether they are reasonable (not more than a single line, not rotated) and comparing whether the line image and the line transcription match. |
This error occurs on my PC when text2image creates empty tif files. The .lstmf files created from those empty tif files trigger the error.
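A quick way to catch such pages before training is to scan for all-white images. A minimal sketch assuming Pillow is installed (the glob pattern is illustrative, and only the first page of a multi-page tif is checked):

```python
import glob
from PIL import Image, ImageChops

# Flag tif images that are entirely white; .lstmf files built from them are useless.
for path in sorted(glob.glob("data/ground-truth/*.tif")):
    img = Image.open(path).convert("L")           # grayscale, first page only
    if ImageChops.invert(img).getbbox() is None:  # no non-white pixels at all
        print("blank image:", path)
```
|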
Current Behavior:
surasystem@surasystem:~$ lstmtraining --traineddata data/tamtrain/tamtrain.traineddata --old_traineddata tesseract/tessdata/tam.traineddata --continue_from data/tam/tam.lstm --net_spec '[Lfx256 O1c111]' --model_output data/checkpoints --learning_rate 20e-4 --train_listfile data/list.train --eval_listfile data/list.eval --max_iterations 3000
Loaded file data/tam/tam.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Code range changed from 99 to 104!
Num (Extended) outputs,weights in Series:
1,36,0,1:1, 0
Num (Extended) outputs,weights in Series:
C3,3:9, 0
Ft16:16, 160
Total weights = 160
[C3,3Ft16]:16, 160
Mp3,3:16, 0
Lfys48:48, 12480
Lfx96:96, 55680
Lrx96:96, 74112
Lfx192:192, 221952
Fc104:104, 20072
Total weights = 384456
Previous null char=2 mapped to 103
Continuing from data/tam/tam.lstm
Loaded 54/54 pages (1-54) of document data/ground-truth/out8.lstmf
Loaded 57/57 pages (1-57) of document data/ground-truth/tam.TAMu_Kadambri.exp0.lstmf
Loaded 20/20 pages (1-20) of document data/ground-truth/tam.Impact_Condensed.exp0.lstmf
Loaded 8/8 pages (1-8) of document data/ground-truth/out5.lstmf
Loaded 28/28 pages (1-28) of document data/ground-truth/out2.lstmf
Loaded 57/57 pages (1-57) of document data/ground-truth/out3.lstmf
Loaded 56/56 pages (1-56) of document data/ground-truth/out4.lstmf
Loaded 55/55 pages (1-55) of document data/ground-truth/out9.lstmf
Loaded 58/58 pages (1-58) of document data/ground-truth/out6.lstmf
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
Compute CTC targets failed!
What do I need to do to overcome this issue?