
compute ctc target failed #2395

Open
nijanthan0 opened this issue Apr 19, 2019 · 42 comments

@nijanthan0

nijanthan0 commented Apr 19, 2019

Environment

  • Tesseract Version: 4.0.1
  • Commit Number:
  • Platform: Ubuntu 14

Current Behavior:

surasystem@surasystem:~$ lstmtraining --traineddata data/tamtrain/tamtrain.traineddata --old_traineddata tesseract/tessdata/tam.traineddata --continue_from data/tam/tam.lstm --net_spec '[Lfx256 O1c111]' --model_output data/checkpoints --learning_rate 20e-4 --train_listfile data/list.train --eval_listfile data/list.eval --max_iterations 3000
Loaded file data/tam/tam.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Code range changed from 99 to 104!
Num (Extended) outputs,weights in Series:
1,36,0,1:1, 0
Num (Extended) outputs,weights in Series:
C3,3:9, 0
Ft16:16, 160
Total weights = 160
[C3,3Ft16]:16, 160
Mp3,3:16, 0
Lfys48:48, 12480
Lfx96:96, 55680
Lrx96:96, 74112
Lfx192:192, 221952
Fc104:104, 20072
Total weights = 384456
Previous null char=2 mapped to 103
Continuing from data/tam/tam.lstm
Loaded 54/54 pages (1-54) of document data/ground-truth/out8.lstmf
Loaded 57/57 pages (1-57) of document data/ground-truth/tam.TAMu_Kadambri.exp0.lstmf
Loaded 20/20 pages (1-20) of document data/ground-truth/tam.Impact_Condensed.exp0.lstmf
Loaded 8/8 pages (1-8) of document data/ground-truth/out5.lstmf
Loaded 28/28 pages (1-28) of document data/ground-truth/out2.lstmf
Loaded 57/57 pages (1-57) of document data/ground-truth/out3.lstmf
Loaded 56/56 pages (1-56) of document data/ground-truth/out4.lstmf
Loaded 55/55 pages (1-55) of document data/ground-truth/out9.lstmf
Loaded 58/58 pages (1-58) of document data/ground-truth/out6.lstmf
Compute CTC targets failed!
(line repeated 18 times)

What should I do to overcome this issue?

@Shreeshrii
Collaborator

--old_traineddata tesseract/tessdata/tam.traineddata

Is this file taken from the tessdata_best repo?

lstmtraining --traineddata data/tamtrain/tamtrain.traineddata --old_traineddata tesseract/tessdata/tam.traineddata --continue_from data/tam/tam.lstm --net_spec '[Lfx256 O1c111]' --model_output data/checkpoints --learning_rate 20e-4 --train_listfile data/list.train --eval_listfile data/list.eval --max_iterations 3000

Run your command with --debug_level -1 and share console output and also the training_text used.

@nijanthan0
Author

Yes, I am using the tessdata_best file.

@nijanthan0
Author

lstmtraining --traineddata data/tamtrain/tamtrain.traineddata --old_traineddata tesseract/tessdata/tam.traineddata --continue_from data/tam/tam.lstm --net_spec '[Lfx256 O1c111]' --model_output data/checkpoints --debug_level -1 --learning_rate 20e-4 --train_listfile data/list.train --eval_listfile data/list.eval --max_iterations 3000
Loaded file data/checkpoints_checkpoint, unpacking...
Successfully restored trainer from data/checkpoints_checkpoint
Loaded 54/54 pages (1-54) of document data/ground-truth/out8.lstmf
Loaded 20/20 pages (1-20) of document data/ground-truth/tam.Impact_Condensed.exp0.lstmf
Loaded 8/8 pages (1-8) of document data/ground-truth/out5.lstmf
Loaded 28/28 pages (1-28) of document data/ground-truth/out2.lstmf
Loaded 58/58 pages (1-58) of document data/ground-truth/out6.lstmf
Loaded 57/57 pages (1-57) of document data/ground-truth/out3.lstmf
Loaded 55/55 pages (1-55) of document data/ground-truth/out9.lstmf
Loaded 56/56 pages (1-56) of document data/ground-truth/out4.lstmf
Compute CTC targets failed!
(line repeated 7 times)
Encoding of string failed! Failure bytes: 23 20 2d ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffaf ffffff81 ffffffe0 ffffffae ffffffa3 ffffffe0 ffffffaf ffffff88 20 ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0 ffffffae ffffffbf ffffffe0 ffffffae ffffffaf ffffffe0 ffffffae ffffffb2 ffffffe0 ffffffae ffffffbf ffffffe0 ffffffae ffffffb2 ffffffe0 ffffffaf ffffff8d 20 ffffffe0 ffffffae ffffff89 ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffae ffffffb5 ffffffe0 ffffffae ffffffbe ffffffe0 ffffffae ffffffb1 ffffffe0 ffffffaf ffffff81 20 ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffae ffffffbf ffffffe0 ffffffae ffffffb0 ffffffe0 ffffffaf ffffff81 ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffae ffffffaa ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0 ffffffaf ffffff81 ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffaf ffffff81 20 ffffffe0 ffffffae ffffffae ffffffe0 ffffffaf ffffff8a ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffffa4 20 ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae ffffff95 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff95 ffffffe0 ffffffae ffffff99 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff95 ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffaf ffffff8d 20 32 36 20 2d 20 ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae ffffff95 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff95 ffffffe0 ffffffae ffffffae ffffffe0 ffffffaf ffffff8d 20 35
Can't encode transcription: 'வயது : 01.01. 2019 ல் # -துணை பட்டியலில் உள்ளவாறு திருத்தப்பட்டுள்ளது மொத்த பக்கங்கள் 26 - பக்கம் 5' in language ''
Compute CTC targets failed!
(line repeated 7 times)
Encoding of string failed! Failure bytes: 23 ffffffc2 ffffffa3 34 30 31 30 20 31 36 20 ffffffe0 ffffffae ffffffb5 ffffffe0 ffffffaf ffffff80 ffffffe0 ffffffae ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0 ffffffaf ffffff81 20 ffffffe0 ffffffae ffffff8e ffffffe0 ffffffae ffffffa3 ffffffe0 ffffffaf ffffff8d 2e 20 33 36 20 31 30 31 30 20 31 36
Can't encode transcription: '- 7010 16 வீட்டு எண். % #£4010 16 வீட்டு எண். 36 1010 16' in language ''
Compute CTC targets failed!
Encoding of string failed! Failure bytes: ffffffe0 ffffffaf ffffff8c ffffffe0 ffffffae ffffffb0 ffffffe0 ffffffae ffffffbf 20 2d
Can't encode transcription: 'பெயர்: கீதா - பெயர். கௌரி -' in language ''
Encoding of string failed! Failure bytes: 23 30 30 34 30 20 31 38 20 ffffffe0 ffffffae ffffffb5 ffffffe0 ffffffaf ffffff80 ffffffe0 ffffffae ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0 ffffffaf ffffff81 20 ffffffe0 ffffffae ffffff8e ffffffe0 ffffffae ffffffa3 ffffffe0 ffffffaf ffffff8d 2e 20 31 39 20 23 31 30 34 30 20 31 38 20 ffffffe0 ffffffae ffffffb5 ffffffe0 ffffffaf ffffff80 ffffffe0 ffffffae ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0 ffffffaf ffffff81 20 ffffffe0 ffffffae ffffff8e ffffffe0 ffffffae ffffffa3 ffffffe0 ffffffaf ffffff8d 2e 20 31 39 20 31 30 34 30 20 31 38
Can't encode transcription: 'வீட்டு எண். 18 #40 18 வீட்டு எண். 19 #1040 18 வீட்டு எண். 19 1040 18' in language ''
Encoding of string failed! Failure bytes: 5c ffffffe0 ffffffaf ffffffa8 20 7c 20 7c 20 ffffffe0 ffffffae ffffffb5 ffffffe0 ffffffae ffffffaf ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffaf ffffff81 3a 20 32 32 20 ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae ffffffbe ffffffe0 ffffffae ffffffb2 ffffffe0 ffffffae ffffffbf ffffffe0 ffffffae ffffffa9 ffffffe0 ffffffae ffffffae ffffffe0 ffffffaf ffffff8d 20 3a ffffffe0 ffffffae ffffffaa ffffffe0 ffffffaf ffffff86 ffffffe0 ffffffae ffffffa3 ffffffe0 ffffffaf ffffff8d 20 26 ffffffe0 ffffffae ffffffb5 20 7c 20 7c 20 ffffffe0 ffffffae ffffffb5 ffffffe0 ffffffae ffffffaf ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffaf ffffff81 3a 20 32 35 20 ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae ffffffbe ffffffe0 ffffffae ffffffb2 ffffffe0 ffffffae ffffffbf ffffffe0 ffffffae ffffffa9 ffffffe0 ffffffae ffffffae ffffffe0 ffffffaf ffffff8d 20 3a ffffffe0 ffffffae ffffffaa ffffffe0 ffffffaf ffffff86 ffffffe0 ffffffae ffffffa3 ffffffe0 ffffffaf ffffff8d 20 32 2f 32 31 31 25 30 31
Can't encode transcription: 'வயது: 37 பாலினம் :ஆண் ஃலிஸ்\௨ | | வயது: 22 பாலினம் :பெண் &வ | | வயது: 25 பாலினம் :பெண் 2/211%01' in language ''
Encoding of string failed! Failure bytes: 23 30 34 30 20 31 35 20 ffffffe0 ffffffae ffffffb5 ffffffe0 ffffffaf ffffff80 ffffffe0 ffffffae ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0 ffffffaf ffffff81 20 ffffffe0 ffffffae ffffff8e ffffffe0 ffffffae ffffffa3 ffffffe0 ffffffaf ffffff8d 2e 20 31 37 20 23 30 30 34 30 20 31 36 20 ffffffe0 ffffffae ffffffb5 ffffffe0 ffffffaf ffffff80 ffffffe0 ffffffae ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0 ffffffaf ffffff81 20 ffffffe0 ffffffae ffffff8e ffffffe0 ffffffae ffffffa3 ffffffe0 ffffffaf ffffff8d 2e 20 31 33 38 20 37 30 34 30 20 31 35
Can't encode transcription: 'வீட்டு எண். 2 £#40 15 வீட்டு எண். 17 #40 16 வீட்டு எண். 138 7040 15' in language ''
Encoding of string failed! Failure bytes: 23 36 34 30 20 3d 20 7c 20 7c 20 ffffffe0 ffffffae ffffffb5 ffffffe0 ffffffaf ffffff80 ffffffe0 ffffffae ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0 ffffffaf ffffff81 20 ffffffe0 ffffffae ffffff8e ffffffe0 ffffffae ffffffa3 ffffffe0 ffffffaf ffffff8d 2e 20 31 34 2d 26 20 31 ffffffc2 ffffffa3 31 30 34 30 20 31 32
Can't encode transcription: 'குப்புசாமிநாயக்கர் - £1640 | | | வீட்டு எண். 14-& #640 = | | வீட்டு எண். 14-& 1£1040 12' in language ''
Compute CTC targets failed!
Compute CTC targets failed!
Encoding of string failed! Failure bytes: 23 20 2d ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffaf ffffff81 ffffffe0 ffffffae ffffffa3 ffffffe0 ffffffaf ffffff88 20 ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0 ffffffae ffffffbf ffffffe0 ffffffae ffffffaf ffffffe0 ffffffae ffffffb2 ffffffe0 ffffffae ffffffbf ffffffe0 ffffffae ffffffb2 ffffffe0 ffffffaf ffffff8d 20 ffffffe0 ffffffae ffffff89 ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffae ffffffb5 ffffffe0 ffffffae ffffffbe ffffffe0 ffffffae ffffffb1 ffffffe0 ffffffaf ffffff81 20 ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffae ffffffbf ffffffe0 ffffffae ffffffb0 ffffffe0 ffffffaf ffffff81 ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffae ffffffaa ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0 ffffffaf ffffff81 ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffaf ffffff81 20 ffffffe0 ffffffae ffffffae ffffffe0 ffffffaf ffffff8a ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffffa4 20 ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae ffffff95 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff95 ffffffe0 ffffffae ffffff99 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff95 ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffaf ffffff8d 20 32 36 20 2d 20 ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae ffffff95 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff95 ffffffe0 ffffffae ffffffae ffffffe0 ffffffaf ffffff8d 20 36
Can't encode transcription: 'வயது : 01.01. 2019 ல் # -துணை பட்டியலில் உள்ளவாறு திருத்தப்பட்டுள்ளது மொத்த பக்கங்கள் 26 - பக்கம் 6' in language ''
Compute CTC targets failed!
Encoding of string failed! Failure bytes: 23 20 2d ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffaf ffffff81 ffffffe0 ffffffae ffffffa3 ffffffe0 ffffffaf ffffff88 20 ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0 ffffffae ffffffbf ffffffe0 ffffffae ffffffaf ffffffe0 ffffffae ffffffb2 ffffffe0 ffffffae ffffffbf ffffffe0 ffffffae ffffffb2 ffffffe0 ffffffaf ffffff8d 20 ffffffe0 ffffffae ffffff89 ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffae ffffffb5 ffffffe0 ffffffae ffffffbe ffffffe0 ffffffae ffffffb1 ffffffe0 ffffffaf ffffff81 20 ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffae ffffffbf ffffffe0 ffffffae ffffffb0 ffffffe0 ffffffaf ffffff81 ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffae ffffffaa ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0 ffffffaf ffffff81 ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffaf ffffff81 20 ffffffe0 ffffffae ffffffae ffffffe0 ffffffaf ffffff8a ffffffe0 ffffffae ffffffa4 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffffa4 20 ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae ffffff95 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff95 ffffffe0 ffffffae ffffff99 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff95 ffffffe0 ffffffae ffffffb3 ffffffe0 ffffffaf ffffff8d 20 32 36 20 2d 20 ffffffe0 ffffffae ffffffaa ffffffe0 ffffffae ffffff95 ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff95 ffffffe0 ffffffae ffffffae ffffffe0 ffffffaf ffffff8d 20 39
Can't encode transcription: 'வயது : 01.01. 2019 ல் # -துணை பட்டியலில் உள்ளவாறு திருத்தப்பட்டுள்ளது மொத்த பக்கங்கள் 26 - பக்கம் 9' in language ''
Compute CTC targets failed!
Compute CTC targets failed!
Encoding of string failed! Failure bytes: 23 31 30 34 30 20 31 38 20 ffffffe0 ffffffae ffffffb5 ffffffe0 ffffffaf ffffff80 ffffffe0 ffffffae ffffff9f ffffffe0 ffffffaf ffffff8d ffffffe0 ffffffae ffffff9f ffffffe0 ffffffaf ffffff81 20 ffffffe0 ffffffae ffffff8e ffffffe0 ffffffae ffffffa3 ffffffe0 ffffffaf ffffff8d 2e 20 33 34 20 31 30 34 30 20 31 36
Can't encode transcription: 'வீட்டு எண். 34 2040 16 வீட்டு எண். 34 #1040 18 வீட்டு எண். 34 1040 16' in language ''
Compute CTC targets failed!
(line repeated 5 times)
This is the output after setting the debug level.
out2.txt
out3.txt
out4.txt
out5.txt
out6.txt
out7.txt
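
The "Can't encode transcription" messages above point at one cause of the failures: the ground-truth lines contain characters (here #, £, ௨ and similar) that the unicharset of the traineddata being fine-tuned cannot encode, so those lines are skipped. A pre-check of the ground truth can be sketched as below; the path is hypothetical, and the simplified view of the unicharset format (first line is the entry count, each later line starts with the glyph) ignores multi-codepoint entries, so treat it only as a first-pass filter.

```python
# Sketch: pre-check ground-truth text against a Tesseract unicharset.
# Simplified view of the format: the first line is the entry count and each
# remaining line starts with the glyph, followed by its properties.  Real
# unicharsets can hold multi-codepoint entries, so this per-character check
# is a first-pass filter, not a reimplementation of Tesseract's encoder.
import os

def load_unicharset(path):
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    glyphs = set()
    for line in lines[1:]:  # skip the leading entry count
        if line:
            glyphs.add(line.split(" ", 1)[0])
    return glyphs

def unencodable_chars(text, glyphs):
    # Tesseract handles the space separately, so ignore it here.
    return sorted({ch for ch in text if ch != " " and ch not in glyphs})

if __name__ == "__main__":
    path = "data/tamtrain/tamtrain.unicharset"  # hypothetical path
    if os.path.exists(path):
        print(unencodable_chars("வீட்டு எண். 18 #40 18", load_unicharset(path)))
```

Run over every ground-truth line before training, a check like this would flag stray symbols such as the # and £ seen in the failures above.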

@Shreeshrii
Collaborator

Shreeshrii commented Apr 22, 2019 via email

@nijanthan0
Author

I created the box files using the lstmbox config and the lstmf files using lstm.train.

@Shreeshrii
Collaborator

Shreeshrii commented Apr 22, 2019 via email

@nijanthan0
Author

Ma'am, for --eval_listfile I didn't know what to give as input, so I manually created one Impact_Condensed font file and listed it in the eval list file.

These are my files.
ground-truth.zip

@Shreeshrii
Collaborator

Mam but for --eval_listfile I don't know what to give as input so i manually created one impact_condensed font file and then stored in eval listfile.

You have a large number of training files; use one of them for eval (e.g. out2).

I am wondering whether "Compute CTC targets failed!" is related to the Impact_Condensed eval file.

I will test further with all the files you sent and get back.

@nijanthan0
Author

nijanthan0 commented Apr 22, 2019

No, ma'am, "Compute CTC targets failed!" is not related to the Impact_Condensed eval file.

@Shreeshrii
Collaborator

this is my files.
ground-truth.zip

The zip file has the OCRed text for the images. The ground truth needs to be the correct transcription for the images.

@nijanthan0
Author

But I am not using the text files in the training process.

@Shreeshrii
Collaborator

Shreeshrii commented Apr 22, 2019

Training uses box/tiff pairs for creating the lstmf files. If you give the wrong text for an image, then all training will be wrong. Your box files also contain that same incorrect text.
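
The box/tiff-to-lstmf step described above can be sketched roughly as follows; the directory layout is taken from this thread, build_lstmf_list is a made-up helper name, and the lstm.train config is the one that ships with Tesseract. The tesseract call is skipped when the binary is not on PATH, so this is a sketch, not a drop-in tool.

```python
# Sketch of the box/tiff -> lstmf -> training-list pipeline (paths from this
# thread; build_lstmf_list is a hypothetical helper, not a Tesseract tool).
# Each .tif needs a matching .box file with the *corrected* text beside it.
import shutil
import subprocess
from pathlib import Path

def build_lstmf_list(gt_dir="data/ground-truth", list_path="data/list.train"):
    gt = Path(gt_dir)
    gt.mkdir(parents=True, exist_ok=True)
    if shutil.which("tesseract"):  # only invoke if the binary is available
        for tif in sorted(gt.glob("*.tif")):
            # The lstm.train config writes <base>.lstmf next to the image.
            subprocess.run(
                ["tesseract", str(tif), str(tif.with_suffix("")),
                 "--psm", "6", "lstm.train"],
                check=False)
    lstmfs = sorted(str(p) for p in gt.glob("*.lstmf"))
    out = Path(list_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("\n".join(lstmfs) + ("\n" if lstmfs else ""))
    return lstmfs
```

The resulting data/list.train is what --train_listfile expects.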

@Shreeshrii
Collaborator

I tested by using the wordstrbox (without correcting the text).

lstmtraining \
  --model_output build/poll \
  --continue_from ~/tessdata_best/script/Tamil.lstm \
  --traineddata ~/tessdata_best/script/Tamil.traineddata \
  --train_listfile build/tam.poll.training_files.txt \
  --debug_interval -1
Loaded file /home/ubuntu/tessdata_best/script/Tamil.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Continuing from /home/ubuntu/tessdata_best/script/Tamil.lstm
Loaded 54/54 lines (1-54) of document input/out10.lstmf
Loaded 53/53 lines (1-53) of document input/out11.lstmf
Loaded 8/8 lines (1-8) of document input/out5.lstmf
Loaded 57/57 lines (1-57) of document input/out7.lstmf
Loaded 57/57 lines (1-57) of document input/out3.lstmf
Loaded 54/54 lines (1-54) of document input/out8.lstmf
Loaded 56/56 lines (1-56) of document input/out4.lstmf
Loaded 58/58 lines (1-58) of document input/out6.lstmf
Loaded 55/55 lines (1-55) of document input/out9.lstmf
Iteration 0: GROUND TRUTH : பெயர்‌: மோகனா - பெயர்‌: மாதவன்‌ - பெயர்‌: தமிழ்ச்செல்வி -
Iteration 0: ALIGNED TRUTH : பெயர்‌: மோகனா - பெயர்‌: மாதவன்‌ - பெயர்‌: தமிழ்ச்செல்வி -
Iteration 0: BEST OCR TEXT : பெயர்‌: மோகனா - F|பெயர்‌: மாதவன்‌- . |-|பெயர்‌: தமிழ்ச்செல்வி -
File input/out10.lstmf line 0 :
Mean rms=2.079%, delta=6.897%, train=11.111%(44.444%), skip ratio=0%
Iteration 1: GROUND TRUTH : சட்டமன்றத்‌ தொகுதி எண்‌ மற்றும்‌ பெயர்‌ : 36-உத்திரமேரூர்‌ பாகம்‌ எண்‌: 1
Iteration 1: ALIGNED TRUTH : சட்டமன்றத்‌ தொகுதி எண்‌ மற்றும்‌ பெயர்‌ : 36-உத்திரமேரூர்‌ பாகம்‌ எண்‌:
Iteration 1: BEST OCR TEXT : சட்டமன்றத்‌ தொகுதி எண்‌ மற்றும்‌ பெயர்‌ : %-உத்திரமேரர்‌ .......எபாகம்‌ எண்‌: 1
File input/out11.lstmf line 0 :
Mean rms=2.03%, delta=4.742%, train=15.257%(32.222%), skip ratio=0%
Iteration 2: GROUND TRUTH : பெயர்‌: கிருட்டினன்‌ - பெயர்‌: வேதவல்லி - பெயர்‌: குப்பன்‌ -
Iteration 2: ALIGNED TRUTH : பெயர்‌: கிருட்டினன்‌ - பெயர்‌: வேதவல்லி - பெயர்‌: குப்பன்‌ -
Iteration 2: BEST OCR TEXT : பெயர்‌: கிருட்டினன்‌- [பெயர்‌: வேதவல்லி- [பெயர்‌: குப்பன்‌ -
File input/out3.lstmf line 0 :
Mean rms=1.848%, delta=4.556%, train=13.148%(43.704%), skip ratio=0%
Iteration 3: GROUND TRUTH : கணவர்‌ பெயர்‌: முருகன்‌ - தந்தை பெயர்‌: காசி - தந்தை பெயர்‌: இராமன்‌ -
Iteration 3: ALIGNED TRUTH : கணவர்‌ பெயர்‌: முருகன்‌ - தந்தை பெயர்‌: காசி - தந்தை பெயர்‌: இராமன்‌ -
Iteration 3: BEST OCR TEXT : கணவர்‌ பெயர்‌: முருகன்‌- | ந|[தந்தைபெயர்‌ காச- [ ந|தந்தை பெயர்‌: இராமன்‌ -
File input/out4.lstmf line 0 :
Mean rms=1.916%, delta=4.98%, train=13.707%(47.361%), skip ratio=0%
Iteration 4: GROUND TRUTH : பெயர்‌: கீதா - பெயர்‌: கெளரி -
Iteration 4: ALIGNED TRUTH : பெயர்‌: கீதா - பெயர்‌: கெளரி -
Iteration 4: BEST OCR TEXT : பெயர்‌. கதா - |[பெயர்‌: கெளரி -
File input/out5.lstmf line 0 :
Mean rms=1.894%, delta=4.984%, train=15.103%(47.889%), skip ratio=0%
Iteration 5: GROUND TRUTH : தந்த பெயர்‌: குட்டியப்பன்‌ - கணவர்‌ பெயர்‌: முனுசாமி - தந்த பெயர்‌: கன்னியப்பன்‌ -
Iteration 5: ALIGNED TRUTH : தந்த பெயர்‌: குட்டியப்பன்‌ - கணவர்‌ பெயர்‌: முனுசாமி - தந்த பெயர்‌: கன்னியப்பன்‌ -
Iteration 5: BEST OCR TEXT : தந்த பெயர்‌: குட்டியப்பன்‌- | |[|கணவர்‌ பெயர்‌: முனுசாமி- | |[[தந்தை பெயர்‌: கன்னியப்பன்‌ -
File input/out6.lstmf line 0 :
Mean rms=1.904%, delta=5.48%, train=14.751%(48.241%), skip ratio=0%
Iteration 6: GROUND TRUTH : வயது: ஏ பாலினம்‌ :ஆண்‌ வயது: % பாலினம்‌ ஆண்‌ வயது: 4 பாலினம்‌ :பெண்‌
Iteration 6: BEST OCR TEXT : வயது: ஏ பாலினம்‌ :ஆண்‌ | Aிleble ||வயது: % பாலினம்‌ :ஆண்‌ | rileble [வயது: & பாலினம்‌ :-பெண்‌
File input/out7.lstmf line 0 :
Mean rms=2.04%, delta=7.057%, train=18.539%(47.302%), skip ratio=0%
Iteration 7: GROUND TRUTH : வயது: ஏ பாலினம்‌ :ஆண்‌ Available ||வயது: 22 பாலினம்‌ பெண்‌ Available ||வயது: 25 பாலினம்‌ பெண்‌ Available
Iteration 7: ALIGNED TRUTH : வயது: ஏ பாலினம்‌ :ஆண்‌ Available ||வயது: 22 பாலினம்‌ பெண்‌ Avllable ||வயது: 25 பாலினம்‌ பெண்‌ Available
Iteration 7: BEST OCR TEXT : வயது: ஏ பாலினம்‌ :ஆண்‌ | இவிஷ்ீ |[வயது: 2 பாலினம்‌ பெண்‌ | விஷ்ச [[வயது: 25 பாலினம்‌ -பெண்‌ | vailable
File input/out8.lstmf line 0 :
Mean rms=2.089%, delta=7.843%, train=20.893%(47.222%), skip ratio=0%
Iteration 8: GROUND TRUTH : TRQO0226621 TN/O5/026/0393067 TN/O5/026/0393295
Iteration 8: ALIGNED TRUTH : TRQO0226621 TN/O5/026/0393067 TN/O5/026/0393295
Iteration 8: BEST OCR TEXT : TRQ0226621|[ V TNOSIO260393067 [ TNO5I026/0393295
File input/out9.lstmf line 0 :
Mean rms=2.099%, delta=8.025%, train=22.507%(53.086%), skip ratio=0%
Iteration 9: GROUND TRUTH : வீட்டு எண்‌: 41 Photo is வீட்டு எண்‌: 41 Photo is வீட்டு எண்‌: 41 Photo is
Iteration 9: ALIGNED TRUTH : வீட்டு எண்‌: 41 Photo is வீட்டு எண்‌: 441 Photo is வீட்டு எண்‌: 41 Photo is
Iteration 9: BEST OCR TEXT : வீட்டுஎண்‌4 | Photois |வீட்டுஎண்‌்4 | Photois |வீட்டுஎண்‌41 | Photos
File input/out10.lstmf line 1 :
Mean rms=2.091%, delta=8.12%, train=22.618%(57.778%), skip ratio=0%
Iteration 10: GROUND TRUTH : தந்த பெயர்‌: சின்னபையன்‌ - தந்த பெயர்‌: சின்னபையன்‌ - கணவர்‌ பெயர்‌: சங்கர்‌ -
Iteration 10: ALIGNED TRUTH : தந்த பெயர்‌: சின்னபையன்‌ - தந்த பெயர்‌: சின்னபையன்‌ - கணவர்‌ பெயர்‌: சங்கர்‌ -
Iteration 10: BEST OCR TEXT : தந்த பெயர்‌: சின்னபையன்‌- | |தந்தை பெயர்‌: சின்னபையன்‌ - | [கணவர்‌ பெயர்‌: சங்கர்‌ -
File input/out11.lstmf line 1 :
Mean rms=2.046%, delta=7.87%, train=21.193%(55.556%), skip ratio=0%
Iteration 11: GROUND TRUTH : - Photo is வீட்டு எண்‌: 4 Photo is வீட்டு எண்‌: 4 Photo is
Iteration 11: ALIGNED TRUTH : ------------- Photo is வீட்டு எண்‌: 4 Photo is வீட்டு எண்‌: 4 444 Photo is
Iteration 11: BEST OCR TEXT : D [ Photois வீட்டுஎண்‌:4 Photois |[வீட்டுிஎண்‌:4: | Photois

@Shreeshrii
Collaborator

tamil.zip

This zip file has box files for your images in wordstr format. The text for each line needs to be corrected to match the image. Then you can use these box files with your images to create the lstmf files and then use them for lstmtraining.

However, some errors may be because of incorrect layout analysis, and more training will not fix those.

You need to use some other method (OpenCV, UZN files, etc.) to mark areas and then recognize them separately.
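
A small helper can fix the transcription in a WordStr box line without disturbing the coordinates. The assumed line layout, WordStr left bottom right top page #text, should be verified against your own wordstrbox output; the helper name is made up.

```python
# Sketch: rewrite the transcription of a WordStr box line, keeping coordinates.
# Assumed layout (verify against your files):
#   WordStr left bottom right top page #text
def set_wordstr_text(line, new_text):
    head, sep, _old = line.partition("#")
    if not line.startswith("WordStr") or not sep:
        raise ValueError("not a WordStr line: %r" % line)
    return head + "#" + new_text

if __name__ == "__main__":
    # Coordinates below are made up for illustration.
    line = "WordStr 653 2082 1969 2133 0 #பெயர. கதா -"
    print(set_wordstr_text(line, "பெயர்: கீதா -"))
```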

@Shreeshrii
Collaborator

Anyway, in all my testing I didn't get the error "Compute CTC targets failed!".

@nijanthan0
Author

Can we directly use the wordstr box file for training?

@Shreeshrii
Collaborator

The text for each line needs to be corrected to match the image

The wordstr box file can be used for training AFTER you review and correct the text for each line. Currently it has been generated using the existing Tamil traineddata so it will have all errors that you see in recognition. For training you need to correct that text so that it matches the image.

Test with one file, use debug_level -1 to make sure it looks ok. Then apply to all images.

@nijanthan0
Author

Ma'am, thank you for your help. But I have one problem.

lstmtraining --traineddata data/tamiltest/tamiltest.traineddata --old_traineddata tesseract/tessdata/tam.traineddata --continue_from data/tam/tam.lstm --perfect_sample_delay 0 --target_error_rate 0.01 --model_output data/checkpoints --debug_level -1 --train_listfile data/list.train --eval_listfile data/list.eval --max_iterations 10000
Loaded file data/checkpoints_checkpoint, unpacking...
Successfully restored trainer from data/checkpoints_checkpoint
Loaded 46/46 pages (1-46) of document data/ground-truth/out25.lstmf
Loaded 34/34 pages (1-34) of document data/ground-truth/out26.lstmf
Loaded 34/34 pages (1-34) of document data/ground-truth/out20.lstmf
Loaded 34/34 pages (1-34) of document data/ground-truth/out22.lstmf
Loaded 34/34 pages (1-34) of document data/ground-truth/out23.lstmf
Loaded 34/34 pages (1-34) of document data/ground-truth/out27.lstmf
Loaded 34/34 pages (1-34) of document data/ground-truth/out18.lstmf
Loaded 35/35 pages (1-35) of document data/ground-truth/out21.lstmf
Loaded 34/34 pages (1-34) of document data/ground-truth/out29.lstmf
Loaded 56/56 pages (1-56) of document data/ground-truth/out8.lstmf
Loaded 34/34 pages (1-34) of document data/ground-truth/out14.lstmf
At iteration 10000/10000/10000, Mean rms=5.03%, delta=48.282%, char train=98.223%, word train=97.784%, skip ratio=0%, New worst char error = 98.223 wrote checkpoint.

Finished! Error rate = 98.136

@nijanthan0
Author

How do I reduce the error rate?

@Shreeshrii
Collaborator

Shreeshrii commented Apr 25, 2019 via email

@nijanthan0
Author

lstmtraining --traineddata data/tamiltest/tamiltest.traineddata --old_traineddata data/tam/tam.traineddata --continue_from data/tam/tam.lstm --perfect_sample_delay 0 --target_error_rate 0.01 --model_output data/checkpoints --debug_level -1 --train_listfile data/list.train --eval_listfile data/list.eval --max_iterations 200
Loaded file data/tam/tam.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Code range changed from 99 to 145!
Num (Extended) outputs,weights in Series:
1,36,0,1:1, 0
Num (Extended) outputs,weights in Series:
C3,3:9, 0
Ft16:16, 160
Total weights = 160
[C3,3Ft16]:16, 160
Mp3,3:16, 0
Lfys48:48, 12480
Lfx96:96, 55680
Lrx96:96, 74112
Lfx192:192, 221952
Fc145:145, 27985
Total weights = 392369
Previous null char=2 mapped to 144
Continuing from data/tam/tam.lstm
Loaded 46/46 pages (1-46) of document data/ground-truth/out25.lstmf
Loaded 34/34 pages (1-34) of document data/ground-truth/out26.lstmf
Loaded 34/34 pages (1-34) of document data/ground-truth/out27.lstmf
Loaded 34/34 pages (1-34) of document data/ground-truth/out22.lstmf
Loaded 35/35 pages (1-35) of document data/ground-truth/out21.lstmf
Loaded 34/34 pages (1-34) of document data/ground-truth/out23.lstmf
Loaded 34/34 pages (1-34) of document data/ground-truth/out18.lstmf
Loaded 34/34 pages (1-34) of document data/ground-truth/out20.lstmf
Loaded 34/34 pages (1-34) of document data/ground-truth/out29.lstmf
Loaded 56/56 pages (1-56) of document data/ground-truth/out8.lstmf
Loaded 34/34 pages (1-34) of document data/ground-truth/out14.lstmf
Loaded 35/35 pages (1-35) of document data/ground-truth/out1.lstmf
Loaded 34/34 pages (1-34) of document data/ground-truth/out24.lstmf
Loaded 55/55 pages (1-55) of document data/ground-truth/out13.lstmf
Loaded 34/34 pages (1-34) of document data/ground-truth/out16.lstmf
Loaded 35/35 pages (1-35) of document data/ground-truth/out6.lstmf
Loaded 35/35 pages (1-35) of document data/ground-truth/out9.lstmf
Loaded 34/34 pages (1-34) of document data/ground-truth/out30.lstmf
Loaded 34/34 pages (1-34) of document data/ground-truth/out11.lstmf
Loaded 34/34 pages (1-34) of document data/ground-truth/out12.lstmf
Loaded 45/45 pages (1-45) of document data/ground-truth/out10.lstmf
Loaded 34/34 pages (1-34) of document data/ground-truth/out7.lstmf
Loaded 35/35 pages (1-35) of document data/ground-truth/out19.lstmf
Loaded 47/47 pages (1-47) of document data/ground-truth/out28.lstmf
Loaded 34/34 pages (1-34) of document data/ground-truth/out17.lstmf
Loaded 34/34 pages (1-34) of document data/ground-truth/out5.lstmf
Loaded 57/57 pages (1-57) of document data/ground-truth/out15.lstmf
Loaded 34/34 pages (1-34) of document data/ground-truth/out4.lstmf
Loaded 34/34 pages (1-34) of document data/ground-truth/out3.lstmf
At iteration 100/100/100, Mean rms=5.965%, delta=66.111%, char train=154.374%, word train=99.521%, skip ratio=0%, New worst char error = 154.374 wrote checkpoint.

At iteration 200/200/200, Mean rms=6.777%, delta=86.539%, char train=165.383%, word train=99.594%, skip ratio=0%, New worst char error = 165.383 wrote checkpoint.

Finished! Error rate = 100

@nijanthan0
Author

I didn't get any error during training.

@nijanthan0
Author

I first extracted the text from the images using the "Tamil" tessdata and then corrected the text file. Then, using the text file, I created the box and tif files with the help of text2image. Then I used the "tam" tessdata for the other training steps (unicharset extraction, LSTM training). Is this the cause of the high error rate?

@Shreeshrii
Collaborator

Shreeshrii commented Apr 25, 2019 via email

@Shreeshrii
Collaborator

Shreeshrii commented Apr 25, 2019 via email

@Shreeshrii
Collaborator

Shreeshrii commented Apr 25, 2019 via email

@nijanthan0
Author

lstmtraining --traineddata data/tamiltest/tamiltest.traineddata --old_traineddata data/tam/Tamil.traineddata --continue_from data/tam/tam.lstm --perfect_sample_delay 0 --target_error_rate 0.01 --model_output data/checkpoints --debug_interval -1 --train_listfile data/list.train --max_iterations 200
Loaded file data/checkpoints_checkpoint, unpacking...
Code range changed from 117 to 173!
Must supply the old traineddata for code conversion!
Loaded file data/tam/tam.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Code range changed from 99 to 173!
Num (Extended) outputs,weights in Series:
1,36,0,1:1, 0
Num (Extended) outputs,weights in Series:
C3,3:9, 0
Ft16:16, 160
Total weights = 160
[C3,3Ft16]:16, 160
Mp3,3:16, 0
Lfys48:48, 12480
Lfx96:96, 55680
Lrx96:96, 74112
Lfx192:192, 221952
Fc99:99, 19107
Total weights = 383491
Previous null char=2 mapped to 172
Continuing from data/tam/tam.lstm
Loaded 25/25 pages (1-25) of document data/ground-truth/out34.lstmf
Loaded 23/23 pages (1-23) of document data/ground-truth/out32.lstmf
Loaded 20/20 pages (1-20) of document data/ground-truth/out31.lstmf
Loaded 23/23 pages (1-23) of document data/ground-truth/out35.lstmf
lstmtraining: ../../src/ccutil/genericvector.h:724: T& GenericVector::operator const [with T = int]: Assertion `index >= 0 && index < size_used_' failed.
Aborted (core dumped)

If I use Tamil.traineddata as the old traineddata I get an error; also, I used the Tamil.lstm unicharset.

@nijanthan0
Author

--debug_interval -1 It is interval not level. -1 is minus one On Thu, Apr 25, 2019 at 2:18 PM Shree Devi Kumar [email protected] wrote:

Code range changed from 99 to 145! tam.unicharset is 99, your text has 145 unichars. On Thu, Apr 25, 2019 at 2:13 PM Shree Devi Kumar @.> wrote: > Your images have English in them. If you want that to be recognized it > needs to be in your unicharset. > > The tam.traineddata has a limited unicharset. By using that, a larger > number of characters have to be added. > > Try using Tamil.traineddata for further training and see if that is > better. > > I am not sure why you are not getting debug msgs on screen. > > > On Thu, 25 Apr 2019, 13:56 nijanthan0, @.> wrote: > >> I first extracted data file from image using " Tamil " tessdata and then >> i corrected the values of the text file. Then using the text file I created >> box file and tif file with help of text2image. Then I used " tam " tessdata >> for other training purpose(like unicharset,lstm training). Is this causes >> of high error rate? >> >> — >> You are receiving this because you commented. >> Reply to this email directly, view it on GitHub >> <#2395 (comment)>, >> or mute the thread >> https://github.com/notifications/unsubscribe-auth/ABG37IZ3W4U5BUFXYK6AGBDPSFTMHANCNFSM4HHCRYLA >> . >> > -- ____________________________________________________________ भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
--
____________________________________________________________ भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

Yes, thank you. I have changed it now. 😊
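The "Code range changed from 99 to 145" message can be anticipated before training starts: the first line of a Tesseract `.unicharset` file stores the number of entries. A minimal Python sketch to read that count (the paths in the usage note below are the ones from this thread, used only as examples):

```python
def unicharset_size(path):
    """Return the entry count stored on the first line of a .unicharset file."""
    with open(path, encoding="utf-8") as f:
        return int(f.readline())
```

Comparing `unicharset_size("data/tam/tam.unicharset")` with the size of the unicharset generated from your training text predicts whether lstmtraining will have to resize the output layer.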

@Shreeshrii
Collaborator

Shreeshrii commented Apr 25, 2019 via email

@nijanthan0
Author

Sorry, I used the fast tessdata for Tamil.traineddata. Now I am using the best tessdata.

@nijanthan0
Author

nijanthan0 commented Apr 25, 2019

lstmtraining --traineddata data/tamiltest/tamiltest.traineddata --old_traineddata tesseract/tessdata/Tamil.traineddata --continue_from data/Tamil/Tamil.lstm --perfect_sample_delay 0 --target_error_rate 0.01 --model_output data/checkpoints --debug_interval -1 --train_listfile data/list.train --max_iterations 200
Loaded file data/checkpoints_checkpoint, unpacking...
Successfully restored trainer from data/checkpoints_checkpoint
Loaded 20/20 pages (1-20) of document data/ground-truth/out31.lstmf
Loaded 23/23 pages (1-23) of document data/ground-truth/out35.lstmf
Loaded 23/23 pages (1-23) of document data/ground-truth/out32.lstmf
Loaded 24/24 pages (1-24) of document data/ground-truth/out33.lstmf
At iteration 200/200/200, Mean rms=5.646%, delta=70.988%, char train=138.671%, word train=99.23%, skip ratio=0%, New worst char error = 138.671 wrote checkpoint.

Finished! Error rate = 100

"This also gives a 100% error rate"
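The "At iteration ..." progress lines above follow a fixed format, so they can be parsed to track error rates across checkpoints. A small Python sketch (the field names are my own labels, not official Tesseract terminology):

```python
import re

# Parse the "At iteration ..." progress lines that lstmtraining prints.
LINE_RE = re.compile(
    r"At iteration (\d+)/(\d+)/(\d+), Mean rms=([\d.]+)%, delta=([\d.]+)%, "
    r"char train=([\d.]+)%, word train=([\d.]+)%, skip ratio=([\d.]+)%"
)

def parse_progress(line):
    """Return a dict of training metrics from one progress line, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    keys = ("learning_iter", "training_iter", "sample_iter",
            "rms", "delta", "char_error", "word_error", "skip_ratio")
    return dict(zip(keys, map(float, m.groups())))
```

A `char train` value above 100% at the end of a run, as in the log above, is a sign that the model is not learning the data at all.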

@Shreeshrii
Collaborator

Try a command similar to the one I used; see #2395 (comment)

@nijanthan0
Author

Same error

@Shreeshrii
Collaborator

What does same error mean?

200 iterations was to test what was going wrong. Now you can train for more iterations.

For impact-style fine-tuning, try 400-600 iterations.

For plus-type fine-tuning, try 3000-3600.

@YuTingLiu

Is this caused by the x_size parameter not being the same in data generation and training?

@stweil
Member

stweil commented Jan 20, 2021

Pull request #3251 improves the error message for "Compute CTC target failed" and now shows the lstmf file which is triggering that error. One possible reason for that error is a rotated text line.

@wolfassi123

@Shreeshrii I always get this error when I try to train for Arabic. I am adding my own data to the "training_text" file, and it consists of a lot of Arabic numbers and dates.
I keep running into this issue, but I need to train the model to recognize such dates and numbers, and I'd rather not use separate traineddata files, one for numbers and one for words.
Any idea how to solve this?

@drdmitry

drdmitry commented Jul 21, 2022

I had a similar issue ("Compute CTC targets failed!") when I generated two lstmf files from different types of .box files.
One box file was generated with boxes as full-width horizontal lines of text.
The other box file was generated with a box for each individual letter of the text.
I had to regenerate the box files (train and eval) using the same --psm parameter, and after that the training went smoothly.
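The mismatch described here can be detected before training by checking which style each .box file uses. A rough Python sketch, under the assumption (matching the two styles above) that line-level box files start each entry with the literal token `WordStr` while character-level box files start each entry with the single boxed character:

```python
# Heuristic check that every .box file in a training set uses the same style.

def box_style(box_text):
    """Classify box-file content as 'wordstr' or 'char' from its first entry."""
    first_token = box_text.lstrip().split(None, 1)[0]
    return "wordstr" if first_token == "WordStr" else "char"

def mixed_styles(box_texts):
    """True if the given box-file contents do not all share one style."""
    return len({box_style(t) for t in box_texts}) > 1
```

Running `mixed_styles` over the contents of all train and eval box files would flag the situation drdmitry hit before any lstmf files are generated.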

@karan00713

Hi, I'm trying lstmtraining for Tamil text and I'm hitting the compute CTC error:
Compute CTC targets failed for /home/user/Aadhar/data/Aadhar-ground-truth/1.lstmf!
(=57 On [0, 2), scores= 1.12(:=59=1.11) 1.13(ஏ=64=1.12), Mean=1.12237, max=1.12645
ஏ=64 On [2, 4), scores= 1.13((=57=1.13) 1.13((=57=1.13), Mean=1.12687, max=1.1281
(=57 On [4, 6), scores= 1.13(ஏ=64=1.13) 1.14(ஏ=64=1.13), Mean=1.1351, max=1.13527
ஏ=64 On [6, 8), scores= 1.13((=57=1.14) 1.13((=57=1.14), Mean=1.12844, max=1.12863
(=57 On [8, 11), scores= 1.14(ஏ=64=1.13) 1.14(ஹ=66=1.13) 1.14(ஹ=66=1.13), Mean=1.13606, max=1.13633
Compute CTC targets failed for /home/user/Aadhar/data/Aadhar-ground-truth/2.lstmf!
(=57 On [0, 2), scores= 1.12(:=59=1.11) 1.13(ஏ=64=1.12), Mean=1.12268, max=1.12686
ஏ=64 On [2, 4), scores= 1.12((=57=1.13) 1.13((=57=1.13), Mean=1.12615, max=1.12734
(=57 On [4, 6), scores= 1.14(ஏ=64=1.13) 1.14(ஏ=64=1.13), Mean=1.13574, max=1.13588
ஏ=64 On [6, 8), scores= 1.13((=57=1.14) 1.13((=57=1.14), Mean=1.12798, max=1.12808
(=57 On [8, 10), scores= 1.14(ஏ=64=1.13) 1.14(ஏ=64=1.13), Mean=1.13546, max=1.13548
ஹ=66 On [10, 12), scores= 1.13((=57=1.14) 1.13((=57=1.14), Mean=1.12787, max=1.12802
(=57 On [12, 14), scores= 1.14(ஹ=66=1.13) 1.14(ஹ=66=1.13), Mean=1.13633, max=1.13645
Compute CTC targets failed for /home/user/Aadhar/data/Aadhar-ground-truth/3.lstmf!
(=57 On [0, 2), scores= 1.12(:=59=1.11) 1.13(ஏ=64=1.12), Mean=1.12269, max=1.12686
ஏ=64 On [2, 4), scores= 1.13((=57=1.13) 1.13((=57=1.13), Mean=1.12622, max=1.12741
(=57 On [4, 6), scores= 1.14(ஏ=64=1.13) 1.14(ஏ=64=1.13), Mean=1.13572, max=1.13585
ஏ=64 On [6, 8), scores= 1.13((=57=1.14) 1.13((=57=1.14), Mean=1.12804, max=1.12813
(=57 On [8, 10), scores= 1.14(ஏ=64=1.13) 1.14(ஏ=64=1.13), Mean=1.13534, max=1.13538
ஹ=66 On [10, 12), scores= 1.13((=57=1.14) 1.13((=57=1.14), Mean=1.12775, max=1.12785
(=57 On [12, 15), scores= 1.14(ஹ=66=1.13) 1.14(ஹ=66=1.13) 1.14(ஹ=66=1.13), Mean=1.13621, max=1.13649

@MikhailesU

> Box/tiff pairs are used in training to create the lstmf files. If you supply the wrong text for an image, the whole training will be wrong. Your box files also contain only incorrect text.

So if I want to train Tesseract on text that it cannot see in the image, it will throw this error?

@stweil
Member

stweil commented Sep 11, 2023

@karan00713, @kiberchert, recent software versions report the line image which caused the message. I suggest visually inspecting such images to check that they are reasonable (no more than a single line, not rotated) and that the line image and the line transcription match.

@DesBw

DesBw commented Sep 19, 2023

This error occurs on my PC when text2image creates empty tif files. The .lstmf files created from those empty tif files trigger the error.
It looks like text2image has a lot of bugs: it created empty box files as well as empty image files.
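Empty output files like these can be screened out before the lstmf files are generated. A small stdlib-only Python sketch (the directory name is a placeholder for your ground-truth directory):

```python
from pathlib import Path

# Screen a ground-truth directory for zero-byte .tif and .box files, which
# later turn into unusable .lstmf files.

def empty_training_files(ground_truth_dir, suffixes=(".tif", ".box")):
    """Return zero-byte files under ground_truth_dir with the given suffixes."""
    root = Path(ground_truth_dir)
    return sorted(p for p in root.rglob("*")
                  if p.suffix in suffixes and p.is_file() and p.stat().st_size == 0)

if __name__ == "__main__":
    for path in empty_training_files("data/ground-truth"):
        print(f"empty file, regenerate before training: {path}")
```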
