segfault at ../ccstruct/matrix.h:256 #881

Closed
Shreeshrii opened this issue May 5, 2017 · 11 comments


@Shreeshrii
Collaborator

gdb --args lstmtraining -U ~/tess4training/eng/eng.unicharset \
  --script_dir ../langdata \
  --append_index 5 --net_spec '[Lfx256 O1c105]' \
  --continue_from ~/tess4training/englayer_from_eng/eng.lstm \
  --model_output ~/tess4training/englayer_from_eng/englayer \
  --train_listfile ~/tess4training/eng/eng.training_files.txt \
  --target_error_rate 0.01 \
  --debug_interval -1
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from lstmtraining...done.
(gdb) run
Starting program: /usr/local/bin/lstmtraining -U /home/shree/tess4training/eng/eng.unicharset --script_dir ../langdata --append_index 5 --net_spec \[Lfx256\ O1c105\] --continue_from /home/shree/tess4training/englayer_from_eng/eng.lstm --model_output /home/shree/tess4training/englayer_from_eng/englayer --train_listfile /home/shree/tess4training/eng/eng.training_files.txt --target_error_rate 0.01 --debug_interval -1
warning: Error disabling address space randomization: Success
warning: linux_ptrace_test_ret_to_nx: PTRACE_KILL waitpid returned -1: Interrupted system call
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Loaded file /home/shree/tess4training/englayer_from_eng/eng.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Continuing from /home/shree/tess4training/englayer_from_eng/eng.lstm
Other case É of é is not in unicharset
Appending a new network to an old one!!Setting unichar properties
Setting properties for script Common
Setting properties for script Latin
Num outputs,weights in serial:
  Lfx256:256, 394240
  Fc105:105, 26985
Total weights = 421225
Built network:[1,0,0,1[C5,5Ft16]Mp3,3Lfys64Lfx128Lrx128Lfx256Fc105] from request [Lfx256 O1c105]
Training parameters:
  Debug interval = -1, weights = 0.1, learning rate = 0.0001, momentum=0.9
[New Thread 0x7fea20f40700 (LWP 1158)]
Loaded 104/104 pages (1-104) of document /home/shree/tess4training/eng/eng.Arial.exp0.lstmf
[Thread 0x7fea20f40700 (LWP 1158) exited]
[New Thread 0x7fea1bff0700 (LWP 1159)]
Loaded 104/104 pages (1-104) of document /home/shree/tess4training/eng/eng.Times_New_Roman.exp0.lstmf
[Thread 0x7fea1bff0700 (LWP 1159) exited]
[New Thread 0x7fea1bff0700 (LWP 1160)]
[New Thread 0x7fea20f40700 (LWP 1161)]
[New Thread 0x7fea1acf0700 (LWP 1162)]
Iteration 0: ALIGNED TRUTH : used through € between NEW % J. should when High when We it
Iteration 0: BEST OCR TEXT : St$Zv#kv#qSZ]$vUZUyvHvkyEHkSZUZtvyvyvHvq$Rwq,qtHgE#gqZUSUZ#EqgqSqSqyvHvt$q£$R$RqtS$U,#,v$USUHZq,c$U$qSq,$#vyvHvSHZSZ=Z$[ [vESUSZUyvHvSqSqyHZ
vt,q€U€vlvyvySyHvqtSUZtyHvSqSqvqHq,U$U,UegqgqveZS
File /tmp/tmp.E8R3inud3B/eng/eng.Arial.exp0.lstmf page 6 :

Program received signal SIGSEGV, Segmentation fault.
operator+= (addend=..., this=0xb92048) at ../ccstruct/matrix.h:256
256               (*this)(x, y) += addend(x, y);
(gdb) backtrace
#0  operator+= (addend=..., this=0xb92048) at ../ccstruct/matrix.h:256
#1  tesseract::WeightMatrix::Update (this=0xb92048, learning_rate=<optimized out>, momentum=0.89999997615814209, num_samples=<optimized out>) at weightmatrix.cpp:261
#2  0x00007fea254b7083 in tesseract::Plumbing::Update (this=0xb91e80, learning_rate=1.24999988e-05, momentum=0.899999976, num_samples=1) at plumbing.cpp:219
#3  0x00007fea254b7083 in tesseract::Plumbing::Update (this=0xb91230, learning_rate=9.99999975e-05, momentum=0.899999976, num_samples=1) at plumbing.cpp:219
#4  0x00007fea254a8469 in tesseract::LSTMTrainer::TrainOnLine (this=this@entry=0x7ffff03ebaf0, trainingdata=<optimized out>, batch=batch@entry=false)
    at lstmtrainer.cpp:813
#5  0x0000000000406ff4 in TrainOnLine (batch=false, samples_trainer=0x7ffff03ebaf0, this=0x7ffff03ebaf0) at ../lstm/lstmtrainer.h:273
#6  main (argc=1, argv=0x7ffff03ec488) at lstmtraining.cpp:198
(gdb) up
#1  tesseract::WeightMatrix::Update (this=0xb92048, learning_rate=<optimized out>, momentum=0.89999997615814209, num_samples=<optimized out>) at weightmatrix.cpp:261
261       if (momentum > 0.0) wf_ += updates_;
(gdb) print num_samples
$1 = <optimized out>
(gdb) print momentum
$2 = 0.89999997615814209
(gdb) print wf_
$3 = {_vptr.GENERIC_2D_ARRAY = 0x7fea257bf230 <vtable for GENERIC_2D_ARRAY<double>+16>, array_ = 0xb92230, empty_ = 0, dim1_ = 16, dim2_ = 26, size_allocated_ = 416}
(gdb) print updates_
$4 = {_vptr.GENERIC_2D_ARRAY = 0x7fea257bf230 <vtable for GENERIC_2D_ARRAY<double>+16>, array_ = 0x0, empty_ = 0, dim1_ = 0, dim2_ = 0, size_allocated_ = 0}
(gdb) quit

@stweil
Member

stweil commented May 5, 2017

How can I reproduce this crash? updates_ looks strange. No wonder that the code crashes when it tries to add a 0 x 0 matrix.
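
The gdb output above fits that reading: wf_ is 16 x 26 while updates_ is 0 x 0 with a null array_, so an element-wise += over the dimensions of wf_ would read through a null pointer. As a minimal sketch only (this is not the Tesseract code; Grid2D and AddInPlace are made-up names for illustration), a shape check before the loop would turn such a crash into an error the caller can report:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 2-D array of doubles.
// This is NOT Tesseract's GENERIC_2D_ARRAY; names are illustrative.
struct Grid2D {
  std::size_t dim1 = 0;
  std::size_t dim2 = 0;
  std::vector<double> data;  // row-major storage, dim1 * dim2 elements

  Grid2D() = default;
  Grid2D(std::size_t d1, std::size_t d2, double fill)
      : dim1(d1), dim2(d2), data(d1 * d2, fill) {}

  // Element access with a debug-time bounds check.
  double& operator()(std::size_t x, std::size_t y) {
    assert(x < dim1 && y < dim2);
    return data[x * dim2 + y];
  }
  double operator()(std::size_t x, std::size_t y) const {
    assert(x < dim1 && y < dim2);
    return data[x * dim2 + y];
  }
};

// Adds addend into target element by element.
// Returns false and leaves target untouched when the shapes differ,
// for example when addend was never sized (0 x 0).
bool AddInPlace(Grid2D& target, const Grid2D& addend) {
  if (target.dim1 != addend.dim1 || target.dim2 != addend.dim2) return false;
  for (std::size_t x = 0; x < target.dim1; ++x) {
    for (std::size_t y = 0; y < target.dim2; ++y) {
      target(x, y) += addend(x, y);
    }
  }
  return true;
}
```

With a 16 x 26 target and a default-constructed (0 x 0) addend, AddInPlace returns false instead of dereferencing missing storage. Whether the right fix in Tesseract is a check like this or making sure updates_ is allocated before Update runs is a separate question that this sketch does not answer.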

@Shreeshrii
Collaborator Author

Shreeshrii commented May 5, 2017

This was with the latest code from GitHub, including Ray's latest commit regarding endian support.

You will need to change the fonts dir in the commands below.

rm -rf ~/tess4training/eng  
  
training/tesstrain.sh \
   --fonts_dir ~/.fonts \
   --lang eng  \
   --noextract_font_properties \
   --linedata_only \
   --exposures "0" \
   --langdata_dir ../langdata \
   --tessdata_dir ../tessdata \
   --fontlist "Arial"  "Times New Roman,"  \
   --output_dir ~/tess4training/eng
  
rm -rf ~/tess4training/englayer_from_eng    

mkdir -p ~/tess4training/englayer_from_eng 

combine_tessdata -e ../tessdata/eng.traineddata \
  ~/tess4training/englayer_from_eng/eng.lstm

lstmtraining -U ~/tess4training/eng/eng.unicharset \
  --script_dir ../langdata   \
  --append_index 5 --net_spec '[Lfx256 O1c105]' \
  --continue_from ~/tess4training/englayer_from_eng/eng.lstm \
  --model_output ~/tess4training/englayer_from_eng/englayer \
  --train_listfile ~/tess4training/eng/eng.training_files.txt \
  --target_error_rate 0.01 \
  --debug_interval  -1

@amitdo
Collaborator

amitdo commented May 5, 2017

Ray said that he needs to fix training after his last commits.

@stweil
Member

stweil commented May 5, 2017

When I run all but the last step with older executables, I get an error from lstmtraining:

Deserialize failed: tess4training/eng/eng.Arial.exp0.lstmf read 0/72 pages

That might confirm that the last commits introduced some incompatibility.

@stweil
Member

stweil commented May 5, 2017

I can confirm that commit 8e79297 introduced a regression. Before that commit, lstmtraining ran the iterations.

@amitdo
Collaborator

amitdo commented May 5, 2017

83588bc#commitcomment-22019959

Now back to finding out why training isn't working properly...

@Shreeshrii
Collaborator Author

Thanks, Amit. I had seen that comment but thought that it was related to training of new language models.

@stweil
Member

stweil commented May 5, 2017

Here language_ is added to the serialized data, so old and new data are incompatible (this is the reason why I get the above error message). The new data is correctly handled in ImageData::DeSerialize and in ImageData::SkipDeSerialize.

Does the training process use old files which contain old incompatible ImageData?
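
To illustrate why adding one field makes old and new data incompatible, here is a minimal sketch. The Record layout, the length-prefixed string encoding, and all function names are assumptions made for this example and do not reflect the actual ImageData format:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical record; NOT the real ImageData layout.
struct Record {
  std::string image_name;
  std::string language;  // field present only in the newer format
};

using Buffer = std::vector<unsigned char>;

// Appends a string as a 32-bit length followed by its bytes.
void WriteString(Buffer& out, const std::string& s) {
  assert(s.size() <= 0xFFFFFFFFu);
  std::uint32_t n = static_cast<std::uint32_t>(s.size());
  unsigned char len[sizeof n];
  std::memcpy(len, &n, sizeof n);
  out.insert(out.end(), len, len + sizeof n);
  out.insert(out.end(), s.begin(), s.end());
}

// Reads a length-prefixed string at pos; fails if the buffer is too short.
bool ReadString(const Buffer& in, std::size_t& pos, std::string& s) {
  std::uint32_t n = 0;
  if (pos > in.size() || in.size() - pos < sizeof n) return false;
  std::memcpy(&n, in.data() + pos, sizeof n);
  pos += sizeof n;
  if (in.size() - pos < n) return false;
  s.assign(reinterpret_cast<const char*>(in.data() + pos), n);
  pos += n;
  return true;
}

// Old format: image name only.
Buffer SerializeOld(const Record& r) {
  Buffer b;
  WriteString(b, r.image_name);
  return b;
}

// New format: image name followed by language.
Buffer SerializeNew(const Record& r) {
  Buffer b;
  WriteString(b, r.image_name);
  WriteString(b, r.language);
  return b;
}

// Old reader: expects exactly one string and no trailing bytes.
bool DeSerializeOld(const Buffer& in, Record& r) {
  std::size_t pos = 0;
  return ReadString(in, pos, r.image_name) && pos == in.size();
}

// New reader: expects both strings.
bool DeSerializeNew(const Buffer& in, Record& r) {
  std::size_t pos = 0;
  return ReadString(in, pos, r.image_name) && ReadString(in, pos, r.language);
}
```

In this sketch each reader accepts only data written by the matching writer: the old reader finds unexpected trailing bytes in new data, and the new reader runs out of bytes in old data. A format version number written ahead of the fields is one common way to let a reader detect the mismatch explicitly instead of failing partway through.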

@Shreeshrii
Collaborator Author

Shreeshrii commented May 5, 2017

I am not sure what is included in ImageData.

The first part of the training commands (tesstrain.sh) runs text2image and creates box/tiff pairs and these are then used to create unicharset, dawg and lstmf files. I have not used any external box/tiff pairs.

These lstmf files are used along with the extracted lstm file from the traineddata from the repo.

combine_tessdata -e ../tessdata/eng.traineddata ~/tess4training/englayer_from_eng/eng.lstm

Possibly, this could be the cause of incompatibility, since these files were built with earlier code.

@Shreeshrii
Collaborator Author

The lstmtraining command for this issue is for replacing a top layer.

@Shreeshrii
Collaborator Author

Shreeshrii commented May 6, 2017

The lstmtraining command for replacing a top layer is working now, after the changes by Ray.

lstmtraining -U ~/tess4training/eng/eng.unicharset \
  --script_dir ../langdata \
  --append_index 5 --net_spec '[Lfx256 O1c105]' \
  --continue_from ~/tess4training/englayer_from_eng/eng.lstm \
  --model_output ~/tess4training/englayer_from_eng/englayer \
  --train_listfile ~/tess4training/eng/eng.training_files.txt \
  --target_error_rate 0.01 \
  --debug_interval -1
Loaded file /home/shree/tess4training/englayer_from_eng/eng.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Continuing from /home/shree/tess4training/englayer_from_eng/eng.lstm
Other case É of é is not in unicharset
Appending a new network to an old one!!Setting unichar properties
Setting properties for script Common
Setting properties for script Latin
Num outputs,weights in serial:
  Lfx256:256, 394240
  Fc105:105, 26985
Total weights = 421225
Built network:[1,0,0,1[C5,5Ft16]Mp3,3Lfys64Lfx128Lrx128Lfx256Fc105] from request [Lfx256 O1c105]
Training parameters:
  Debug interval = -1, weights = 0.1, learning rate = 0.0001, momentum=0.9
Loaded 104/104 pages (1-104) of document /home/shree/tess4training/eng/eng.Arial.exp0.lstmf
Loaded 104/104 pages (1-104) of document /home/shree/tess4training/eng/eng.Times_New_Roman.exp0.lstmf
Iteration 0: ALIGNED TRUTH : (GeneRIF) World and ABOUT time October the service through is Help
Iteration 0: BEST OCR TEXT : #yvtEktgqtHv$gqy$g$ScqESvq,UgU,UZHZtZ€U Zq,yqaqtqHt SUctéHS$Ht:HtS$Uu5,q,UvUtHSqgqtqH\t$gZUHZtZHZqZqZqEZUyHEqgq€EgtqZtZv#
v\EyZqgqEUyHvHEHZHSZ$ZtvyHZvUv#qtq€U€e#qevZH
File /home/shree/tess4training/eng/eng.Arial.exp0.lstmf page 0 :
Mean rms=8.666%, delta=100%, train=301.515%(100%), skip ratio=0%
Iteration 1: ALIGNED TRUTH : through they not this our English ° 14 ® Development of 8 What's an
Iteration 1: BEST OCR TEXT : kZyHZkEZkSZ$ZtqtyHZq]UtyHESqS#tHvZ:ZSZqUZyHZvk#t4SZSZqZqyq4,q,eHvZvtv€yveS#tyHZq$qSq4$SU,Uq#R#$SHtU,$,H\Sq7Zt#q¥SHtvtktaH
vHSZtHZSZqS:SZSv#$S,tU$U5UH\HvZSZUHvqEyqHZH
File /home/shree/tess4training/eng/eng.Times_New_Roman.exp0.lstmf page 0 :
Mean rms=8.647%, delta=100%, train=303.743%(100%), skip ratio=0%
Iteration 2: ALIGNED TRUTH : good 25 now this [J User INDEX had help 6 all iGoogle if in Ca¥ years
Iteration 2: BEST OCR TEXT : StvEHE#Zt]$v4#kqUklvtEHEHZSUZ#tEZUyHvZv#vq,uc$,4[$[e#v#ZqZqyq$Sq$u$H\R$q$w$,tyHv#qHSc ZUtHvHZgqZvZHq]UlvE4qZ[$vUvUv$vHg\E
H\tv\HZqgq4SvUvevHqt$,SqHS$,q,S#taqSqHqH$S
File /home/shree/tess4training/eng/eng.Arial.exp0.lstmf page 1 :
Mean rms=8.648%, delta=100%, train=301.046%(100%), skip ratio=0%
Iteration 3: ALIGNED TRUTH : were jobs BBC get is" University A 1998 fixed Search And under .
Iteration 3: BEST OCR TEXT : ,SkZ]EoZqyES$qEvyvtHZRHZHkvqt$u$vt/$He}$,$vt,tvytZqZHqUvkv#$q]oq4U$tetHvHvZtZtqZtZHv#veZSyS#tq#}éHSqU [$Uk$SCpH.$vqUyHv$H
#qS€ Sv,$évEqg€qZqyt$tyHSqU$téHtHZHc SU,SvZtvHc]$eSZqZqy{q,S
File /tmp/tmp.2xbvLE29x7/eng/eng.Times_New_Roman.exp0.lstmf page 25 :
Mean rms=8.696%, delta=100%, train=306.644%(100%), skip ratio=0%
