--disable-openmp improves performance? #961

Closed
Shreeshrii opened this issue May 30, 2017 · 26 comments

@Shreeshrii (Collaborator) commented May 30, 2017

Using the latest code from the master branch seems to be slower than using commits from a few days back. Training also seems to hang at times. I am trying to get an objective measure by running sample training against different builds and will post the results here.

@Shreeshrii (Collaborator, Author) commented May 30, 2017

training/tesstrain.sh \
  --fonts_dir  /mnt/c/Windows/Fonts \
  --tessdata_dir ./tessdata \
  --training_text ../langdata/eng/eng.training_text \
  --langdata_dir ../langdata \
  --lang eng  \
  --linedata_only \
  --noextract_font_properties \
  --exposures "0"    \
  --fontlist "Arial" \
  --output_dir ~/tesstutorial/engtest
  
training/tesstrain.sh \
  --fonts_dir  /mnt/c/Windows/Fonts \
  --tessdata_dir ./tessdata \
  --training_text ../langdata/eng/eng.training_text \
  --langdata_dir ../langdata \
  --lang eng  \
  --linedata_only \
  --noextract_font_properties \
  --exposures "0"    \
  --fontlist "Arial" \
  "Courier New" \
  --output_dir ~/tesstutorial/engeval

rm -rf  ~/tesstutorial/engtuned_from_engtest 

mkdir -p ~/tesstutorial/engtuned_from_engtest 

combine_tessdata -e ../tessdata/eng.traineddata \
  ~/tesstutorial/engtuned_from_engtest/eng.lstm

engtest was built with --fontlist "Arial".
engeval was built with --fontlist "Arial" "Courier New".

The following command was run on a non-debug build, under WSL on Windows 10, using the unchanged training_text and other files from the langdata repo.

time lstmtraining \
  --continue_from ~/tesstutorial/engtuned_from_engtest/eng.lstm \
  --train_listfile ~/tesstutorial/engtest/eng.training_files.txt \
  --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
  --model_output ~/tesstutorial/engtuned_from_engtest/engtuned \
  --debug_interval 0 \
  --max_image_MB 1000 \
  --perfect_sample_delay 19 \
  --max_iterations 1000

@Shreeshrii (Collaborator, Author) commented May 30, 2017

System: x86_64 GNU/Linux (WSL) on Windows 10 Home, AMD A4-5000 APU @ 1.50 GHz, 4 GB RAM

as of 42066ce


2 Percent improvement time=1, best error was 100 @ 0
At iteration 1/900/900, Mean rms=0.153%, delta=0%, char train=0.01%, word train=0.036%, skip ratio=0%,  New best char error = 0.01Deserialize failed wrote best model:/home/shree/tesstutorial/engtuned_from_engtest/engtuned0.01_1.lstm wrote checkpoint.

Finished! Error rate = 0.01

real    64m31.894s
user    185m23.578s
sys     0m41.766s

@Shreeshrii (Collaborator, Author) commented May 30, 2017

System: x86_64 GNU/Linux (WSL) on Windows 10 Home, AMD A4-5000 APU @ 1.50 GHz, 4 GB RAM

as of 5a06417

2 Percent improvement time=1, best error was 100 @ 0
At iteration 1/900/900, Mean rms=0.153%, delta=0%, char train=0.01%, word train=0.036%, skip ratio=0%,  New best char error = 0.01Deserialize failed wrote best model:/home/shree/tesstutorial/engtuned_from_engtest/engtuned0.01_1.lstm wrote checkpoint.

Finished! Error rate = 0.01

real    51m58.330s
user    148m10.469s
sys     0m43.609s

@Shreeshrii (Collaborator, Author) commented May 30, 2017

System: x86_64 GNU/Linux (WSL) on Windows 10 Home, AMD A4-5000 APU @ 1.50 GHz, 4 GB RAM

as of b86b4fa


2 Percent improvement time=1, best error was 100 @ 0
At iteration 1/900/900, Mean rms=0.153%, delta=0%, char train=0.01%, word train=0.036%, skip ratio=0%,  New best char error = 0.01Deserialize failed wrote best model:/home/shree/tesstutorial/engtuned_from_engtest/engtuned0.01_1.lstm wrote checkpoint.

Finished! Error rate = 0.01

real    58m48.689s
user    157m45.719s
sys     0m51.891s

@Shreeshrii (Collaborator, Author) commented May 30, 2017

If it does not add too much overhead, I would suggest adding some kind of regression testing as part of continuous integration.

Also, it will be helpful for others to test and confirm, since I ran this under WSL on Windows 10.
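As an illustration only, such a regression check could be a small script step that times a short training run and fails if it exceeds a budget. Everything below is hypothetical: the threshold, the reduced iteration count and the reuse of the tutorial paths are made up for the sketch.

#!/bin/sh
# Hypothetical CI smoke test (illustration only): run a short training job
# and fail the build if it takes longer than a fixed time budget.
set -e
THRESHOLD_SECONDS=600   # made-up budget for this sample job
START=$(date +%s)
lstmtraining \
  --continue_from ~/tesstutorial/engtuned_from_engtest/eng.lstm \
  --train_listfile ~/tesstutorial/engtest/eng.training_files.txt \
  --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
  --model_output ~/tesstutorial/engtuned_from_engtest/engtuned \
  --max_iterations 100
END=$(date +%s)
ELAPSED=$((END - START))
echo "training took ${ELAPSED}s (budget ${THRESHOLD_SECONDS}s)"
[ "$ELAPSED" -le "$THRESHOLD_SECONDS" ]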

@amitdo (Collaborator) commented May 30, 2017

According to your report the problem started with commit 6dd871b

The commits after that commit are just documentation fixes.

@amitdo (Collaborator) commented May 30, 2017

Shree, did you run those tests with WSL?

@Shreeshrii (Collaborator, Author):

> Shree, did you run those tests with WSL?

Yes.

@Shreeshrii (Collaborator, Author):

I did not set anything special for OpenMP, though my PC probably supports it.

@stweil (Member) commented May 30, 2017

Same test on Linux with git master, debug build:

2 Percent improvement time=0, best error was 100 @ 0
At iteration 0/500/500, Mean rms=0.162%, delta=0%, char train=0.009%, word train=0.033%, skip ratio=0%,  New best char error = 0.009Deserialize failed wrote best model:/home/stweil/tesstutorial/engtuned_from_engtest/engtuned0.009_0.lstm wrote checkpoint.

Finished! Error rate = 0.009

real	2m7,797s
user	7m10,384s
sys	0m2,156s

Test machine: Intel(R) Xeon(R) CPU E3-1240 v5 @ 3.50GHz.
I had to fix the assertion issue first.

@stweil (Member) commented May 30, 2017

I currently think that the timing regression is related to bug #644. @Shreeshrii, could you please try git master with the patch shown there?

@Shreeshrii (Collaborator, Author):

System: Ubuntu 14.04.5 LTS (GNU/Linux 4.4.0-75-generic x86_64), Intel(R) Pentium(R) Dual CPU E2220 @ 2.40GHz, 4 GB RAM

with git master
tesseract 4.00.00alpha-538-g42066ce-1995

2 Percent improvement time=1, best error was 100 @ 0
At iteration 1/800/800, Mean rms=0.157%, delta=0%, char train=0.009%, word train=0.033%, skip ratio=0%,  New best char error = 0.009Deserialize failed wrote best model:/home/shree/tesstutorial/engtuned_from_engtest/engtuned0.009_1.lstm wrote checkpoint.

Finished! Error rate = 0.009

real    35m51.291s
user    58m50.184s
sys     4m23.416s

@Shreeshrii (Collaborator, Author) commented May 31, 2017

System: Ubuntu 14.04.5 LTS (GNU/Linux 4.4.0-75-generic x86_64), Intel(R) Pentium(R) Dual CPU E2220 @ 2.40GHz, 4 GB RAM

reset hard to 5a06417

shree@sanskrit:~/tesseract$ tesseract -v
tesseract 4.00.00alpha-533-g5a06417-1990
leptonica-1.74.2

2 Percent improvement time=1, best error was 100 @ 0
At iteration 1/800/800, Mean rms=0.157%, delta=0%, char train=0.009%, word train=0.033%, skip ratio=0%,  New best char error = 0.009Deserialize failed wrote best model:/home/shree/tesstutorial/engtuned_from_engtest/engtuned0.009_1.lstm wrote checkpoint.

Finished! Error rate = 0.009

real    35m41.198s
user    58m38.460s
sys     4m9.208s

@Shreeshrii (Collaborator, Author):

Closing, as the problem is not reproducible on other systems.

@Shreeshrii (Collaborator, Author):

FYI

I built with --disable-openmp under WSL. Now it is taking less time.


2 Percent improvement time=1, best error was 100 @ 0
At iteration 1/900/900, Mean rms=0.153%, delta=0%, char train=0.01%, word train=0.036%, skip ratio=0%,  New best char error = 0.01Deserialize failed wrote best model:/home/shree/tesstutorial/engtuned_from_engtest/engtuned0.01_1.lstm wrote checkpoint.

Finished! Error rate = 0.01

real    65m34.655s
user    63m39.250s
sys     0m7.453s
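For reference, a minimal sketch of the build referred to above, assuming a git checkout of tesseract and the usual autotools flow (prefix and other configure options omitted):

./autogen.sh
./configure --disable-openmp
make
make training    # builds lstmtraining and the other training tools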

@amitdo (Collaborator) commented Jun 1, 2017

@Shreeshrii reopened this Jun 1, 2017

@Shreeshrii (Collaborator, Author) commented Jun 1, 2017

I am reopening the issue because there is some impact on lower-end CPUs which support OpenMP.

By disabling OpenMP, the user time has dropped to almost one third on WSL on my AMD Windows 10 PC.

from

real    64m31.894s
user    185m23.578s
sys     0m41.766s

to

real    65m34.655s
user    63m39.250s
sys     0m7.453s

User + sys tells you how much actual CPU time the process used: roughly 186 CPU-minutes with OpenMP versus about 64 CPU-minutes without, for nearly the same wall-clock time.

Maybe we need to add a recommendation in the wiki to build with --disable-openmp.

@amitdo (Collaborator) commented Jun 1, 2017

The fact that the 'user' time is a few times longer than the 'real' time is expected with a multi-core CPU and OpenMP enabled.

However, the fact that the real time is roughly the same with or without OpenMP is not expected.

@Shreeshrii changed the title from "Regression after recent commits" to "--disable-openmp improves performance?" on Jun 1, 2017
@stweil (Member) commented Jun 1, 2017

See also my comment in the RFC on performance. Even on Linux, OpenMP requires a large percentage of additional CPU time. That's why Tesseract with OpenMP is not much faster in the "real" time, but uses much more "user" and "sys" time. On Windows the situation gets worse because thread switching for OpenMP seems to be slower than on Linux.

@Shreeshrii (Collaborator, Author):

I think it would be helpful to mention on the 'Compiling' pages to build with --disable-openmp.

@stweil Is it OK to do so? Are there any particular conditions under which it should be recommended?

@Shreeshrii (Collaborator, Author):

System: Ubuntu 14.04.5 LTS (GNU/Linux 4.4.0-75-generic x86_64), Intel(R) Pentium(R) Dual CPU E2220 @ 2.40GHz, 4 GB RAM

Running lstmtraining for Tesseract training:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
15229 shree     20   0 1152728 468280   6720 R  98.0 11.6 788:35.04 lstmtraining

It has been taking 92-98% CPU.

@stweil Any recommendation on how to reduce CPU usage?
I haven't tried with --disable-openmp on that machine.

@stweil (Member) commented Jun 11, 2017

A dual-core CPU will produce very high overhead when OpenMP is used with four or more threads, so yes, disabling OpenMP might help. The Tesseract code could also be improved with OpenMP enabled: it could set the number of threads at run time, either based on the user's choice (a command line parameter) or on the number of available CPU cores.
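Until something like that is implemented, one thing to try is the standard OMP_NUM_THREADS environment variable. The OpenMP runtime honours it for parallel regions that do not hard-code a num_threads clause, so whether it takes effect in Tesseract's loops is exactly the open point above; a sketch:

# ask the OpenMP runtime for at most two threads (matching a dual-core CPU);
# parallel regions that hard-code num_threads(...) will ignore this setting
export OMP_NUM_THREADS=2

Running the same lstmtraining command as above from that shell, and comparing user vs. real time against the earlier runs, would show whether the setting is honoured.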

@Shreeshrii (Collaborator, Author):

--disable-openmp improved the performance of tesseract when recognizing images, but lstmtraining is still taking 99% CPU.

@Brian51 commented Jun 12, 2017

How about setting a higher "nice" value for Tesseract in this situation?

@Shreeshrii (Collaborator, Author):

I use that machine remotely.

I am now running the command as nice lstmtraining, so it should be using the default niceness of 10. It still shows CPU usage of up to 99-100%, but if another process, e.g. Firefox, is used locally on that machine, I am hoping that it will get priority and the browsing performance will not degrade.
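If an even lower priority is wanted, the niceness of the already-running job can be raised further with renice (a sketch; 19 is the lowest scheduling priority on Linux, and 15229 is the PID from the earlier top output, assuming that run is still the one in question):

# raise the niceness of the running job to the maximum (lowest priority);
# a non-root user can only make their own processes nicer, not less nice
renice -n 19 -p 15229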

Is there a valgrind or other debugging command which would help isolate what is causing this high CPU usage?

@stweil (Member) commented Jun 12, 2017

It's normal that programs like lstmtraining, which mostly do computation, use all of the CPU they can get, so 99% is good (as long as it is spent on training, not on busy waits during thread scheduling :-)). And running it as nice lstmtraining is a good idea if the same computer is also used by other processes / users.
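On the earlier profiling question: a sampling profiler is usually a better fit than valgrind here, since valgrind slows the program down heavily. A sketch, assuming perf from the linux-tools package is available; time spent spinning in the OpenMP runtime typically shows up under libgomp symbols:

# attach to the running process and show the hottest functions live
sudo perf top -p $(pgrep lstmtraining)

# or record a 30-second sample and inspect it afterwards
sudo perf record -p $(pgrep lstmtraining) -- sleep 30
sudo perf report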
