Disable OpenMP #259
Comments
Yes, I get similar results: OpenMP uses a lot of resources and increases the time required for training. Note that training currently requires up to 4 threads even when Tesseract was built without OpenMP.
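One way to verify that thread count on a running training process (a sketch, assuming Linux procps tools and a single running lstmtraining instance):

```sh
# Report the kernel thread count (NLWP = number of light-weight
# processes) of the running lstmtraining process.
ps -o nlwp= -p "$(pgrep -x lstmtraining)"
```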
Because nobody changed that? I wonder whether we should remove all build instructions from the …
The CMake build has disabled OpenMP by default for a long time...
Interesting! So that's why I get occasional 200% CPU utilization.
I see. Yes, probably better to have all that in Tesseract's makefile only. I'm not fluent in Autotools, so how do we change this?
Indeed, git annotate says this was from 2 years ago, so I guess the respective statement in the release notes …
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I updated the 4.1.0 release notes.
The Autotools build in release 5.0.0 still enabled OpenMP by default. Should that be changed? Is this a "bug fix" which can be done in release 5.0.1 (and maybe also in release 4.1.4)?
By default tesstrain builds vanilla tesseract / lstmtraining, which IINM (if I'm not mistaken) links against OpenMP; this can be checked as sketched below.
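A quick check of whether a given binary actually links OpenMP (a sketch, assuming a dynamically linked build using GNU libgomp or LLVM libomp):

```sh
# List the dynamic library dependencies of lstmtraining; a hit for
# libgomp (GCC) or libomp (LLVM) indicates an OpenMP-enabled build.
ldd "$(command -v lstmtraining)" | grep -i -e gomp -e libomp
```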
I know @stweil argued repeatedly for disabling OpenMP for prediction in the mass production / batch scenario, e.g. here.
However, the case for training seems to be a different one: normally we want to get a single model on a lot of data as fast as possible.
In this comment @theraysmith presented measurements showing 3.5x speedup with 4 threads.
However, I cannot reproduce that. On the contrary: on a VM with (4 cores of) an Intel Xeon Gold 5218 CPU @ 2.30GHz and a finetuning job with 500 lines, I see an 8x increase in training time, despite using more than 300% CPU instead of just 100% when single-threaded. (Yes, I do get similar results in both cases.) At 8x the wall-clock time and 3x the CPU usage, that is a 24x worse utilization of operational resources!
(I have repeated that experiment on a finetuning job with 1200 lines, where the multithreaded run takes 2x as long.)
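Such a comparison can be reproduced along these lines (a sketch; the model, list and output file names and the iteration count are placeholders, and GNU time is assumed for the resource report):

```sh
# Run the same finetuning job twice: once capped to a single OpenMP
# thread, once unconstrained. GNU time -v reports wall-clock time
# and the percentage of CPU used.
ARGS="--continue_from eng.lstm --traineddata eng.traineddata \
      --train_listfile list.train --model_output finetuned --max_iterations 400"
OMP_THREAD_LIMIT=1 /usr/bin/time -v lstmtraining $ARGS
/usr/bin/time -v lstmtraining $ARGS
```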
Also, there seems to be a significant difference between lstmtraining built with `--disable-openmp` on the one side, and lstmtraining built with OpenMP but run with `OMP_THREAD_LIMIT=1` on the other: the latter is even worse than OpenMP running unconstrained. Also, it still takes more than one core (alternating between 100% CPU utilization mostly and 200% intermittently).

Can anyone confirm this? Has something changed significantly in Tesseract's threaded code base since Ray's time, or is this simply due to my virtualization environment?
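For reference, the OpenMP-free variant used in this comparison can be built with the Autotools flag named above (a sketch; prerequisites such as Leptonica are omitted):

```sh
# Configure and build Tesseract and its training tools without OpenMP.
git clone https://github.com/tesseract-ocr/tesseract.git
cd tesseract
./autogen.sh
./configure --disable-openmp
make
make training   # builds lstmtraining and the other training tools
```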
Moreover, @stweil has already pointed out that OpenMP prevents reproducible models. For all the effort that has already gone into the latter (replacing `shuf` etc., sorting files), why is OpenMP still in, then? Why don't we build with `--disable-openmp` by default, or at least set `OMP_THREAD_LIMIT=1` in the lstmtraining recipes?
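Until such a default exists, the limit can also be applied per run when driving training through tesstrain's Makefile (a sketch; the MODEL_NAME value is a placeholder):

```sh
# The environment variable is inherited by the lstmtraining child
# processes, so this caps OpenMP to a single thread for the whole run.
OMP_THREAD_LIMIT=1 make training MODEL_NAME=foo
```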