libtesseract on MSVC 2019 32 bit requires compiler optimization switched off #3283

Closed
BJungmann opened this issue Feb 5, 2021 · 21 comments

@BJungmann

Environment

  • Tesseract Version: 5.0.0-alpha-20201231
  • Platform: W10 64-bit
  • compiler MSVC 2019, configuration Win32, optimization flag /O2

Current Behavior:

If I build tesseract50.lib with the above settings and link it into my own Qt Creator project, recognition results are very poor. They bear some relation to the source image but are unusable.
If I build tesseract50.lib with optimization disabled (/Od; inline function expansion may still be set to /Ob2), the recognition results are OK.

Expected Behavior:

This behaviour should not depend on the optimization flags.

This problem does not show up if I compile everything for x64 (just now succeeded in trying that).

@amitdo
Collaborator

amitdo commented Feb 5, 2021

Does this issue occur with Tesseract command line program?

@egorpugin
Contributor

What about 64-bit build?

@BJungmann
Author

The 64-bit build is OK, as I already said.

I currently have no test environment for the command line program. But I guess the problem could be related to the borders of rectangles. See an example of a rectangle passed to TessBaseAPI::SetRectangle() and the returned ocrx_word rectangles:
Issue3283_1
Issue3283_2

@egorpugin
Contributor

Yes, indeed. Sorry.


@stweil @zdenop
Do we want to support 32-bit builds on windows at all?

@zdenop
Contributor

zdenop commented Feb 6, 2021

If none of the contributors is able and willing to maintain the 32-bit build on Windows, I would suggest marking it as unsupported or experimental.

Anyway, I would suggest trying a minimal version of tesseract to check whether the problem is really in the tesseract & leptonica part.
Another option is to try building tesseract & leptonica with clang, which is able to build a VS library...

@BJungmann
Author

I have now reproduced the problem with Tesseract command line program. With /O2, the recognition result of the first attached image was
Njotker Balbulus/Martin jLuther
But to my surprise, this happened only when I used -l deu, with deu.traineddata from https://github.com/tesseract-ocr/tessdata.git/trunk. When I used -l eng, the result was
Notker Balbulus/Martin Luther

I found the same effect in my own app. So I tried to measure the time saved by the optimization flag, using eng.traineddata and, for convenience, my own app. The saving seems significant: more than a factor of 3.

I also compared my own app in x64 to x86, and found x64 to be more than a factor of 5 faster than x86 without optimization. This seems to be due to the optimization flag to a large extent, and the real gain of x64 seems to be less than a factor of 2, which comes close to what I had expected.

@BJungmann
Author

Today Microsoft surprised me with an update, Visual Studio 16.8.5. They said they had fixed a bug in x64 code optimization: https://developercommunity2.visualstudio.com/t/Bug-in-optimization-compiler-of-Visual-S/1224667. I tried this, but it made no difference. But they said "I confirm the codegen bug: it's existed since 16.5 and still reproduces with 16.8."
So I downloaded the older VS 2017. And with VS 2017, the recognition with /O2 is OK!!

This solves the issue for me.

@amitdo amitdo closed this as completed Feb 10, 2021
@amitdo
Collaborator

amitdo commented Feb 10, 2021

I closed this issue because it seems to be an issue with VC++, not in tesseract.

If a future version of VS 2019 solves the issue, let us know.

@amitdo
Collaborator

amitdo commented Feb 10, 2021

I suggest reporting this issue to Microsoft.

@BJungmann
Author

Commenting on the cited issue repeatedly yielded an error. I have now submitted a new Visual Studio feedback item.

@amitdo
Collaborator

amitdo commented Feb 10, 2021

Please add a link.

@amitdo
Collaborator

amitdo commented Feb 10, 2021

@BJungmann
Author

Thank you. I did not receive any information about this link after submitting.

@BJungmann
Author

I could now identify the usage of AVX2 hardware in intsimdmatrix.avx2.cpp as a necessary condition to reproduce the problem. If I set avx2_available_ = false in tesseract::SIMDetect(), I get good recognition results with VS2019 optimized code (/O2). I've reported this to Microsoft.
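The kind of switch described above (forcing the non-AVX2 path and comparing results) can be sketched as a small runtime dispatcher. This is only an illustration under stated assumptions: the names DotScalar, DotUnrolled, SelectDot and KernelsAgree are hypothetical and are not Tesseract's code, and the "fast" kernel here is a plain unrolled loop standing in for a real AVX2 kernel.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names throughout; this is not Tesseract's actual code.
using DotFn = std::int32_t (*)(const std::int8_t*, const std::int8_t*, std::size_t);

// Reference kernel: one multiply-add per element.
std::int32_t DotScalar(const std::int8_t* a, const std::int8_t* b, std::size_t n) {
  std::int32_t sum = 0;
  for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
  return sum;
}

// Stand-in for a vectorized kernel: four elements per step, then a tail loop.
// A real AVX2 kernel would use intrinsics instead.
std::int32_t DotUnrolled(const std::int8_t* a, const std::int8_t* b, std::size_t n) {
  std::int32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    s0 += a[i] * b[i];
    s1 += a[i + 1] * b[i + 1];
    s2 += a[i + 2] * b[i + 2];
    s3 += a[i + 3] * b[i + 3];
  }
  for (; i < n; ++i) s0 += a[i] * b[i];
  return s0 + s1 + s2 + s3;
}

// Passing false forces the scalar fallback, analogous to clearing an
// "AVX2 available" flag before the kernel is chosen.
DotFn SelectDot(bool fast_path_available) {
  return fast_path_available ? DotUnrolled : DotScalar;
}

// Returns true when both kernels produce the same result for the input.
bool KernelsAgree(const std::vector<std::int8_t>& a, const std::vector<std::int8_t>& b) {
  const std::size_t n = std::min(a.size(), b.size());
  return DotScalar(a.data(), b.data(), n) == DotUnrolled(a.data(), b.data(), n);
}
```

A check like KernelsAgree, run in an optimized build, is one way to narrow a suspected miscompilation down to a single kernel: if the two paths disagree only with optimization enabled, the fast path is the likely place to look.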
The recent VS2019 update (Version 16.9.1) does not solve the problem.

@egorpugin
Contributor

They have a lot of issues with the optimizer in corner cases.

@amitdo amitdo added the msvc label Sep 14, 2021
@BJungmann
Author

BJungmann commented Jan 13, 2022

Today I tried the new VS 2019 update, version 16.11.9, with tesseract version 5.0.1, tagged Jan 7th. I found that the described problem (Notker Balbulus/Martin Luther not being recognized well) is still reproducible. But for some other reason I selected platform toolset v141 (which belongs to VS 2017) under Configuration Properties > General, recompiled libtesseract, and the problem was gone. With platform toolset v142 the problem showed up again.

One small problem: I had to get rid of the std::min() and std::max() calls in ccmain\thresholder.cpp to make this file compile with v141. I did so by defining min and max as macros (without std::) directly in that file.
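When switching between platform toolsets, it can help to confirm which compiler actually built a given file. A minimal sketch, assuming the usual mapping of the predefined _MSC_VER macro (1910 to 1916 for the VS 2017 compiler, 1920 to 1929 for the VS 2019 compiler); the function names are hypothetical:

```cpp
#include <string>

// Maps an _MSC_VER value to the platform toolset name used in this thread.
// Values outside the two ranges are reported as "other".
std::string ToolsetFromMscVer(int msc_ver) {
  if (msc_ver >= 1910 && msc_ver <= 1916) return "v141";  // VS 2017 compiler
  if (msc_ver >= 1920 && msc_ver <= 1929) return "v142";  // VS 2019 compiler
  return "other";
}

// Reports the toolset of the compiler building this translation unit.
std::string CurrentToolset() {
#if defined(_MSC_VER)
  return ToolsetFromMscVer(_MSC_VER);
#else
  return "not MSVC";
#endif
}
```

Printing CurrentToolset() from the library and from the application would show whether both were really built with the toolset selected in the project settings.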

@stweil
Member

stweil commented Jan 13, 2022

@BJungmann, please check whether adding #include <algorithm> fixes the build.

@BJungmann
Author

Yes, it does :-)

I had already tried #include <minwindef.h>, which was one of the IDE's suggestions, but that produced new problems. So I fell back to the simple but inelegant solution.
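For reference, std::min and std::max are declared in <algorithm>, so a file that uses them should include that header itself rather than rely on another header pulling it in. A minimal sketch; ClampToRange is a hypothetical helper and is not taken from thresholder.cpp:

```cpp
#include <algorithm>  // declares std::min and std::max

// Hypothetical helper: limits value to the closed range [low, high].
int ClampToRange(int value, int low, int high) {
  return std::max(low, std::min(value, high));
}
```

The Windows headers, by contrast, define min and max as macros unless NOMINMAX is defined before they are included, which is a common source of clashes with the std:: versions.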

@stweil
Member

stweil commented Jan 13, 2022

Thanks. Fixed with commit ad55cec.

@stweil
Member

stweil commented Mar 26, 2022

@BJungmann, maybe you can try the code patch which I suggested in #3769. It should be sufficient to change the optimization options for a single file.
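The patch from #3769 is not reproduced in this thread. As an assumption about what a per-file workaround could look like, MSVC's optimize pragma can switch optimization off for the functions defined after it and restore the command-line settings afterwards. The function below is a hypothetical placeholder for an affected kernel:

```cpp
#include <cstddef>

#if defined(_MSC_VER)
// MSVC only: disable optimizations for the functions defined below.
#pragma optimize("", off)
#endif

// Hypothetical placeholder for a kernel that is miscompiled under /O2.
int SumOfProducts(const signed char* a, const signed char* b, std::size_t n) {
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
  return sum;
}

#if defined(_MSC_VER)
// Restore the optimization settings given on the command line.
#pragma optimize("", on)
#endif
```

The guards keep the file portable, since other compilers do not need the pragma. The rest of the library stays fully optimized, so the speed loss is limited to the functions between the two pragmas.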

@BJungmann
Author

See my suggestion in #3769.

5 participants