Using SetImage(buffer, width, height, ...) result worse than when using SetImage(Pix*) #2080
Comments
These two messages are confusing. 70 or 287?
Well, both messages are printed by Tesseract, so how should I answer that?
My comment was targeted to other tesseract devs.
@tailsu : thanks for the report.
There are two checks for a credible DPI. The first tests > kMinCredibleResolution: Line 2353 in 6d06d39
The second tests > kMinCredibleResolution and < kMaxCredibleResolution: tesseract/src/ccmain/osdetect.cpp Line 169 in 6d06d39
In both cases, if no credible resolution is found, the resolution is set to kMinCredibleResolution (70). Additionally, if the resolution equals kMinCredibleResolution, SetupPageSegAndDetectOrientation applies an algorithm (line_size * 10) to get a better estimate of the DPI: tesseract/src/ccmain/pagesegmain.cpp Line 313 in 6d06d39
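To make the two-step logic described above easier to follow, here is a minimal standalone sketch. It is not the Tesseract source: the function names are hypothetical, the value of kMaxCredibleResolution is an assumption made for illustration, and only the value 70 and the line_size * 10 rule come from the comment above.

```cpp
#include <cassert>
#include <cmath>

// Value quoted in the comment above.
constexpr int kMinCredibleResolution = 70;
// ASSUMPTION: upper bound chosen for this sketch; check the Tesseract source for the real value.
constexpr int kMaxCredibleResolution = 2400;
// The "line_size * 10" factor mentioned in the comment above.
constexpr int kEstimationFactor = 10;

// Hypothetical helper: keep the reported DPI if it looks credible,
// otherwise fall back to the minimum credible resolution.
int CredibleResolution(int reported_dpi) {
  if (reported_dpi > kMinCredibleResolution &&
      reported_dpi < kMaxCredibleResolution) {
    return reported_dpi;
  }
  return kMinCredibleResolution;
}

// Hypothetical helper: if the resolution is still the fallback value,
// estimate it from the typical text line size; otherwise leave it alone.
int EstimateResolution(int source_dpi, double line_size) {
  assert(line_size >= 0.0);
  if (source_dpi != kMinCredibleResolution) {
    return source_dpi;  // a credible DPI was supplied, no estimation needed
  }
  const int estimate =
      static_cast<int>(std::lround(line_size * kEstimationFactor));
  if (estimate > kMinCredibleResolution && estimate < kMaxCredibleResolution) {
    return estimate;
  }
  return source_dpi;  // estimate not credible, keep the fallback
}
```

Under these assumptions, an image with no resolution metadata first gets 70 from CredibleResolution(0), and EstimateResolution(70, 28.7) then yields 287 (the line size 28.7 is a made-up input). That would be one way for both numbers to be reported for the same image.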
Environment
leptonica-1.76.0
libjpeg 9c : libpng 1.6.35 : libtiff 4.0.9 : zlib 1.2.11
Found AVX512BW
Found AVX512F
Found AVX2
Found AVX
Found SSE
Current Behavior:
Behavior reproduced with the following image:
API initialized with "eng+deu" languages with language data from the tessdata_fast repository, mode is PSM_AUTO, everything else is default.
Text recognition is very accurate when using the following API:
In particular, the words "Geschäftsführer", "Gesellschaft", "Registergericht" and "Charlottenburg" are recognized completely and accurately.
I also get the following message in stderr:
If I use the OpenCV imgcodecs API to load the image (it shouldn't matter how the image was read), and use the SetImage(buffer) overload:
Then the quality is much worse. All of the above-mentioned words are completely absent from the result.
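The reporter's exact call is not preserved here. As a rough illustration of what the buffer overload needs, the sketch below derives the arguments of SetImage(buffer, width, height, bytes_per_pixel, bytes_per_line) from a decoded 8-bit image. The struct and helper names are hypothetical, and the mapping from an OpenCV cv::Mat (cols, rows, channels(), step) is an assumption noted in the comments.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical holder for the arguments of
// SetImage(buffer, width, height, bytes_per_pixel, bytes_per_line).
struct RawImageArgs {
  const unsigned char* data;
  int width;
  int height;
  int bytes_per_pixel;  // 1 = grey, 3 = 24-bit color, 4 = 32-bit color
  int bytes_per_line;   // row stride in bytes, including any padding
};

// Hypothetical helper. With OpenCV one would presumably pass
// mat.cols, mat.rows, mat.channels() and mat.step (assumption).
RawImageArgs MakeRawImageArgs(const std::vector<unsigned char>& pixels,
                              int width, int height, int channels,
                              int row_stride_bytes) {
  assert(width > 0 && height > 0);
  assert(channels == 1 || channels == 3 || channels == 4);
  // The stride may exceed width * channels when rows are padded.
  assert(row_stride_bytes >= width * channels);
  assert(pixels.size() >=
         static_cast<std::size_t>(row_stride_bytes) * height);
  return {pixels.data(), width, height, channels, row_stride_bytes};
}
```

Note that a raw buffer described this way carries pixel data only and no resolution metadata, unlike a Pix read from a file.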
Expected Behavior:
There should be zero difference in behavior when using either SetImage() overload when given the exact same source data.
Suggested Fix:
I've tracked the problem to this line:
tesseract/src/ccmain/thresholder.cpp
Line 115 in dba7f45
If I add the same line to my original working example:
then I get the same crappy result. I also don't get the messages in stderr about the resolution getting estimated.
Suggested fix is to remove that line.