different results when same image is lossless-encoded at different bpp #242
Comments
https://github.com/tesseract-ocr/tesseract/blob/master/ccmain/thresholder.cpp#L54
Mmm yes... Well two approaches from my point of view:
or
I really think something should be done, because people will just (wrongly) conclude that tesseract is bad at reading a really clear-looking picture... (I was very close to that conclusion, but I was saved by the fact that previous trials with much lower-quality pictures (vobsub output of ProjectX) had given the correct result for that precise subtitle, though with other errors in about 10% of subtitles.)
I totally agree...
Oops I made a mistake. The relevant method takes 'const Pix*'. My previous comment linked to an overloaded method which takes 'const unsigned char*'.
I converted your nak.png to rgb and gray images using GIMP.
Oh, I missed this part:
https://github.com/tesseract-ocr/tesseract/blob/master/api/tesseractmain.cpp#L97
Well, I did some digging in the image path before character recognition. I think it all comes from the dark-blue background. If the image is converted to gray levels before thresholding (16-color palette image), the threshold is determined by studying the gray histogram, and is set to 98.
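To make the "threshold from the gray histogram" step concrete, here is a minimal sketch of an Otsu-style threshold search. It assumes the histogram study amounts to maximizing between-class variance; `GrayHistogram` and `OtsuThreshold` are hypothetical names for illustration, not tesseract functions.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Build a 256-bin histogram from 8-bit gray pixels.
std::array<long, 256> GrayHistogram(const std::vector<uint8_t>& pixels) {
  std::array<long, 256> hist{};
  for (uint8_t p : pixels) ++hist[p];
  return hist;
}

// Return the gray level t that maximizes the between-class variance when
// pixels <= t form one class and pixels > t form the other.
int OtsuThreshold(const std::array<long, 256>& hist) {
  long total = 0;
  double sum_all = 0.0;
  for (int i = 0; i < 256; ++i) {
    total += hist[i];
    sum_all += static_cast<double>(i) * hist[i];
  }
  long count_low = 0;
  double sum_low = 0.0;
  double best_score = -1.0;
  int best_t = 0;
  for (int t = 0; t < 256; ++t) {
    count_low += hist[t];
    sum_low += static_cast<double>(t) * hist[t];
    const long count_high = total - count_low;
    if (count_low == 0 || count_high == 0) continue;  // one class is empty
    const double mean_low = sum_low / count_low;
    const double mean_high = (sum_all - sum_low) / count_high;
    const double diff = mean_low - mean_high;
    // Proportional to the between-class variance (constant factor dropped).
    const double score =
        static_cast<double>(count_low) * count_high * diff * diff;
    if (score > best_score) {
      best_score = score;
      best_t = t;
    }
  }
  return best_t;
}
```

The point is that the threshold depends entirely on where the gray conversion places the background and text levels, so a different gray formula can move it.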
I'm not sure it is a good idea to set a priority in the RGB thresholds so that red is tested before green, and only then blue. This priority seems to be chosen only because of the classical RGB color order:
In fact the algorithm even calculates the alpha channel histogram, and if the result were not null, it would test the alpha value of the pixel against this threshold. Anyway, for now, tesseract seems to make arbitrary choices in the process of binarizing color images:
Maybe the sensible advice to users experiencing bad results with color images should be to do the gray conversion themselves. There are many possible choices in this process: GIMP has three options in the 'desaturate' dialog; my old scanner just used the green channel when scanning in gray (as do many copiers). Maybe one of the CMYK layers could be good in some cases... In the case of these subtitles where, who knows why, ProjectX added a dark-blue background, selecting the red or green plane gives the best contrast between text and background and leads to good results (which happens by chance on 'ok.png', and wouldn't work if ProjectX had chosen a red background).
Nice analysis! https://github.com/tesseract-ocr/tesseract/blob/master/README.md
So maybe the ImproveQuality wiki page is the right place to add your advice. You can edit the wiki yourself. The explanation should be as clear and short as possible.
The 0.25 R + 0.5 G + 0.25 B calculation is designed to give fast results, not accurate ones. Sounds like someone may be calling pixConvertRGBToGrayFast() instead of pixConvertRGBToGray(). This seems like a bad idea if it is affecting results. And I also don't understand why we would ever threshold on R, G, B individually. The normal thing to do is calculate luminance, then threshold on luminance. Leptonica is really, really good at image binarization. We should be making use of it.
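A small sketch of how much the weights matter for a dark-blue pixel. The quick weights are the ones quoted above; the "luminance" weights used here are the Rec. 709 coefficients (0.2126, 0.7152, 0.0722), which is an assumption about what a perceptual weighting would be, not what any Leptonica function is known to use. `WeightedGray` is a hypothetical helper.

```cpp
#include <cmath>
#include <cstdint>

struct Rgb {
  uint8_t r, g, b;
};

// Weighted sum of the three channels, rounded to the nearest gray level.
// The weights are expected to sum to 1 so the result stays within 0..255.
uint8_t WeightedGray(const Rgb& c, double wr, double wg, double wb) {
  const double v = wr * c.r + wg * c.g + wb * c.b;
  return static_cast<uint8_t>(std::lround(v));
}

// Quick formula quoted in the thread: 0.25 R + 0.5 G + 0.25 B.
uint8_t GrayQuick(const Rgb& c) { return WeightedGray(c, 0.25, 0.5, 0.25); }

// Rec. 709 coefficients (assumed here as the perceptual alternative).
uint8_t GrayRec709(const Rgb& c) {
  return WeightedGray(c, 0.2126, 0.7152, 0.0722);
}
```

For a hypothetical dark-blue background `{0, 0, 128}`, the quick formula gives gray 32 while the Rec. 709 weights give 9, so the quick formula renders the background noticeably brighter. A neutral pixel such as `{200, 200, 200}` maps to 200 under both.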
@bruvi where exactly is the 0.25 R + 0.5 G + 0.25 B conversion happening?
It is in fact in leptonica.
For indexed-color images, pixConvertTo8 calls pixRemoveColormap (see [https://tpgit.github.io/Leptonica/pixconv_8c_source.html] ) with
So it is leptonica which uses this quick formula, although speed should not really be a concern here as it is applied just to the colormap, not the full image. Sorry, I don't have much time this week to follow this thread.
If we use sRGB perceptual weightings then it works.
If we use "Leptonica" perceptual weightings, then it fails.
If I remove the colormap before anything else, then it works. I think this is the right solution because it guarantees consistency (e.g. colormap vs. non-colormap will not change results if image is otherwise identical). Leptonica should switch to some sort of perceptual weighting for the next release because that is a no-brainer, but will not have any effect on Tesseract if we remove the colormap first.
--- tesseract/ccmain/thresholder.cpp 2014-07-11 11:28:02.000000000 -0700
+++ tesseract/ccmain/thresholder.cpp 2016-03-11 10:01:36.000000000 -0800
@@ -149,17 +149,23 @@
if (pix_ != NULL)
pixDestroy(&pix_);
Pix* src = const_cast<Pix*>(pix);
- int depth;
- pixGetDimensions(src, &image_width_, &image_height_, &depth);
// Convert the image as necessary so it is one of binary, plain RGB, or
// 8 bit with no colormap.
+ Pix *tmp;
+ if (pixGetColormap(src)) {
+ tmp = pixRemoveColormap(src, REMOVE_CMAP_BASED_ON_SRC);
+ } else {
+ tmp = pixClone(src);
+ }
+ int depth;
+ pixGetDimensions(tmp, &image_width_, &image_height_, &depth);
+
if (depth > 1 && depth < 8) {
- pix_ = pixConvertTo8(src, false);
- } else if (pixGetColormap(src)) {
- pix_ = pixRemoveColormap(src, REMOVE_CMAP_BASED_ON_SRC);
+ pix_ = pixConvertTo8(tmp, false);
} else {
- pix_ = pixClone(src);
+ pix_ = pixClone(tmp);
}
+ pixDestroy(&tmp);
depth = pixGetDepth(pix_);
pix_channels_ = depth / 8;
pix_wpl_ = pixGetWpl(pix_);
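The consistency argument behind removing the colormap first can be sketched without Leptonica. In this simplified model (indices stored one per byte, `IndexedImage` and `ExpandColormap` are hypothetical names), two encodings of the same picture with different palette sizes expand to identical RGB pixels, so everything downstream sees the same input.

```cpp
#include <cstdint>
#include <vector>

struct Rgb {
  uint8_t r, g, b;
  bool operator==(const Rgb& o) const {
    return r == o.r && g == o.g && b == o.b;
  }
};

// Simplified palette image: indices are unpacked to one byte per pixel,
// whatever the nominal bit depth of the file was.
struct IndexedImage {
  int bits_per_pixel;            // e.g. 2, 4 or 8
  std::vector<Rgb> colormap;     // up to 2^bits_per_pixel entries
  std::vector<uint8_t> indices;  // one palette index per pixel
};

// Replace every palette index by its RGB color. After this step the
// original bit depth and palette size no longer influence the data.
std::vector<Rgb> ExpandColormap(const IndexedImage& img) {
  std::vector<Rgb> out;
  out.reserve(img.indices.size());
  for (uint8_t idx : img.indices) {
    out.push_back(img.colormap.at(idx));  // at() throws on a bad index
  }
  return out;
}
```

Expanding a 4 bpp image with an 8-entry palette and an 8 bpp image whose 256-entry palette starts with the same 8 colors gives equal pixel vectors when the indices match.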
@jbreiden: will you make a PR or should I commit patch from above?
Ray is going to write the official change. Best to coordinate with him. He is doing the same thing, but with a different way of writing the code.
patch committed with c1c1e42
…same image is lossless-encoded at different bpp
Hi,
I noticed different results when analyzing the same PNG image having 8 different colors when it is encoded with a colormap of 8 or 256 colors.
Version:
Demo:
French subtitle conversion from XFiles-S01E01 (DVB). The subtitle image extracted with ProjectX as a BMP has 8 distinct colors. It was converted to PNG with two different programs.
'ok.png', 256 entries colormap:
Tesseract result is correct:
'nak.png' 8 entries colormap:
Tesseract result is wrong:
The image type is either recognized as 4 or 8 bpp but the information content is identical:
I suspect that tesseract internal processing keeps the original image bit depth and that some steps don't work as well at 4bpp as at 8bpp.
External workaround:
Quick patch to do this conversion internally in libtesseract:
I am not sure if this is the correct strategy to deal with this issue, nor if it is the right place to change the data type. 4-color DVB subtitles exist, so 2bpp images should probably also be considered.
Thanks for this software, it works great for me.
-- Bruno