
Problem with png images #1914

Open
Dian8 opened this issue Sep 18, 2018 · 13 comments


@Dian8

Dian8 commented Sep 18, 2018

Environment

  • Tesseract Version: tesseract-4.0.0-beta.4 (binary from github releases, dependencies from cppan)
  • Platform: Windows 10, x64

Sample code:

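        // Decode the image from an in-memory buffer (PNG or JPEG)
        Pix *image = pixReadMem(buffer, length);

        api->SetImage(image);

        // Converting to JPEG first (format 2 = JPEG) makes recognition work:
        //pixWrite("c:\\temp\\img.jpg", image, 2);
        //Pix *image2 = pixRead("c:\\temp\\img.jpg");
        //api->SetImage(image2);

        // Must be called after SetImage()
        api->SetSourceResolution(70);

        // Get OCR result
        char* outText = api->GetUTF8Text();
        printf("OCR output:\n%s", outText);

        // Release the text buffer and the image
        delete[] outText;
        pixDestroy(&image);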

Current Behavior:

If the source image is a JPEG, everything works well. If the image is a PNG, the recognized text is empty.
If I convert the source PNG to JPEG (the commented-out code fragment above), the text is recognized correctly.

Expected Behavior:

PNG images are recognized successfully.

@amitdo
Collaborator

amitdo commented Sep 18, 2018

Try the command line and post the output.

Can you share the png image?

@zdenop
Contributor

zdenop commented Sep 18, 2018

Can you also provide the output of tesseract -v?

@Dian8
Author

Dian8 commented Sep 18, 2018

tesseract.exe -v

tesseract 4.0.0-beta.3
leptonica-1.76.0 (Sep 18 2018, 16:25:33) [MSC v.1914 LIB Release x64]
libgif 5.1.4 : libjpeg 9b : libpng 1.6.35 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Found AVX
Found SSE

The PNG image is attached: cycounter

@Dian8
Author

Dian8 commented Sep 21, 2018

Any suggestions?

@zdenop
Contributor

zdenop commented Sep 21, 2018

Remove the alpha channel from the PNG image.

@zdenop
Contributor

zdenop commented Sep 22, 2018

@amitdo @stweil @jbreiden: one possible solution (workaround?) would be to remove the alpha channel inside Tesseract, e.g. during SetImage:

index 9d0a5a51..a156c1ee 100644
--- a/src/ccmain/thresholder.cpp
+++ b/src/ccmain/thresholder.cpp
@@ -150,6 +150,14 @@ void ImageThresholder::SetImage(const Pix* pix) {
   if (pix_ != nullptr)
     pixDestroy(&pix_);
   Pix* src = const_cast<Pix*>(pix);
+  int spp = pixGetSpp(src);
+  // Remove alpha channel
+  if (spp == 4) {
+    PIX* tmp = pixRemoveAlpha(src);
+    pixSetSpp(tmp, 3);
+    src = pixCopy(nullptr, tmp);
+    pixDestroy(&tmp);
+  }
   int depth;

I'm not sure we have time to check this for side effects.

@zdenop
Contributor

zdenop commented Sep 26, 2018

Remark: pdfrenderer.cpp also removes the alpha channel from PNG images, using the pixAlphaBlendUniform function.
So it seems to make sense to get rid of the alpha channel at the beginning.
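A minimal sketch of what blending against a uniform background computes, assuming an interleaved 8-bit RGBA buffer instead of a Leptonica Pix. The function name BlendAlphaUniform is hypothetical and this is not Leptonica's code; it only illustrates the per-pixel arithmetic out = (alpha * fg + (255 - alpha) * bg) / 255, which turns 4 samples per pixel into 3.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: composites an interleaved 8-bit RGBA buffer onto a
// uniform background colour and returns an interleaved RGB buffer.
// This is an illustration of the arithmetic, not Leptonica's implementation.
std::vector<uint8_t> BlendAlphaUniform(const std::vector<uint8_t>& rgba,
                                       uint8_t bg_r, uint8_t bg_g, uint8_t bg_b) {
  const uint8_t bg[3] = {bg_r, bg_g, bg_b};
  const std::size_t pixels = rgba.size() / 4;  // 4 samples per pixel in
  std::vector<uint8_t> rgb(pixels * 3);        // 3 samples per pixel out
  for (std::size_t i = 0; i < pixels; ++i) {
    const unsigned alpha = rgba[4 * i + 3];
    for (int c = 0; c < 3; ++c) {
      const unsigned fg = rgba[4 * i + c];
      // Weighted mix of foreground and background; +127 rounds to nearest.
      rgb[3 * i + c] = static_cast<uint8_t>(
          (alpha * fg + (255 - alpha) * bg[c] + 127) / 255);
    }
  }
  return rgb;
}
```

With a white background (255, 255, 255), fully opaque pixels come out unchanged and fully transparent pixels become white, so an image without transparent regions keeps its RGB values.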

@zdenop
Contributor

zdenop commented Sep 27, 2018

The alpha channel seems to be only part of the problem. I exported the image with transparency to PNG and GIF. Tesseract produces:

>tesseract.exe i1914r.png -
Empty page!!
Empty page!!

>tesseract.exe i1914r.gif -
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 244
m 130000

=> For PNG it is not able to recognize anything. But when I use --psm 6:

f:\Project-Personal\tesseract.test>tesseract.exe i1914r.png - --psm 6
S
m 130000

f:\Project-Personal\tesseract.test>tesseract.exe i1914r.gif - --psm 6
Warning. Invalid resolution 0 dpi. Using 70 instead.
m 130000

It produces output, but worse than for GIF. So more investigation is needed. Here are the images for testing:

i1914r.gif, i1914r.png (attached images)

@amitdo
Collaborator

amitdo commented Oct 7, 2018

Zdenko, did you try doing what the PDF renderer originally did? Your code uses a different function.

@zdenop
Contributor

zdenop commented Oct 9, 2018

Explained in a code review comment.

@stweil
Member

stweil commented Nov 18, 2018

Removing the alpha channel does not change the OCR result for this test image. There is an alpha channel, but there are no transparent parts in the image, so that's not surprising.

One problem with the original test image is the resolution: it is missing, and Tesseract guesses a resolution of 147 dpi for the PNG image. With a resolution of at least 150 dpi, Tesseract recognizes the text "9) 130000". The same is true for other image formats. JPEG only works because Tesseract guesses 244 dpi for that format.

So I see two issues here:

  • Why does Tesseract guess different values for the resolution for the same image when only the image format is changed?

  • Why does the OCR result for that simple image depend on the resolution at all?
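Until these questions are answered, one caller-side workaround consistent with the observation above is to override a missing or low resolution before recognition, instead of passing a fixed 70 dpi as the original sample does. The helper below is a hypothetical sketch: the 150 dpi floor comes from the result above for this one test image, while the 2400 dpi ceiling and the 300 dpi default are assumptions, not values confirmed in this thread.

```cpp
// Hypothetical thresholds for illustration only.
constexpr int kMinUsableDpi = 150;    // lowest value that worked for this test image
constexpr int kMaxCredibleDpi = 2400; // assumed upper bound for a plausible dpi

// Hypothetical helper: returns the resolution to pass to SetSourceResolution().
// Keeps the file's dpi when it is plausible and high enough, otherwise
// returns the caller's fallback (300 dpi by default, an assumption).
int ChooseSourceResolution(int file_dpi, int fallback_dpi = 300) {
  if (file_dpi < kMinUsableDpi || file_dpi > kMaxCredibleDpi) {
    return fallback_dpi;  // missing (0), too low, or implausibly high
  }
  return file_dpi;
}
```

In the original sample this could be used as `api->SetSourceResolution(ChooseSourceResolution(pixGetXRes(image)));` after SetImage(), assuming Leptonica reports 0 when the file carries no resolution.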

