LSTM: Windows: Warnings on console when using -oem 4 on command line #522

Closed
Shreeshrii opened this issue Dec 2, 2016 · 28 comments

@Shreeshrii
Collaborator

With PNG files as input and -oem 4 (LSTM), tesseract prints warnings to the console when run from the command line. -oem 0, 1, 2 and 3 give no warnings.

On Windows 10, using 4.0 Alpha binaries provided by @stweil and 4.0 alpha traineddata

tesseract hin001.png hin001-hin-3 -oem 3 -l hin
Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica

tesseract hin001.png hin001-hin-4 -oem 4 -l hin
Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica
Warning in pixWriteMemPng: work-around: writing to a temp file
Warning in fopenReadFromMemory: work-around: writing to a temp file
Warning in pixWriteMemPng: work-around: writing to a temp file
Warning in fopenReadFromMemory: work-around: writing to a temp file
...

The warnings seem to be given per line, so a page with 20 lines of text gets about 40 lines of messages.

I have not tested this on Linux. The warnings are probably from Leptonica.

@zdenop
Contributor

zdenop commented Dec 2, 2016

What is the issue?

@Shreeshrii
Collaborator Author

There are a large number of warning messages when using -oem 4, not with other oem modes.

Is there a way to eliminate these?

C:\shree>tesseract san001.png san001 -oem 4 -l san
Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica
Detected 18 diacritics
Warning in pixWriteMemPng: work-around: writing to a temp file
Warning in fopenReadFromMemory: work-around: writing to a temp file
Warning in pixWriteMemPng: work-around: writing to a temp file
Warning in fopenReadFromMemory: work-around: writing to a temp file
Warning in pixWriteMemPng: work-around: writing to a temp file
Warning in fopenReadFromMemory: work-around: writing to a temp file
Warning in pixWriteMemPng: work-around: writing to a temp file
Warning in fopenReadFromMemory: work-around: writing to a temp file
Warning in pixWriteMemPng: work-around: writing to a temp file
Warning in fopenReadFromMemory: work-around: writing to a temp file
Warning in pixWriteMemPng: work-around: writing to a temp file
Warning in fopenReadFromMemory: work-around: writing to a temp file
Warning in pixWriteMemPng: work-around: writing to a temp file
Warning in fopenReadFromMemory: work-around: writing to a temp file
Warning in pixWriteMemPng: work-around: writing to a temp file
Warning in fopenReadFromMemory: work-around: writing to a temp file
Warning in pixWriteMemPng: work-around: writing to a temp file
Warning in fopenReadFromMemory: work-around: writing to a temp file
Warning in pixWriteMemPng: work-around: writing to a temp file
Warning in fopenReadFromMemory: work-around: writing to a temp file
Warning in pixWriteMemPng: work-around: writing to a temp file
Warning in fopenReadFromMemory: work-around: writing to a temp file
Warning in pixWriteMemPng: work-around: writing to a temp file
Warning in fopenReadFromMemory: work-around: writing to a temp file
Warning in pixWriteMemPng: work-around: writing to a temp file
Warning in fopenReadFromMemory: work-around: writing to a temp file
Warning in pixWriteMemPng: work-around: writing to a temp file
Warning in fopenReadFromMemory: work-around: writing to a temp file
Warning in pixWriteMemPng: work-around: writing to a temp file
Warning in fopenReadFromMemory: work-around: writing to a temp file
Warning in pixWriteMemPng: work-around: writing to a temp file
Warning in fopenReadFromMemory: work-around: writing to a temp file
Warning in pixWriteMemPng: work-around: writing to a temp file
Warning in fopenReadFromMemory: work-around: writing to a temp file
Warning in pixWriteMemPng: work-around: writing to a temp file
Warning in fopenReadFromMemory: work-around: writing to a temp file
Warning in pixWriteMemPng: work-around: writing to a temp file
Warning in fopenReadFromMemory: work-around: writing to a temp file
Warning in pixWriteMemPng: work-around: writing to a temp file
Warning in fopenReadFromMemory: work-around: writing to a temp file
Warning in pixWriteMemPng: work-around: writing to a temp file
Warning in fopenReadFromMemory: work-around: writing to a temp file
Warning in pixWriteMemPng: work-around: writing to a temp file
Warning in fopenReadFromMemory: work-around: writing to a temp file
Warning in pixWriteMemPng: work-around: writing to a temp file
Warning in fopenReadFromMemory: work-around: writing to a temp file
Warning in pixWriteMemPng: work-around: writing to a temp file
Warning in fopenReadFromMemory: work-around: writing to a temp file


@Shreeshrii Shreeshrii changed the title LSTM: Warnings on console for png with -oem 4 LSTM: Warnings on console with -oem 4 Dec 2, 2016
@amitdo
Collaborator

amitdo commented Dec 2, 2016

The warnings are indeed from Leptonica. They will not appear in Linux. You should ignore these ugly warnings.

> Is there a way to eliminate these?

#292 (comment)

@Shreeshrii Shreeshrii changed the title LSTM: Warnings on console with -oem 4 LSTM: Warnings on console when using -oem 4 on command line Dec 2, 2016
@Shreeshrii
Collaborator Author

Warnings are not limited to png, same for tif, gif and jpg.

@Shreeshrii Shreeshrii changed the title LSTM: Warnings on console when using -oem 4 on command line LSTM: Warnings on console when using -oem 4 on command line on windows Dec 2, 2016
@zdenop
Contributor

zdenop commented Dec 2, 2016

  1. Please use the tesseract user forum for asking questions.
  2. This is more of a Leptonica build problem (warnings enabled).
  3. Did you try https://github.com/tesseract-ocr/tesseract/wiki/FAQ#how-can-i-suppress-tesseract-info-line?

@zdenop
Contributor

zdenop commented Dec 2, 2016

> They will not appear in Linux.

Because that problem is specific to Windows...

@Shreeshrii
Collaborator Author

@zdenop @amitdo Thanks!

@stweil Is the version of leptonica included with the windows binaries a debug version?

@amitdo
Collaborator

amitdo commented Dec 2, 2016

> Warnings are not limited to png, same for tif, gif and jpg.

Still same answer...

It's not a bug, although from an end user's point of view it sure looks confusing.

@stweil
Member

stweil commented Dec 2, 2016

The warnings are an indicator of potential optimizations and require more examination by developers, so I do see an issue to be discussed here.

AFAIK Leptonica warns when an image file is going to be mapped to memory, something that is unsupported for Leptonica's Windows code which uses a temporary file copy as an alternative.

If OCR of a single image results in many of those warnings (about as many as there are text lines), then either the same image is being opened very often, or Tesseract operates on in-memory line images that Leptonica for Windows has to write to disk. Either way, performance clearly suffers, at least on Windows, and something should be done, either in Leptonica code or in Tesseract.
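
To make the cost concrete, here is a minimal sketch of the temp-file work-around described above: a memory buffer is exposed as a readable `FILE *` by first copying it to a temporary file. The function name `open_buffer_via_tempfile` is hypothetical and this is not Leptonica's actual code; it only assumes standard C (`tmpfile`, `fwrite`, `rewind`).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not Leptonica's implementation: expose a memory
 * buffer as a readable stream by copying it into an anonymous temp file.
 * Returns NULL on failure; the caller closes the stream with fclose(). */
FILE *open_buffer_via_tempfile(const void *data, size_t size)
{
    assert(data != NULL);

    FILE *fp = tmpfile();            /* removed automatically on close */
    if (fp == NULL)
        return NULL;

    /* This is the extra disk write that an in-memory stream would avoid. */
    if (fwrite(data, 1, size, fp) != size) {
        fclose(fp);
        return NULL;
    }

    rewind(fp);                      /* caller starts reading at byte 0 */
    return fp;
}
```

If something like this runs once per text line, a page with 20 lines pays for 20 temporary files on the read side alone, which fits the per-line pattern of the messages.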

@stweil
Member

stweil commented Dec 2, 2016

> Is the version of leptonica included with the windows binaries a debug version?

No, it is production code. Such warnings also occur(red) with Tesseract 3.05 and earlier, for example when processing JPEG 2000 images (but only once per image). Some improvements in newer Leptonica code reduced the cases where this kind of warning is shown.

@Shreeshrii
Collaborator Author

Shreeshrii commented Dec 2, 2016 via email

@amitdo
Collaborator

amitdo commented Dec 2, 2016

@stweil
#292 (comment)
How did you compile Leptonica?

@amitdo
Collaborator

amitdo commented Dec 2, 2016

http://www.leptonica.org/source/README.html

  1. Compile-time control over stderr output (see environ.h)

Leptonica provides both compile-time and run-time control over
messages and debug output (thanks to Dave Bryan). Both compile-time
and run-time severity thresholds can be set. The run-time threshold
can also be set by an environmental variable. Messages are
vararg-formatted and of 3 types: error, warning, informational.
These are all macros, and can be further suppressed when
NO_CONSOLE_IO is defined on the compile line. For production code
where no output is to go to stderr, compile with -DNO_CONSOLE_IO.
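
As a rough illustration of the mechanism the README describes, the sketch below shows message macros with a compile-time severity floor that vanish entirely when a "no console I/O" flag is defined. All `DEMO_*` and `demo_*` names and the numeric levels are hypothetical stand-ins chosen for this example; Leptonica's real macros are in environ.h and may differ in detail.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical severity levels for this sketch (higher = more urgent). */
enum { DEMO_DEBUG = 2, DEMO_INFO = 3, DEMO_WARNING = 4, DEMO_ERROR = 5 };

/* Compile-time floor: e.g. build with -DDEMO_MIN_SEVERITY=4 to drop INFO. */
#ifndef DEMO_MIN_SEVERITY
#define DEMO_MIN_SEVERITY DEMO_INFO
#endif

static int demo_emitted = 0;   /* counts messages that reached stderr */

#ifdef DEMO_NO_CONSOLE_IO
/* Production build: every message expands to nothing. */
#define DEMO_MSG(level, ...) ((void)0)
#else
#define DEMO_MSG(level, ...)                          \
    do {                                              \
        if ((level) >= DEMO_MIN_SEVERITY) {           \
            fprintf(stderr, __VA_ARGS__);             \
            demo_emitted++;                           \
        }                                             \
    } while (0)
#endif

/* Example call site, modeled on the message text seen in this issue. */
void demo_note_tempfile_workaround(const char *func)
{
    assert(func != NULL);
    DEMO_MSG(DEMO_INFO, "Info in %s: work-around: writing to a temp file\n", func);
}
```

Because the check happens inside a macro, a build with `-DDEMO_NO_CONSOLE_IO` carries no formatting or I/O code for these messages at all.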

@amitdo
Collaborator

amitdo commented Dec 2, 2016

@amitdo
Collaborator

amitdo commented Dec 2, 2016

It's not a bug, but I agree that there is a place for improvements.

@stweil
Member

stweil commented Dec 2, 2016

> How did you compile Leptonica?

Basically I used ./configure; make; make install, but in a cross build on Debian GNU/Linux.

@DanBloomberg

DanBloomberg commented Dec 2, 2016

In production, there should be no need to see any leptonica output (info, warnings, errors) to stderr.
Methods for suppressing them are described above by Amit.
These messages are arguably INFO rather than WARNING.

Reads and writes from/to file for in-memory I/O on Windows are no longer required for gif and tiff formats. They were never required for webp. They are still required for jpeg, jp2k and png.

@amitdo
Collaborator

amitdo commented Dec 2, 2016

Dan,

Thanks for the info.

> Reads and writes from/to file for in-memory I/O on Windows are no longer required for gif and tiff formats.

Since which version?

In this specific case, the solution to the optimization problem might be to always save the textline images as TIFF in memory.

@stweil
Member

stweil commented Dec 2, 2016

Technically, the warnings occur on any platform which does not have a fmemopen function. Leptonica outputs them using a macro L_WARNING. It is possible to call setMsgSeverity from Tesseract code to disable any warning message from Leptonica (I don't think that would be a good idea).
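
For contrast, this is roughly what the direct path looks like where `fmemopen` exists: the buffer is read through the normal stdio API and nothing touches the disk. It is a sketch under the assumption of a POSIX.1-2008 system; `sum_bytes_via_stream` is a made-up example function, and the code will not build on a platform without `fmemopen`, which is exactly the case where the messages appear.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* ask the headers to declare fmemopen */
#endif
#include <assert.h>
#include <stdio.h>

/* Hypothetical example: sum the bytes of a buffer via a FILE* that is
 * backed directly by the buffer. Returns -1 if the stream cannot be opened. */
long sum_bytes_via_stream(const unsigned char *data, size_t size)
{
    assert(data != NULL && size > 0);

    /* Mode "r" never writes to the buffer, so dropping const is safe here. */
    FILE *fp = fmemopen((void *)data, size, "r");
    if (fp == NULL)
        return -1;

    long sum = 0;
    int c;
    while ((c = fgetc(fp)) != EOF)
        sum += c;

    fclose(fp);
    return sum;
}
```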

@DanBloomberg

Amit,

1.73 required r/w for in-memory I/O for gif and tiff on Windows. The current github master (and soon to be release 1.74.0) does not. The tiff patch was contributed by Stefan.

Yes, any 1 bpp images (e.g., in tesseract/viewer and ccstruct) can be output in IFF_TIFF_G4, which typically has better compression than png. Grayscale and color can be output in TIFF_ZIP or TIFF_LZW, which are typically inferior to png in compression.

@DanBloomberg

Stefan, we can downgrade those messages in leptonica to INFO, and you can use setMsgSeverity() to disable INFO statements.

@stweil
Member

stweil commented Dec 2, 2016

That would be possible, yes. But without those nagging messages, I'd never have had a look at that part of Leptonica.

@DanBloomberg

Nevertheless, this seems like a reasonable thing to do to solve this annoyance and still present other WARNING messages.

@DanBloomberg

Annoying messages have now been downgraded to INFO at github head.

You can use this in tesseract code to suppress all INFO and less urgent messages at run time:
setMsgSeverity(L_SEVERITY_WARNING);

Alternatively, for a bit more flexibility, you can set the environment variable LEPT_MSG_SEVERITY to the numeric value of L_SEVERITY_WARNING (4)
and use
setMsgSeverity(L_SEVERITY_EXTERNAL);

You can also change the severity level to WARNING and higher at compile time with this compiler flag:
-DDEFAULT_SEVERITY=4
This can be over-ridden at run time with either of the first two methods.
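
The interplay of these options can be sketched as a run-time threshold that is either set directly or, when a special "external" value is passed, read from an environment variable. Everything named `demo_*` or `DEMO_*` below is hypothetical, including the `DEMO_MSG_SEVERITY` variable and the numeric levels other than 4 for WARNING (which matches the compiler flag above); it is a model of the behaviour described here, not Leptonica's source.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical levels; 0 is reserved to mean "ask the environment". */
enum demo_severity {
    DEMO_SEV_EXTERNAL = 0,
    DEMO_SEV_INFO     = 3,
    DEMO_SEV_WARNING  = 4,
    DEMO_SEV_ERROR    = 5
};

/* Default threshold: show INFO, WARNING and ERROR. */
static int demo_threshold = DEMO_SEV_INFO;

/* Set the run-time threshold and return the previous one. With
 * DEMO_SEV_EXTERNAL, the value comes from the (hypothetical) environment
 * variable DEMO_MSG_SEVERITY; if it is unset, nothing changes. */
int demo_set_severity(int newsev)
{
    assert(newsev >= 0);
    int old = demo_threshold;

    if (newsev == DEMO_SEV_EXTERNAL) {
        const char *env = getenv("DEMO_MSG_SEVERITY");
        if (env != NULL)
            demo_threshold = atoi(env);   /* numeric value, e.g. "4" */
    } else {
        demo_threshold = newsev;
    }
    return old;
}

/* A message is printed only if its level reaches the threshold. */
int demo_would_print(int level)
{
    return level >= demo_threshold;
}
```

In this model, calling `demo_set_severity(DEMO_SEV_WARNING)` hides INFO messages such as the temp-file note while still letting WARNING and ERROR through.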

@amitdo
Collaborator

amitdo commented Dec 3, 2016

> Reads and writes from/to file for in-memory I/O on Windows are no longer required for gif and tiff formats.

I wonder how giflib and libtiff do the equivalent of fmemopen on Windows.

@zdenop
Contributor

zdenop commented Dec 3, 2016

@DanBloomberg : Thanks for the support. Maybe it would be good to enable all warnings for debug builds and hide them for release builds.

@zdenop zdenop closed this as completed Dec 3, 2016
@DanBloomberg

@zdenop That's an interesting idea. Default setting is for INFO, WARNING and ERROR, but for a release that will be used in production it makes sense to only show ERROR.

@DanBloomberg

@amitdo

Look at the leptonica wrappers for tiff (tiffio.c) and gif (gifio.c) to see how it is done.

Even better, the webp library implements compression and decompression directly with memory buffers, not with file streams. This is very nice, because it's platform independent, and you can easily read and write to files using it (see webpio.c). I wish the other image compression libraries had been implemented that way, but ...
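
The design point can be illustrated with a toy codec whose primary API works on memory buffers, with file output as a thin wrapper on top. This is only a sketch: `toy_encode` and `toy_encode_to_stream` are invented names, the run-length scheme stands in for a real compressor, and none of it reflects the actual webp or Leptonica interfaces.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy run-length encoder: output is a sequence of (count, byte) pairs.
 * On success returns 0 and hands back a malloc'ed buffer in *out (caller
 * frees) with its length in *out_len; returns -1 on allocation failure. */
int toy_encode(const unsigned char *in, size_t n,
               unsigned char **out, size_t *out_len)
{
    assert(in != NULL && out != NULL && out_len != NULL);

    /* Worst case is two output bytes per input byte; +1 avoids malloc(0). */
    unsigned char *buf = malloc(2 * n + 1);
    if (buf == NULL)
        return -1;

    size_t w = 0;
    for (size_t i = 0; i < n; ) {
        unsigned char b = in[i];
        size_t run = 1;
        while (i + run < n && in[i + run] == b && run < 255)
            run++;
        buf[w++] = (unsigned char)run;
        buf[w++] = b;
        i += run;
    }

    *out = buf;
    *out_len = w;
    return 0;
}

/* File output needs no codec support: encode in memory, then write. */
int toy_encode_to_stream(FILE *fp, const unsigned char *in, size_t n)
{
    unsigned char *enc = NULL;
    size_t len = 0;
    if (toy_encode(in, n, &enc, &len) != 0)
        return -1;

    int ok = (fwrite(enc, 1, len, fp) == len);
    free(enc);
    return ok ? 0 : -1;
}
```

With the dependency pointing this way, in-memory use never needs `fmemopen` or a temporary file, and the file variant stays a few lines long.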

@Shreeshrii Shreeshrii changed the title LSTM: Warnings on console when using -oem 4 on command line on windows LSTM: Windows: Warnings on console when using -oem 4 on command line Dec 4, 2016