LSTM: Windows: Warnings on console when using -oem 4 on command line #522
There are a large number of warning messages when using -oem 4, not with other oem modes. Is there a way to eliminate these? C:\shree>tesseract san001.png san001 -oem 4 -l san |
The warnings are indeed from Leptonica. They will not appear in Linux. You should ignore these ugly warnings.
|
The warnings are not limited to png input; tif, gif and jpg files produce them as well. |
Still the same answer... It is not a bug, although from an end user's point of view it sure looks confusing. |
The warnings are an indicator for potential optimizations and require more examination by developers, so I do see an issue to be discussed here. AFAIK Leptonica warns when an image file is going to be mapped to memory, something that is unsupported by Leptonica's Windows code, which uses a temporary file copy as an alternative. If OCR of a single image results in many of those warnings (on the order of the number of text lines), that might be caused by the same image being opened very often, or by Tesseract acting on in-memory line images which Leptonica for Windows has to write to disk. In either case it is clear that, at least on Windows, performance suffers and something should be done, either in Leptonica code or in Tesseract. |
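The fallback described above can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not Leptonica's actual code: `open_buffer_for_reading` is a hypothetical helper name, and `HAVE_FMEMOPEN` is assumed to be a macro that the build system defines when the platform provides `fmemopen`. Without it, the buffer is copied to a temporary file first, which is the extra disk I/O discussed in this thread.

```c
#include <stdio.h>

/* Hypothetical helper (not a Leptonica function): returns a readable
 * FILE* whose contents are the bytes in data[0..size-1], or NULL on
 * failure. The caller closes the stream with fclose(). */
static FILE *open_buffer_for_reading(const void *data, size_t size)
{
#if defined(HAVE_FMEMOPEN)
    /* Assumed build-time macro: the platform has POSIX fmemopen, so the
     * stream reads straight from memory and no disk I/O takes place. */
    return fmemopen((void *)data, size, "rb");
#else
    /* Fallback: copy the buffer into a temporary file and rewind it.
     * Every call pays for one write and one read through the file system. */
    FILE *fp = tmpfile();
    if (fp == NULL)
        return NULL;
    if (fwrite(data, 1, size, fp) != size) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);
    return fp;
#endif
}
```

If a helper like this is called once per text line, the temporary-file branch runs once per line as well, which would match the per-line pattern of warnings reported here.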
No, it is production code. Such warnings also occur(red) with Tesseract 3.05 and earlier, for example when processing JPEG 2000 images (but only once per image). Some improvements in newer Leptonica code reduced the cases where this kind of warning is shown. |
@DanBloomberg
cc:ing Dan Bloomberg for his input regarding Leptonica
|
@stweil |
http://www.leptonica.org/source/README.html
|
It's not a bug, but I agree that there is room for improvement. |
Basically I used |
In production, there should be no need to see any leptonica output (info, warnings, errors) to stderr. Reads and writes from/to file for in-memory I/O on Windows are no longer required for gif and tiff formats. They were never required for webp. They are still required for jpeg, jp2k and png. |
Dan, Thanks for the info.
Since which version? In this specific case, the solution to the optimization problem might be to always save the text line images as tiff in memory. |
Technically, the warnings occur on any platform which does not have a |
Amit, 1.73 required r/w for in-memory I/O for gif and tiff on Windows. The current github master (and soon to be release 1.74.0) does not. The tiff patch was contributed by Stefan. Yes, any 1 bpp images (e.g., in tesseract/viewer and ccstruct) can be output in IFF_TIFF_G4, which typically has better compression than png. Grayscale and color can be output in TIFF_ZIP or TIFF_LZW, which are typically inferior to png in compression. |
Stefan, we can downgrade those messages in leptonica to INFO, and you can use setMsgSeverity() to disable INFO statements. |
That would be possible, yes. But without those nagging messages, I'd never have had a look at that part of Leptonica. |
Nevertheless, this seems like a reasonable thing to do to solve this annoyance and still present other WARNING messages. |
Annoying messages have now been downgraded to INFO at github head. You can use this in tesseract code to suppress all INFO and less urgent messages at run time:
Alternatively, for a bit more flexibility, you can define the environment variable
You can also change the severity level to WARNING and higher at compile time with this compiler flag: |
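The mechanism behind this kind of suppression is a severity threshold: a message is printed only if its level is at or above the current setting. The following C sketch models that idea in isolation. It is an assumption-laden illustration, not Leptonica's implementation; the `SEV_*` levels, `set_threshold` and `log_msg` are hypothetical names that only stand in for what `setMsgSeverity()` controls in the real library.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical severity levels, ordered from least to most urgent. */
enum severity { SEV_INFO = 1, SEV_WARNING = 2, SEV_ERROR = 3, SEV_NONE = 4 };

/* Current run-time threshold; messages below it are dropped. */
static enum severity g_threshold = SEV_INFO;

/* Sets a new threshold and returns the previous one so that a caller
 * can restore it later. */
static enum severity set_threshold(enum severity new_level)
{
    enum severity old_level = g_threshold;
    g_threshold = new_level;
    return old_level;
}

/* Prints the message to out if sev passes the threshold.
 * Returns 1 if the message was printed, 0 if it was suppressed. */
static int log_msg(FILE *out, enum severity sev, const char *fmt, ...)
{
    va_list ap;

    if (sev < g_threshold)
        return 0;
    va_start(ap, fmt);
    vfprintf(out, fmt, ap);
    va_end(ap);
    fputc('\n', out);
    return 1;
}
```

With the threshold raised to `SEV_WARNING`, a call such as `log_msg(stderr, SEV_INFO, "...")` produces no output, while warnings and errors still reach stderr. That is the behaviour asked for in this thread once the messages were downgraded to INFO.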
I wonder how giflib and libtiff do the equivalent of fmemopen on Windows. |
@DanBloomberg : Thanks for the support. Maybe it would be good to enable all warnings for debug builds and hide them for release builds. |
@zdenop That's an interesting idea. Default setting is for INFO, WARNING and ERROR, but for a release that will be used in production it makes sense to only show ERROR. |
Look at the leptonica wrappers for tiff (tiffio.c) and gif (gifio.c) to see how it is done. Even better, the webp library implements compression and decompression directly with memory buffers, not with file streams. This is very nice, because it's platform independent, and you can easily read and write to files using it (see webpio.c). I wish the other image compression libraries had been implemented that way, but ... |
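The design point about memory buffers can be made concrete with a toy codec. The sketch below is an illustration only and has nothing to do with the real webp, tiff or gif code: `rle_encode` and `rle_encode_to_stream` are hypothetical names, and simple run-length encoding stands in for a real compression scheme. The core routine works purely on memory, and the file variant is a thin, platform-independent wrapper around it.

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy run-length encoder working only on memory buffers. Writes
 * (count, value) byte pairs into out and returns the number of bytes
 * written, or 0 if out is too small (or n is 0). */
static size_t rle_encode(const unsigned char *in, size_t n,
                         unsigned char *out, size_t cap)
{
    size_t w = 0, i = 0;

    while (i < n) {
        unsigned char v = in[i];
        size_t run = 1;
        while (i + run < n && in[i + run] == v && run < 255)
            run++;
        if (w + 2 > cap)
            return 0;
        out[w++] = (unsigned char)run;
        out[w++] = v;
        i += run;
    }
    return w;
}

/* File output layered on the memory API: encode into a heap buffer,
 * then write it with one fwrite. Returns 0 on success, -1 on failure. */
static int rle_encode_to_stream(const unsigned char *in, size_t n, FILE *fp)
{
    size_t cap = 2 * n;  /* worst case: every run has length 1 */
    unsigned char *buf = malloc(cap ? cap : 1);
    size_t len;
    int status;

    if (buf == NULL)
        return -1;
    len = rle_encode(in, n, buf, cap);
    status = (fwrite(buf, 1, len, fp) == len) ? 0 : -1;
    free(buf);
    return status;
}
```

Going the other way, from a stream-only API to memory, is what requires `fmemopen` or a temporary file, which is why a buffer-first library avoids the problem discussed in this issue.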
When png files are used as input with oem 4 (LSTM), tesseract prints warnings on the console when run from the command line. oem 0, 1, 2 and 3 give no warnings.
On Windows 10, using 4.0 Alpha binaries provided by @stweil and 4.0 alpha traineddata
The warnings seem to be given per line, so a page with 20 lines of text gets about 40 lines of messages.
I have not tested this in a Linux environment. The warnings are probably from Leptonica.