suppressing warnings and other messages that gets printed on stdout by PyMuPDF #209

cquark7 · 2018-10-07T10:52:12Z

PyMuPDF prints a lot of warnings and error messages on STDOUT while parsing PDF documents (especially while extracting images). I am looking for a way to suppress or redirect the messages that get printed on STDOUT.

Example warnings/messages:

warning: openjpeg warning: No incltree created.

warning: openjpeg warning: No imsbtree created.

warning: openjpeg warning: tgt_create tree->numnodes == 0, no tree created.

These messages are quite annoying and serve no purpose (at least for my use case). I get more than 100 warnings just for a single PDF file.

I tried the methods present here: How do I prevent a C shared library to print on stdout in python? but they are not working with PyMuPDF, so please suggest something.


JorjMcKie · 2018-10-07T14:28:48Z

These messages are issued directly by the underlying C library MuPDF, not by the wrapper code (i.e. me, PyMuPDF).
After some experiments I found that up to v1.13.0 there is no way to tell MuPDF it should use a different output stream (without modifying their source - which I will not do).

Currently, I am working on the new v1.14.0. I will check again whether they are now providing a proper circumvention.

So please check again once I publish PyMuPDF 1.14.0 in the next few weeks.

cquark7 · 2018-10-08T19:53:54Z

@JorjMcKie Thanks for your quick response. I was aware these messages are issued by MuPDF but I thought you may know a workaround for this problem. I really appreciate your efforts.

JorjMcKie · 2018-10-08T22:08:01Z

I am close to a clean compile of PyMuPDF v1.14.0 -- another day or so.
Before publishing it, however, I want to ensure that there will be worthwhile new functionality, i.e. something clearly beyond v1.13.20.

What I already know w/r to this issue:
Nothing has changed: errors and warnings are directly sent to stderr via fprintf.

JorjMcKie · 2018-10-29T21:07:20Z

Did some more research with the new v1.14.0.
There is no way to implement a (working) redirection of output going to stderr / stdout -- apart from modifying the MuPDF code directly.
Their documented way to achieve this simply does not work.
So I am regretfully closing this issue.

turicas · 2018-11-15T02:08:08Z

Is it possible to create an issue for MuPDF developers to suppress this kind of message? It's annoying to have a lot of unwanted lines on stdout.

JorjMcKie · 2018-11-15T07:21:13Z

Well, we could do that. But I'm afraid they have lots of other things to do, so they very probably will treat this as nice-to-have or, worse, as mannerism.

As I indicated above, there certainly is a brute-force alternative: replacing the MuPDF error module (error.c) by my own version. Instead of using the C function fprintf, it would use Python output to sys.stderr, so that the Python developer can then decide what to do with it.
This would however entail maintaining my own error.c in all future ... not nice. MuPDF must be generated with that modified error.c module.

In addition, this approach would not get rid of every single direct writing to system stderr: there are about 100 other places, where MuPDF directly outputs to stderr and does not make use of their own error.c. Not many of these will actually occur in normal processing of PyMuPDF, however.

However:
You are not the first requesting this type of thing. I personally also thoroughly dislike that behaviour. So let me think about it once more ...

JorjMcKie · 2018-11-15T10:52:01Z

I have been experimenting a bit:
@turicas - Can you try one of the wheels here? This repo is used as temporary store for my new wheels. You should find the right PyMuPDF v.1.14.0 if you are using Linux or Mac OSX (look in the repo's branches linux, resp. osx). I am generating Windows wheels locally. If you are using Windows, please tell me, and I will upload your version.

In this version I am redirecting MuPDF warnings and many errors to sys.stderr. If you reassign sys.stderr to e.g. some other file, you should be able to control many of these annoying messages.
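
For illustration, a temporary reassignment of sys.stderr can be done with the standard library's contextlib.redirect_stderr. This is only a minimal sketch, not PyMuPDF code: noisy_call is a hypothetical stand-in that writes a message to Python's sys.stderr, the way these preliminary wheels are described as doing. It does not affect anything a C library writes directly to the system stderr.

```python
import contextlib
import io
import sys


def noisy_call():
    # Hypothetical stand-in for a PyMuPDF call whose warnings
    # arrive on Python's sys.stderr.
    print("warning: ignoring transfer function", file=sys.stderr)
    return "result"


def call_quietly(func):
    """Run func() while sys.stderr is temporarily replaced by a buffer.

    Returns the function result and the text written to sys.stderr.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stderr(buffer):
        result = func()
    return result, buffer.getvalue()


result, captured = call_quietly(noisy_call)
print(result)
print(repr(captured))
```

The captured text can then be logged, filtered, or simply discarded by the caller.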

Please do try and tell me what you think!

JorjMcKie · 2018-11-15T11:01:44Z

@cquark7 - forgot to mention you, sorry.
Please also try these preliminary wheels and tell me your assessment, thanks.

JorjMcKie · 2018-11-16T09:52:37Z

@cquark7 / @turicas

You have talked me into making some changes to MuPDF error / warning message handling ... both of you.
Please forget about the wheels I mentioned yesterday: I have a better solution now. Look at the following IDLE session. I am working with 2 documents in a way that causes MuPDF to spill out several warnings.

I am intercepting MuPDF's stderr and storing these messages at some place, so they won't appear anywhere: neither on the system STDERR, nor on Python's sys.stderr. There is no need to re-assign sys.stderr. But the internal message store can be accessed and emptied by the programmer:

Python 2.7.15 (v2.7.15:ca079a3ea3, Apr 30 2018, 16:30:26) [MSC v.1500 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import fitz
>>> doc = fitz.open("acronis.xps") # some XPS document
>>> fitz.TOOLS.fitz_stderr         # message store is still empty
u''
>>> pdfbytes = doc.convertToPDF()  # convert XPS to PDF
>>> fitz.TOOLS.fitz_stderr         # and look at the message store:
u'warning: freetype getting character advance: invalid glyph index\n'
>>> fitz.TOOLS.fitz_stderr_reset() # empty the message store
>>> fitz.TOOLS.fitz_stderr         # and prove it
u''
>>> doc.close()                    # try another document: SVG this time
>>> doc = fitz.open("acronis.svg")
>>> fitz.TOOLS.fitz_stderr         # still no complaints?
u''
>>> pdfbytes = doc.convertToPDF()  # convert that one too
>>> fitz.TOOLS.fitz_stderr         # and see what would have gone to system STDERR
u'warning: ... repeated 3 times ...\nwarning: push viewport: 0 0 594.75 841.5\nwarning: push viewbox: 0 0 594.75 841.5\nwarning: push viewport: 0 0 594.75 841.5\nwarning: ... repeated 2 times ...\nwarning: push viewport: 0 0 980 71\nwarning: push viewport: 0 0 594.75 841.5\nwarning: ... repeated 2512 times ...\nwarning: push viewport: 0 0 112 33\nwarning: push viewport: 0 0 594.75 841.5\nwarning: ... repeated 2 times ...\nwarning: push viewport: 0 0 181 120\nwarning: push viewport: 0 0 94 54\nwarning: ... repeated 2 times ...\nwarning: push viewport: 0 0 130 88\nwarning: ... repeated 2 times ...\nwarning: push viewport: 0 0 181 115\nwarning: push viewport: 0 0 594.75 841.5\n'
>>>

I think this is the best achievable solution. As I said in previous posts:

I hope I am catching all warning messages and most error messages in this way.
From now on, I need to maintain my own copy of MuPDF's error module error.c. It must replace the original before MuPDF is generated.
The wheels I am generating for all platforms do contain this modification.
If anyone wants to generate his own MuPDF, he must apply this same modification to his copy of MuPDF. Otherwise that pesky output will appear again on system STDERR. Nothing worse than that will happen.
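
A small helper around the message store might look like the following. This is a sketch under assumptions: it relies only on the fitz_stderr attribute and the fitz_stderr_reset() method shown in the session above, and FakeTools is a hypothetical stand-in so the example runs without PyMuPDF installed.

```python
def drain_messages(tools):
    """Return the stored messages as a list of lines, then empty the store.

    `tools` is expected to offer the `fitz_stderr` attribute and the
    `fitz_stderr_reset()` method shown in the IDLE session.
    """
    text = tools.fitz_stderr
    tools.fitz_stderr_reset()
    return text.splitlines()


class FakeTools:
    # Hypothetical stand-in so the sketch runs without PyMuPDF.
    def __init__(self, text):
        self.fitz_stderr = text

    def fitz_stderr_reset(self):
        self.fitz_stderr = ""


tools = FakeTools(
    "warning: push viewport: 0 0 112 33\nwarning: ... repeated 2 times ...\n"
)
for line in drain_messages(tools):
    print(line)
print(repr(tools.fitz_stderr))  # the store is empty again
```

With the v1.14.0 wheels described here, one would presumably pass fitz.TOOLS instead of the stand-in; the names may differ in other versions, so check the documentation of the installed one.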

I will be uploading the wheels within the next hour or so to https://github.com/JorjMcKie/PyMuPDF-wheels.
Please have a look at them and let me know your reaction.

JorjMcKie · 2018-11-16T20:55:59Z

Just uploaded the new v1.14.0 which implements the issue resolution.

turicas · 2018-11-19T03:43:15Z

@JorjMcKie Thank you very much for this update! It's working as expected. :) I've tested with 3 different PDFs (each one outputs different warnings). The test code is:

# testwarning.py
import sys
import fitz

def pdf_to_text(filename):
    doc = fitz.open(filename, filetype="pdf")
    text = []
    for page_number in range(doc.pageCount):
        page = doc.loadPage(page_number)
        page_text = '\n'.join(block[4] for block in page.getTextBlocks())
        text.append(page_text)
    return '\n'.join(text)

pdf_to_text(sys.argv[1])

Output with PyMuPDF==1.13.20:

$ python testwarning.py DOE-AC-2013-05-27.pdf
warning: undefined link destination
warning: ... repeated 15 times ...
warning: freetype could not find any cmaps
$ python testwarning.py DOE-AC-2016-07-13.pdf
warning: ignoring transfer function
$ python testwarning.py balneabilidade-2018-02-16.pdf
error: cannot find startxref
warning: trying to repair broken xref
warning: repairing PDF document

Output with PyMuPDF-1.14.0-cp37-cp37m-manylinux1_x86_64.whl (it's not available on PyPI) - no output expected:

$ python testwarning.py DOE-AC-2013-05-27.pdf
$ python testwarning.py DOE-AC-2016-07-13.pdf
$ python testwarning.py balneabilidade-2018-02-16.pdf
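
A comparison like this could also be automated by running the script in a child process and inspecting what reaches the system stderr. The following is a rough sketch using only the standard library; stderr_of is a hypothetical helper name, and the child command is a stand-in that just writes one line to stderr.

```python
import subprocess
import sys


def stderr_of(args):
    """Run a command and return what it wrote to the system stderr."""
    completed = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return completed.stderr


# Stand-in child process; to check the real script, one could use
# [sys.executable, "testwarning.py", "some.pdf"] instead.
child = [sys.executable, "-c", "import sys; sys.stderr.write('warning: demo\\n')"]
print(repr(stderr_of(child)))
```

Because the pipe is attached at the operating-system level, this also sees messages that a C library writes directly to stderr, which is what the shell sessions above are checking.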

The PDFs are available for download, if you'd like to test:

Could you please upload this new version to PyPI? Thanks again!

JorjMcKie · 2018-11-19T10:18:47Z

Pleased to hear that!
I will upload v1.14.1 later today - also to PyPI. This patch contains minor performance improvements (I am a maniac in this respect) and also support for pathlib filenames.

wave-DmP · 2020-03-12T15:29:08Z

Hi! I'm getting "mupdf: invalid page object" printed to the console when opening PDFs. Are these among the "errors" rather than warnings that have been kept as is? Is it possible to reroute them to fitz_stderr instead?

JorjMcKie · 2020-03-12T16:12:31Z

@wave-DmP:

Are these among the "errors" rather than warnings that have been kept as is?

No, this is an error, not a warning. In broken PDFs this may happen, when a dictionary object does not conform to a page dictionary. You might however still be able to work with the PDF, but consequential other errors may occur. It is usually still possible to extract e.g. images or fonts if looping over the xrefs (and not the pages).

Is it possible to reroute them to fitz_stderr instead?

Yes, there is an option to switch MuPDF error message output off or on: fitz.TOOLS.mupdf_display_errors(None/True/False), where None returns the current state.
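
To switch the display off only for a limited stretch of code, the call could be wrapped in a context manager that restores the previous state afterwards. This is a sketch, not part of PyMuPDF: mupdf_errors_hidden is a hypothetical name, and FakeTools is a stand-in that mimics the call shape described above so the example runs without PyMuPDF installed.

```python
import contextlib


@contextlib.contextmanager
def mupdf_errors_hidden(tools):
    """Switch MuPDF error display off inside a with-block, then restore it."""
    previous = tools.mupdf_display_errors(None)  # None: query current state
    tools.mupdf_display_errors(False)
    try:
        yield
    finally:
        tools.mupdf_display_errors(previous)


class FakeTools:
    # Hypothetical stand-in mimicking mupdf_display_errors(None/True/False).
    def __init__(self):
        self._on = True

    def mupdf_display_errors(self, on=None):
        if on is not None:
            self._on = bool(on)
        return self._on


tools = FakeTools()
with mupdf_errors_hidden(tools):
    print(tools.mupdf_display_errors(None))  # state inside the block
print(tools.mupdf_display_errors(None))      # state after the block
```

With a real installation one would presumably pass fitz.TOOLS. The try/finally ensures the earlier setting comes back even if the code inside the block raises.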

wave-DmP · 2020-03-12T16:24:03Z

import fitz
fitz.TOOLS.mupdf_display_errors(False)

gives me
AttributeError: 'Tools' object has no attribute 'mupdf_display_errors' at line 2

currently on PyMuPDF v 1.16.11

JorjMcKie · 2020-03-12T17:09:09Z

Weird:

ipython
Python 3.7.3 (default, Mar 27 2019, 22:11:17)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.8.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import fitz

In [2]: fitz.TOOLS.mupdf_display_errors()
Out[2]: True

In [3]: print(fitz.__doc__)

PyMuPDF 1.16.8: Python bindings for the MuPDF 1.16.0 library.
Version date: 2019-11-18 18:12:19.
Built for Python 3.7 on linux (64-bit).

or

>>> import fitz
>>> print(fitz.__doc__)

PyMuPDF 1.16.11: Python bindings for the MuPDF 1.16.0 library.
Version date: 2020-02-21 15:40:27.
Built for Python 3.8 on win32 (64-bit).

>>> fitz.TOOLS.mupdf_display_errors()
True
>>>

wave-DmP · 2020-03-12T17:12:17Z

This is what I get using PyCharm with Python 3.6.7.

JorjMcKie · 2020-03-12T17:14:50Z

The method was new in v1.16.8
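
Code that has to run against versions both with and without the method could test for its presence instead of failing with an AttributeError. A minimal sketch, where set_error_display is a hypothetical helper and OldTools a stand-in for a version lacking the method:

```python
def set_error_display(tools, on):
    """Set MuPDF error display if the installed version supports it.

    Returns True when the switch was applied, False when the
    method is missing.
    """
    method = getattr(tools, "mupdf_display_errors", None)
    if method is None:
        return False
    method(on)
    return True


class OldTools:
    # Hypothetical stand-in for a version without the method.
    pass


print(set_error_display(OldTools(), False))  # prints False
```

With PyMuPDF installed, the first argument would presumably be fitz.TOOLS.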

wave-DmP · 2020-03-13T13:25:03Z

solved, pardon my versioning ignorance :)

JorjMcKie · 2020-03-13T17:50:56Z

Nothing to forgive - everything fine.

cquark7 changed the title ~~I cannot suppress warnings and other messages that gets printed on stdout by PyMuPDF~~ suppressing warnings and other messages that gets printed on stdout by PyMuPDF Oct 7, 2018

JorjMcKie added enhancement wontfix no intention to resolve labels Oct 29, 2018

JorjMcKie closed this as completed Oct 29, 2018

JorjMcKie reopened this Nov 16, 2018

JorjMcKie removed the wontfix no intention to resolve label Nov 16, 2018

JorjMcKie closed this as completed Nov 16, 2018

JorjMcKie mentioned this issue Aug 12, 2019

use warning and error callback instead of fprintf patch #315

Closed

PackElend mentioned this issue Dec 25, 2020

Is there any plan to remove dependency of PyPDF2? camelot-dev/camelot#215

Open
