suppressing warnings and other messages that gets printed on stdout by PyMuPDF #209

Closed
cquark7 opened this issue Oct 7, 2018 · 20 comments

@cquark7
cquark7 commented Oct 7, 2018

PyMuPDF prints a lot of warnings and error messages on STDOUT while parsing PDF documents (especially while extracting images). I am looking for a way to suppress or redirect the messages that get printed on STDOUT.

Example warnings/messages:

warning: openjpeg warning: No incltree created.

warning: openjpeg warning: No imsbtree created.

warning: openjpeg warning: tgt_create tree->numnodes == 0, no tree created.

These messages are quite annoying and serve no purpose (at least for my use case). I get more than 100 warnings just for a single PDF file.

I tried the methods presented in "How do I prevent a C shared library to print on stdout in python?", but they do not work with PyMuPDF, so please suggest something.

@cquark7 cquark7 changed the title from "I cannot suppress warnings and other messages that gets printed on stdout by PyMuPDF" to "suppressing warnings and other messages that gets printed on stdout by PyMuPDF" Oct 7, 2018
@JorjMcKie
Collaborator

These messages are issued directly by the underlying C library MuPDF, not by the wrapper code (i.e. me, PyMuPDF).
After some experiments I found that up to v1.13.0 there is no way to tell MuPDF it should use a different output stream (without modifying their source - which I will not do).

Currently, I am working on the new v1.14.0. I will check again whether they are now providing a proper circumvention.

So please check again once I publish PyMuPDF 1.14.0 in the next few weeks.

@cquark7
Author

cquark7 commented Oct 8, 2018

@JorjMcKie Thanks for your quick response. I was aware these messages are issued by MuPDF but I thought you may know a workaround for this problem. I really appreciate your efforts.

@JorjMcKie
Collaborator

I am close to a clean compile of PyMuPDF v1.14.0 -- another day or so.
Before publishing it, however, I want to ensure that there will be worthwhile new functionality, i.e. something clearly beyond v1.13.20.

What I already know with respect to this issue:
Nothing has changed: errors and warnings are directly sent to stderr via fprintf.
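
Because these messages go to stderr rather than stdout, redirecting stdout alone cannot silence them. One generic technique for output that a C library writes with fprintf is to temporarily repoint file descriptor 2 at the operating-system level. The following is only a sketch of that technique: it is not specific to PyMuPDF and has not been verified against it, and the helper name `capture_fd_stderr` is made up for illustration.

```python
import os
import sys
import tempfile
from contextlib import contextmanager


@contextmanager
def capture_fd_stderr():
    """Temporarily point file descriptor 2 (stderr) at a temp file.

    Yields a dict whose "text" entry holds the captured output
    once the with-block has finished.
    """
    captured = {"text": ""}
    sys.stderr.flush()                  # push out pending Python-level output
    saved_fd = os.dup(2)                # keep a handle on the real stderr
    with tempfile.TemporaryFile() as tmp:
        os.dup2(tmp.fileno(), 2)        # fd 2 now writes into the temp file
        try:
            yield captured
        finally:
            sys.stderr.flush()
            os.dup2(saved_fd, 2)        # restore the real stderr
            os.close(saved_fd)
            tmp.seek(0)
            captured["text"] = tmp.read().decode("utf-8", errors="replace")


# Demonstration: os.write(2, ...) stands in for a C library writing to stderr.
with capture_fd_stderr() as captured:
    os.write(2, b"warning: demo message\n")

print(repr(captured["text"]))
```

Whether this helps with a given PyMuPDF build depends on the platform and on how the C runtime is linked, so it should be treated as something to try, not as a confirmed fix.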

@JorjMcKie JorjMcKie added the enhancement and wontfix (no intention to resolve) labels Oct 29, 2018
@JorjMcKie
Collaborator

Did some more research with the new v1.14.0.
There is no way to implement a (working) redirection of output going to stderr / stdout -- apart from modifying the MuPDF code directly.
Their documented way to achieve this simply does not work.
So I am regretfully closing this issue.

@turicas
turicas commented Nov 15, 2018

Is it possible to create an issue for MuPDF developers to suppress this kind of message? It's annoying to have a lot of unwanted lines on stdout.

@JorjMcKie
Collaborator

Well, we could do that. But I'm afraid they have lots of other things to do, so they very probably will treat this as nice-to-have or, worse, as mannerism.

As I indicated above, there certainly is a brute-force alternative: replacing the MuPDF error module (error.c) with my own version. Instead of using the C function fprintf, it would write through Python to sys.stderr, so that the Python developer can then decide what to do with the output.
This would however entail maintaining my own error.c in all future ... not nice. MuPDF must be generated with that modified error.c module.

In addition, this approach would not get rid of every direct write to the system stderr: there are about 100 other places where MuPDF writes directly to stderr without going through its own error.c. Not many of these will actually occur in normal processing by PyMuPDF, however.

However:
You are not the first requesting this type of thing. I personally also thoroughly dislike that behaviour. So let me think about it once more ...

@JorjMcKie
Collaborator

I have been experimenting a bit:
@turicas - Can you try one of the wheels here? This repo is used as a temporary store for my new wheels. You should find the right PyMuPDF v1.14.0 wheel if you are using Linux or Mac OSX (look in the repo's linux or osx branch, respectively). I am generating Windows wheels locally. If you are using Windows, please tell me, and I will upload your version.

In this version I am redirecting MuPDF warnings and many errors to sys.stderr. If you reassign sys.stderr to e.g. some other file, you should be able to control many of these annoying messages.

Please do try and tell me what you think!
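
With these preliminary wheels, where the messages are said to arrive on sys.stderr, a Python-level redirect should be enough. A minimal sketch, assuming Python 3.5 or later (for `contextlib.redirect_stderr`); `run_quietly` and `noisy_call` are illustrative names, and `noisy_call` merely stands in for a PyMuPDF call that emits a warning:

```python
import io
import sys
from contextlib import redirect_stderr


def run_quietly(func, *args, **kwargs):
    """Call func while sys.stderr is redirected; return (result, messages)."""
    buffer = io.StringIO()
    with redirect_stderr(buffer):       # sys.stderr is swapped only inside this block
        result = func(*args, **kwargs)
    return result, buffer.getvalue()


def noisy_call():
    """Stand-in for a library call that writes a warning to sys.stderr."""
    print("warning: demo message", file=sys.stderr)
    return 42


value, messages = run_quietly(noisy_call)
print(value, repr(messages))
```

This only catches output that passes through Python's sys.stderr object; anything a C library writes straight to the system stderr is unaffected.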

@JorjMcKie
Collaborator

@cquark7 - forgot to mention you, sorry.
Please also try these preliminary wheels and tell me your assessment, thanks.

@JorjMcKie JorjMcKie reopened this Nov 16, 2018
@JorjMcKie JorjMcKie removed the wontfix (no intention to resolve) label Nov 16, 2018
@JorjMcKie
Collaborator

@cquark7 / @turicas

You have talked me into making some changes to MuPDF error / warning message handling ... both of you.
Please forget about the wheels I mentioned yesterday: I have a better solution now. Look at the following IDLE session. I am working with 2 documents in a way that causes MuPDF to spill out several warnings.

I am intercepting MuPDF's stderr output and storing these messages internally, so they won't appear anywhere: neither on the system STDERR nor on Python's sys.stderr. There is no need to re-assign sys.stderr. But the internal message store can be accessed and emptied by the programmer:

Python 2.7.15 (v2.7.15:ca079a3ea3, Apr 30 2018, 16:30:26) [MSC v.1500 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import fitz
>>> doc = fitz.open("acronis.xps") # some XPS document
>>> fitz.TOOLS.fitz_stderr         # message store is still empty
u''
>>> pdfbytes = doc.convertToPDF()  # convert XPS to PDF
>>> fitz.TOOLS.fitz_stderr         # and look at the message store:
u'warning: freetype getting character advance: invalid glyph index\n'
>>> fitz.TOOLS.fitz_stderr_reset() # empty the message store
>>> fitz.TOOLS.fitz_stderr         # and prove it
u''
>>> doc.close()                    # try another document: SVG this time
>>> doc = fitz.open("acronis.svg")
>>> fitz.TOOLS.fitz_stderr         # still no complaints?
u''
>>> pdfbytes = doc.convertToPDF()  # convert that one too
>>> fitz.TOOLS.fitz_stderr         # and see what would have gone to system STDERR
u'warning: ... repeated 3 times ...\nwarning: push viewport: 0 0 594.75 841.5\nwarning: push viewbox: 0 0 594.75 841.5\nwarning: push viewport: 0 0 594.75 841.5\nwarning: ... repeated 2 times ...\nwarning: push viewport: 0 0 980 71\nwarning: push viewport: 0 0 594.75 841.5\nwarning: ... repeated 2512 times ...\nwarning: push viewport: 0 0 112 33\nwarning: push viewport: 0 0 594.75 841.5\nwarning: ... repeated 2 times ...\nwarning: push viewport: 0 0 181 120\nwarning: push viewport: 0 0 94 54\nwarning: ... repeated 2 times ...\nwarning: push viewport: 0 0 130 88\nwarning: ... repeated 2 times ...\nwarning: push viewport: 0 0 181 115\nwarning: push viewport: 0 0 594.75 841.5\n'
>>> 
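
The read-then-reset pattern in this session can be wrapped in a small helper. The sketch below relies only on the two names shown above (`fitz_stderr` and `fitz_stderr_reset()`); `drain_messages` and the `FakeTools` stand-in are hypothetical and exist so that the example runs without PyMuPDF. With PyMuPDF 1.14.0 installed, one would presumably pass `fitz.TOOLS` instead.

```python
def drain_messages(tools):
    """Return the stored messages as a list of lines and empty the store."""
    text = tools.fitz_stderr          # read everything collected so far
    tools.fitz_stderr_reset()         # then clear the store
    return text.splitlines()


class FakeTools:
    """Hypothetical stand-in that mimics the two members used above."""

    def __init__(self, text):
        self.fitz_stderr = text

    def fitz_stderr_reset(self):
        self.fitz_stderr = ""


tools = FakeTools(
    "warning: push viewport: 0 0 112 33\nwarning: ... repeated 2 times ...\n"
)
messages = drain_messages(tools)
print(messages)
```

Draining after each document keeps the messages of one file from being mixed up with those of the next.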

I think this is the best achievable solution. As I said in previous posts:

  • I hope I am catching all warning messages and most error messages in this way.
  • From now on, I need to maintain my own copy of MuPDF's error module error.c. It must replace the original before MuPDF is generated.
  • The wheels I am generating for all platforms do contain this modification.
  • If anyone wants to generate his own MuPDF, he must apply this same modification to his copy of MuPDF. Otherwise that pesky output will appear again on system STDERR. Nothing worse than that will happen.

I will be generating the wheels within the next hour or so and uploading them to https://github.com/JorjMcKie/PyMuPDF-wheels.
Please have a look at them and let me know your reaction.

@JorjMcKie
Collaborator

Just uploaded the new v1.14.0 which implements the issue resolution.

@turicas
turicas commented Nov 19, 2018

@JorjMcKie Thank you very much for this update! It's working as expected. :) I've tested with 3 different PDFs (each one outputs different warnings). The test code is:

# testwarning.py
import sys
import fitz

def pdf_to_text(filename):
    doc = fitz.open(filename, filetype="pdf")
    text = []
    for page_number in range(doc.pageCount):
        page = doc.loadPage(page_number)
        page_text = '\n'.join(block[4] for block in page.getTextBlocks())
        text.append(page_text)
    return '\n'.join(text)

pdf_to_text(sys.argv[1])

Output with PyMuPDF==1.13.20:

$ python testwarning.py DOE-AC-2013-05-27.pdf
warning: undefined link destination
warning: ... repeated 15 times ...
warning: freetype could not find any cmaps
$ python testwarning.py DOE-AC-2016-07-13.pdf
warning: ignoring transfer function
$ python testwarning.py balneabilidade-2018-02-16.pdf
error: cannot find startxref
warning: trying to repair broken xref
warning: repairing PDF document

Output with PyMuPDF-1.14.0-cp37-cp37m-manylinux1_x86_64.whl (it's not available on PyPI) - no output expected:

$ python testwarning.py DOE-AC-2013-05-27.pdf
$ python testwarning.py DOE-AC-2016-07-13.pdf
$ python testwarning.py balneabilidade-2018-02-16.pdf

The PDFs are available for download, if you'd like to test:

Could you please upload this new version to PyPI? Thanks again!

@JorjMcKie
Collaborator

Pleased to hear that!
I will upload v1.14.1 later today - also to PyPI. This patch contains minor performance improvements (I am a maniac in this respect) and also support for pathlib filenames.

@wave-DmP
wave-DmP commented Mar 12, 2020

Hi! I'm getting "mupdf: invalid page object" printed to the console when opening PDFs. Are these among the "errors" rather than warnings that have been kept as is? Is it possible to reroute them to fitz_stderr instead?

@JorjMcKie
Collaborator

@wave-DmP:

> Are these among the "errors" rather than warnings that have been kept as is?

No, this is an error, not a warning. In broken PDFs this may happen, when a dictionary object does not conform to a page dictionary. You might however still be able to work with the PDF, but consequential other errors may occur. It is usually still possible to extract e.g. images or fonts if looping over the xrefs (and not the pages).

> Is it possible to reroute them to fitz_stderr instead?

Yes, there is an option to switch MuPDF error message output off or on via fitz.TOOLS.mupdf_display_errors(None/True/False), where None reports the current state.
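
One way to use this switch is to turn the messages off only around a noisy operation and restore the earlier setting afterwards. The sketch below assumes that calling the method without an argument returns the current state; `mupdf_errors_hidden` and the `StubTools` stand-in are hypothetical names so the example runs without PyMuPDF. With a PyMuPDF version that has the method, one would presumably pass `fitz.TOOLS`.

```python
from contextlib import contextmanager


@contextmanager
def mupdf_errors_hidden(tools):
    """Switch MuPDF error display off inside the with-block, then restore it."""
    if not hasattr(tools, "mupdf_display_errors"):
        yield False                               # method missing: nothing was changed
        return
    previous = tools.mupdf_display_errors()       # no argument: query the state
    tools.mupdf_display_errors(False)
    try:
        yield True
    finally:
        tools.mupdf_display_errors(previous)      # put the earlier state back


class StubTools:
    """Hypothetical stand-in for fitz.TOOLS so the sketch runs without PyMuPDF."""

    def __init__(self):
        self._show = True

    def mupdf_display_errors(self, on=None):
        if on is not None:
            self._show = bool(on)
        return self._show


tools = StubTools()
with mupdf_errors_hidden(tools) as changed:
    inside = tools.mupdf_display_errors()
after = tools.mupdf_display_errors()
print(changed, inside, after)
```

The `hasattr` check lets the same code run on a PyMuPDF version that does not yet provide the method, in which case the messages are simply left as they are.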

@wave-DmP

import fitz
fitz.TOOLS.mupdf_display_errors(False)

gives me
AttributeError: 'Tools' object has no attribute 'mupdf_display_errors' at line 2

currently on PyMuPDF v 1.16.11

@JorjMcKie
Collaborator

Weird:

ipython
Python 3.7.3 (default, Mar 27 2019, 22:11:17)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.8.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import fitz

In [2]: fitz.TOOLS.mupdf_display_errors()
Out[2]: True

In [3]: print(fitz.__doc__)

PyMuPDF 1.16.8: Python bindings for the MuPDF 1.16.0 library.
Version date: 2019-11-18 18:12:19.
Built for Python 3.7 on linux (64-bit).

or

>>> import fitz
>>> print(fitz.__doc__)

PyMuPDF 1.16.11: Python bindings for the MuPDF 1.16.0 library.
Version date: 2020-02-21 15:40:27.
Built for Python 3.8 on win32 (64-bit).

>>> fitz.TOOLS.mupdf_display_errors()
True
>>> 

@wave-DmP
wave-DmP commented Mar 12, 2020

This is what I get using PyCharm with Python 3.6.7:

[two screenshots]

@JorjMcKie
Collaborator

The method was new in v1.16.8

@wave-DmP

solved, pardon my versioning ignorance :)

@JorjMcKie
Collaborator

Nothing to forgive - everything fine.
