Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyMuPDF logging prints to stdout #700

Closed
deeplow opened this issue Feb 5, 2024 · 9 comments · Fixed by #742
Closed

PyMuPDF logging prints to stdout #700

deeplow opened this issue Feb 5, 2024 · 9 comments · Fixed by #742
Milestone

Comments

@deeplow
Copy link
Contributor

deeplow commented Feb 5, 2024

While converting a document on the page streaming PR I have just noticed the following:

json

Here's the line in question. It turns out that PyMuPDF's logging simply prints the error. Fortunately I haven't seen this in the pixels_to_PDF, but if it does happen it'll interfere with out page streaming since it assumes all stdout is pixel data.

@deeplow
Copy link
Contributor Author

deeplow commented Feb 5, 2024

This is probably a release blocker since it will interfere with the GUI as well:

visually

@deeplow deeplow added this to the 0.6.0 milestone Feb 5, 2024
@apyrgio
Copy link
Contributor

apyrgio commented Feb 6, 2024

There's something that does not make sense to me: how come you're hitting this issue, given that the offending function is in the fitz_new module, and not in the regular fitz module that we copy into the container?

COPY --from=pymupdf-build /usr/lib/python3.11/site-packages/fitz/ /usr/lib/python3.11/site-packages/fitz

Can you give a bit more info about the environment you are seeing this at?

@apyrgio
Copy link
Contributor

apyrgio commented Feb 6, 2024

Also, I've opened an issue in the PyMuPDF repo: pymupdf/PyMuPDF#3135

@deeplow
Copy link
Contributor Author

deeplow commented Feb 6, 2024

There's something that does not make sense to me: how come you're hitting this issue, given that the offending function is in the fitz_new module, and not in the regular fitz module that we copy into the container?

I was manually looking for the __init__.py so I probably find the wrong one. But now looking at the src_classic, __init__.py is much smaller. So I don't think it's that init they're talking about.

I am running this on Fedora 38 on Qubes. I'll try another environment to see if there are any differences.

@deeplow
Copy link
Contributor Author

deeplow commented Feb 6, 2024

And thanks for filing the upstream issue!

@apyrgio
Copy link
Contributor

apyrgio commented Feb 6, 2024

Yeap, turns out that in 1.23.9, the fitz_new module became fitz, and fitz became fitz_old: (changelog, rationale). Our use case is limited, and since our CI tests pass, it's probably best to follow suit.

@deeplow
Copy link
Contributor Author

deeplow commented Feb 7, 2024

Curious that this happened, especially in a point release. Here they have explained that:

The best way to protect against this happening is to try out the rebased implementation in stage 1 by using import fitz_new as fitz, and report any problems so they can be fixed.

But if there isn't even a major version change how are users supposed know to check that there is a migration in progress? When users notice things is when things break, which is what happened here...

Per discussion with @apyrgio we'll be trying to stick with fitz_old at least for this release due to this very issue.

@deeplow
Copy link
Contributor Author

deeplow commented Feb 8, 2024

The following may be a working solution:

diff --git a/Dockerfile b/Dockerfile
index 4b794b2d..b7e53091 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -61,7 +61,8 @@ RUN apk --no-cache -U upgrade && \
     py3-magic \
     font-noto-cjk
 
-COPY --from=pymupdf-build /usr/lib/python3.11/site-packages/fitz/ /usr/lib/python3.11/site-packages/fitz
+COPY --from=pymupdf-build /usr/lib/python3.11/site-packages/fitz /usr/lib/python3.11/site-packages/fitz
+COPY --from=pymupdf-build /usr/lib/python3.11/site-packages/fitz_old /usr/lib/python3.11/site-packages/fitz_old
 COPY --from=tessdata-dl /usr/share/tessdata/ /usr/share/tessdata
 COPY --from=h2orestart-dl /libreoffice_ext/ /libreoffice_ext
 
diff --git a/dangerzone/conversion/doc_to_pixels.py b/dangerzone/conversion/doc_to_pixels.py
index 5f1abf3b..2bce0cf2 100644
--- a/dangerzone/conversion/doc_to_pixels.py
+++ b/dangerzone/conversion/doc_to_pixels.py
@@ -7,7 +7,10 @@ import shutil
 import sys
 from typing import Dict, List, Optional, TextIO
 
-import fitz
+try:
+    import fitz_old as fitz
+except:
+    import fitz
 import magic
 
 from . import errors
diff --git a/dangerzone/conversion/pixels_to_pdf.py b/dangerzone/conversion/pixels_to_pdf.py
index 0243858d..e134680f 100644
--- a/dangerzone/conversion/pixels_to_pdf.py
+++ b/dangerzone/conversion/pixels_to_pdf.py
@@ -25,7 +25,10 @@ class PixelsToPDF(DangerzoneConverter):
             tempdir = "/safezone"
 
         # XXX lazy loading of fitz module to avoid import issues on non-Qubes systems
-        import fitz
+        try:
+            import fitz_old as fitz
+        except:
+            import fitz
 
         num_pages = len(glob.glob(f"{tempdir}/pixels/page-*.rgb"))
         total_size = 0.0

Note: do need to keep both fitz_old and fitz in the container image because if we rename the fitz_old module as fitz in the container image, this line fails since fitz_old is underfined. Even with a symlink fitz_old -> fitz doesn't help here.

@apyrgio what do you think? This solution works both in containers as well as in Qubes and it only adds 22MB to the container image (when compressed).

@apyrgio apyrgio changed the title PyMuPDF Logging Prints to stdout PyMuPDF logging prints to stdout Feb 8, 2024
@apyrgio
Copy link
Contributor

apyrgio commented Feb 12, 2024

There was another suggestion, of pinning PyMuPDF to 1.23.8 (now currently at 1.23.21), at least for this release. I looked into the PyMuPDF source, and I don't see any significant code changes in the src_classic/ directory of PyMuPDF since 1.23.8, so we should be good:

deeplow added a commit that referenced this issue Feb 12, 2024
PyMuPDF 1.23.9 made the swapped the new fitz implementation (fitz_new)
with the fitz module. In the new module there are prints in the code
that interfere with our stderror for sending JSON from the container.
Pinning the version seems to have no adverse consequences, since
fitz_old hasn't had significant changes and it gives breething room for
the print-related issue to be tackled in PR [2].

Fixes temporarily #700

[1]: #700 (comment)
[2]: pymupdf/PyMuPDF#3137
deeplow added a commit that referenced this issue Feb 12, 2024
PyMuPDF 1.23.9 made the swapped the new fitz implementation (fitz_new)
with the fitz module. In the new module there are prints in the code
that interfere with our stderror for sending JSON from the container.
Pinning the version seems to have no adverse consequences [1], since
fitz_old hasn't had significant changes and it gives breething room for
the print-related issue to be tackled in PR [2].

Fixes temporarily #700

[1]: #700 (comment)
[2]: pymupdf/PyMuPDF#3137
deeplow added a commit that referenced this issue Feb 12, 2024
PyMuPDF 1.23.9 swapped the new fitz implementation (fitz_new)
with the fitz module. In the new module there are prints in the code
that interfere with our stdout for sending JSON from the container.
Pinning the version seems to have no adverse consequences [1], since
fitz_old hasn't had significant changes and it gives breathing room for
the print-related issue to be tackled in PR [2].

Fixes temporarily #700

[1]: #700 (comment)
[2]: pymupdf/PyMuPDF#3137
@deeplow deeplow modified the milestones: 0.6.0, Bookmarks Feb 12, 2024
apyrgio added a commit that referenced this issue Mar 5, 2024
PyMuPDF has some hardcoded log messages that print to stdout [1]. We don't
have a way to silence them, because they don't use the Python logging
infrastructure.

What we can do here is silence a particular call that's been creating
debug messages. For a long term solution, we have sent a PR to the
PyMuPDF team, and we will follow up there [2].

Fixes #700

[1]: #700
[2]: pymupdf/PyMuPDF#3137
apyrgio added a commit that referenced this issue Mar 5, 2024
Unpin the PyMuPDF dependency, now that we have a way to silence its
debug logs that have been added in its new `fitz` implementation.

Refs #700
apyrgio added a commit that referenced this issue Mar 13, 2024
Unpin the PyMuPDF dependency, now that we have a way to silence its
debug logs that have been added in its new `fitz` implementation.

Refs #700
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants