Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyPDF2 should not overwrite warnings.formatwarning. #67

Closed
podloucky-init opened this issue Feb 6, 2014 · 22 comments
Closed

PyPDF2 should not overwrite warnings.formatwarning. #67

podloucky-init opened this issue Feb 6, 2014 · 22 comments
Labels
is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF

Comments

@podloucky-init
Copy link

Hello,

PyPDF2 1.2.0 overwrites warnings.formatwarning with its own implementation (utils._formatwarning) in pdf.py line 74:

warnings.formatwarning = utils._formatwarning

Unfortunately this may cause severe side-effects if PyPDF2 is imported in a larger application. In our case the PyPDF2 implementation of formatwarning caused IndexErrors whenever a warning was raised somewhere else (and the filename argument was not to the formatter's liking).

Personally, I do not think that it is a good idea for a library to interfere with the global logging/warning infrastructure.

P.S.: Apart from this problem, we have been using PyPDF2 successfully for some time now. Nice piece of software!

@mstamy2
Copy link
Collaborator

mstamy2 commented Feb 6, 2014

Hello,

I'm assuming this issue is the same as #39. This does seem like a major flaw, so we'll make rewriting the warning system a priority. I'm not too familiar with how warning systems are implemented; should PyPDF2 just use the default system?

@podloucky-init
Copy link
Author

Hi,
Thanks for the prompt reply!
Yes, this is the same problem. I am not too familiar with the warnings module but I think PyPDF2 should use the default warning system and formatter if possible. Some people might want to provide their own warning handling in their app so you probably cannot rely on having your implementation available.
The warnings.catch_warnings context manager should make it possible to override the global warnings.showwarning function in the context of the PyPDF2 module.

@CrimsonZen
Copy link

I ran into this issue the other day. In case you were entertaining the idea of a stopgap hotfix, you might be able to avoid logging errors by changing this line:
PyPDF2/utils.py:38

file = filename.replace("/","\\").rsplit("\\",1)[1] # find the file name

to

file = filename.replace("/","\\").rsplit("\\",1).pop() # find the file name

Obviously there are less hasty ways of dealing with the problem that are preferable, but this is the hotfix we have used on our deployments. "Nasty side-effects" is right - completely unrelated code that normally sends a warning can end up breaking warnings itself, obscuring the original warning and changing a silent failure to a loud one. On our Django application, we started getting 500's from Apache instead of a nice log and email from Django.

@mstamy2
Copy link
Collaborator

mstamy2 commented Feb 10, 2014

When pyPdf was created, it wasn't used within large applications - it had fewer features and less PDF compatibility. So, the custom warning overwrite was probably not a big deal. Now that PyPDF2's being used as a part of large applications, I can agree that the custom warning implementation is a problem. I'll work on reverting to the default system, though I need to become familiar with Python's warning/logging system.

@podloucky-init
Copy link
Author

@CrimsonZen I can feel the pain. We had exactly the same problem, but somewhat nastier: Django tried to trigger DeprecationWarning while processing a response. This gave a nice HTTP status code of 200 with a 0-byte response.
We circumvented the problem by disabling PyPDF2's overwriting of the warnings.* functions.

@mstamy2 We use PyPDF2 in a very limited scope but it was (and probably still is) the most efficient tool for the job. I do not even consider this bug/misbehavior to be a big deal as it is easy to spot and can be circumvented.

@mstamy2
Copy link
Collaborator

mstamy2 commented Feb 11, 2014

@podloucky-init Thanks for the feedback. On my end, I deleted the overwrites for formatwarning and showwarning. I am not sure how useful the custom implementations even were. So, would it be best to remove the overwrites for all users?

@mstamy2
Copy link
Collaborator

mstamy2 commented Feb 18, 2014

The custom warning implementations have been removed. The potential issues (3 separate cases reported here) aren't worth the extra information they provide.

It would be possible to re-implement the features in a way that doesn't overwrite any methods of warnings (a subclass of warnings for PyPDF2 to use?)

@podloucky-init
Copy link
Author

Sorry for the late reply: You could provide a formatter that formats warnings from PyPDF2. Users can then choose whether they want to use your formatter.
It might make sense to use a subclass of warnings.Warning for PyPDF2 warnings and let your formatter only handle those.

EDIT: sorry, clicked on close by mistake.

@mstamy2
Copy link
Collaborator

mstamy2 commented Feb 26, 2014

I re-implemented warnings with flag overwriteWarnings. It defaults to True, and utilizes the custom warning formatter as before. When False, it avoids overwriting warnings.showwarning.

The warnings print out code segments when overwriteWarnings is False; we should probably fix that in the future (it looks a bit confusing).

Anyway, I hope this resolves the issue successfully. I tried to create a subclass, but I'm not familiar with making subclasses out of modules (is that even possible?). Using a flag should be just as effective, as long as users are aware of it.

@CrimsonZen
Copy link

Strong Opinions Time: No library should ever assign to another external library. This is a huge violation of abstraction — let the warnings module handle warnings, and let PyPDF2 handle PDF's.

The warnings print out code segments when overwriteWarnings is False; we should probably fix that in the future (it looks a bit confusing).

It may be confusing to some, but it's the normal behavior. Just leave it out.

@mstamy2
Copy link
Collaborator

mstamy2 commented Feb 27, 2014

@CrimsonZen I think that the author of pyPdf felt justified in overwriting the warnings module because warnings.py seemingly advocates such an action (I don't have the module available right now, but it does mention something similar).

You're right, though, and I agree about it being a strong violation of abstraction/OOP in general. Perhaps warnings.py advocates some type of extension, not a direct overwrite, and pyPdf's author misinterpreted.

I'll go ahead and permanently remove the overwrites then. I was hesitant to do so at first because PyPDF2 aims to be a direct successor to pyPdf (adding/correcting features rather than removing), but it's probably for the best.

I'm also assuming that whatever features provided by the custom implementations of showwarning and formatwarning weren't overly useful. I think the intent was to

  1. Provide alternative logging through warndest parameter
  2. Add a little extra info to warnings.
    Maybe we can can maintain these features through other means.

Anyway, thanks for your input - I'll re-remove the warnings overwrite tomorrow (I'm currently on a mobile device).

@mstamy2 mstamy2 removed the I/O label Jun 5, 2014
@pavelbrych
Copy link

Hello,
we just run into the same problems as described here with overwritten showwarning method in larger project (interferes with SqlAlchemy). Constructor parameter for PdfFileReader() to turn custom implementation off does the trick, but unfortunately PdfFileReader() is instantiated also inside PdfFileMerger.merge() method and there is no option to turn custom showwarning method here. Would it be possible to add such option, if removing this code entirely is - for some reason - not preferred?

Thank you for great software, apart from this works nice for us.

@andrecp
Copy link
Contributor

andrecp commented Jan 13, 2016

Hi @pavelbrych I've fixed it here (I hope): opendesk@47b3e6d

I've done a PR to the library.

Cheers

@Kiza
Copy link

Kiza commented Jul 12, 2019

Fyi, it interferes with pandas as well.

@CartierPierre
Copy link

CartierPierre commented Jul 18, 2019

Please fix that ...

file = filename.replace("/", "\\").rsplit("\\", 1)[1] # find the file name
IndexError: list index out of range

Same as @Kiza it crashes when Pandas wants to show warning

@mstamy2 I saw your overwriteWarnings but it's not accessible from a highter lib which use PyPDF2.
Why not change locally the warning.showwarning ?

old_warn = warnings.showwarning
warnings.showwarning = _showwarning
warnings.warn(...)
warnings.showwarning = old_warn

@gaspa93
Copy link

gaspa93 commented Oct 15, 2019

Same as @Kiza and @CartierPierre: it breaks with Pandas warnings

@Biganon
Copy link

Biganon commented Mar 27, 2020

Still not fixed, I was becoming crazy trying to understand how numpy could cause PyPDF2 errors...

@sjvdm
Copy link

sjvdm commented Apr 7, 2020

I use PdfFileMerger in a multithreaded application and this has major effects(turns warnings into exceptions). I strongly suggest implementing overwriteWarnings for PdfFileMerger as per https://github.com/mstamy2/PyPDF2/pull/243/files

NexediGitlab pushed a commit to SlapOS/slapos that referenced this issue May 7, 2020
PyPDF2 must not overwrite showwarning because it breaks ERP5. ERP5 already overwrites it properly

py-pdf/pypdf#67 (comment)
NexediGitlab pushed a commit to SlapOS/slapos that referenced this issue May 12, 2020
PyPDF2 overwrites `warning.showwarning` and this is already confirmed as a major flaw: py-pdf/pypdf#67 (comment)

But, this is never applied in PyPDF2.

To fix this issue in ERP5, we will copy the patch to our stack and apply it.

See merge request !746
polyglot-jones pushed a commit to polyglot-jones/PyPDF2 that referenced this issue Aug 11, 2020
mlissner added a commit to freelawproject/courtlistener that referenced this issue Mar 23, 2022
In the following issues and PRs, PyPDF2 users have identified and
offered to fix a weird bug where the library overrides normal logging
in a way that breaks when you log certain things:

py-pdf/pypdf#67
py-pdf/pypdf#641
py-pdf/pypdf#452
py-pdf/pypdf#243

Not great, but there was probably a reason. Unfortunately, the
maintainers aren't merging any of the fixes people have provided
over the years, and when I upgraded to Python 3.10 one of our tests
changed in a way that it triggered this bug.

So, since the maintainers don't seem inclined to fix their own
package, this commit yanks it from CourtListener. It's good timing,
really, since we now have the microservice available, but it was
disappointing to see bugs and PRs related to this going back to 2014.
Most of the fixes are one or two-line PRs too. Bummer.
@MartinThoma MartinThoma added the needs-discussion The PR/issue needs more discussion before we can continue label Apr 7, 2022
@MartinThoma
Copy link
Member

I think this issue got solved with 89a29ef (February 2014, v1.25.1):

warnings.formatwarning = utils._formatwarning was formerly in pdf.py (e.g. 1311d787725f4601503876b0feb8c09f2f992536), but it's not longer there.

@MartinThoma
Copy link
Member

Uh, I've just seen warnings.showwarning = _showwarning

MartinThoma pushed a commit that referenced this issue Apr 16, 2022
@MartinThoma MartinThoma added is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF and removed needs-discussion The PR/issue needs more discussion before we can continue labels Apr 16, 2022
@MartinThoma
Copy link
Member

Let's connect the issues:

@MartinThoma
Copy link
Member

This was fixed with PyPDF2 2.0.0. Thank you all for pointing out the issues!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF
Projects
None yet
Development

No branches or pull requests