Unexpected xml.parsers.expat.ExpatError on malformed PDF #585
Comments
I have encountered a similar error with a real-world PDF when calling getXmpMetadata(). PyPDF version: 1.26.0-4. Error message:
@guillaume-uH57J9 Could you please share a full traceback? Do you have an example PDF?
@MartinThoma You will find an example PDF and a call stack in the first comment from @Google-Autofuzz.
@guillaume-uH57J9 Thank you 🙏 I completely missed that 😅 Sadly, the issue still occurs with the latest version of PyPDF2. It looks to me as if the XML embedded in the PDF document is broken. We might never be able to read the content, but we should raise a warning or a more explicit PyPDF2 exception.
@MartinThoma Yes, it would be better to either emit a warning and return None, or raise an exception at the PyPDF2 level. Expat is an implementation detail, so exposing expat exceptions is not ideal. At the moment, if you want to use PyPDF2 safely, you have to import expat in order to catch that specific exception. As an aside, help(PyPDF2.PdfFileReader.xmpMetadata) does not mention any exception at the moment, so you wouldn't know to catch one until you stumble upon this. If an exception is raised, it would be better to document it.
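To make the two options concrete, here is a minimal sketch using only the standard library. It is not PyPDF2 code: `XmpParseError`, `parse_xmp_strict`, and `parse_xmp_lenient` are hypothetical names made up for illustration, and the sketch assumes the XMP packet is handed to `xml.dom.minidom.parseString` as raw bytes.

```python
import warnings
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError


class XmpParseError(Exception):
    """Hypothetical library-level error; not an actual PyPDF2 class."""


def parse_xmp_strict(raw: bytes):
    """Option 1: translate the expat error into a library-level exception."""
    try:
        return parseString(raw)
    except ExpatError as exc:
        # Chain the original error so the expat details stay visible.
        raise XmpParseError(f"XMP metadata is not well-formed XML: {exc}") from exc


def parse_xmp_lenient(raw: bytes):
    """Option 2: warn and return None when the XML is malformed."""
    try:
        return parseString(raw)
    except ExpatError as exc:
        warnings.warn(f"Ignoring malformed XMP metadata: {exc}")
        return None
```

With either variant, callers would no longer need to import `xml.parsers.expat` themselves: they would catch the library's own exception, or check for `None`.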
@guillaume-uH57J9 What do you think about #1030?
I replied in #1030.
Running the following code with the latest PyPI version of PyPDF2 on the attached input results in an unexpected xml.parsers.expat.ExpatError:
MCVE: Code + PDF
Example document: test.pdf
Traceback
Environment