Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: Handle XML error when reading XmpInformation #1030

Merged
merged 4 commits into from
Jun 30, 2022
Merged

Conversation

MartinThoma
Copy link
Member

Closes #585

@MartinThoma MartinThoma changed the title MAINT: Handle XML error MAINT: Handle XML error when reading XmpInformation Jun 26, 2022
@codecov
Copy link

codecov bot commented Jun 26, 2022

Codecov Report

Merging #1030 (e6c745e) into main (d1f2ed2) will decrease coverage by 0.03%.
The diff coverage is 71.42%.

@@            Coverage Diff             @@
##             main    #1030      +/-   ##
==========================================
- Coverage   89.93%   89.90%   -0.04%     
==========================================
  Files          24       24              
  Lines        4482     4488       +6     
  Branches      916      916              
==========================================
+ Hits         4031     4035       +4     
- Misses        308      310       +2     
  Partials      143      143              
Impacted Files Coverage Δ
PyPDF2/xmp.py 88.17% <71.42%> (-0.66%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d1f2ed2...e6c745e. Read the comment docs.

@phoewass
Copy link

It would be nice if reader.xmp_metadata can have a non-strict mode.

@MartinThoma
Copy link
Member Author

I guess "None" would then be a reasonable return value?

@phoewass
Copy link

Absolutely!

And thank you for maintaining PyPDF2, it looks in a way better shape.

PyPDF2/xmp.py Outdated
data = self.stream.get_data()
doc_root: Document = parseString(data)
except ExpatError:
raise PdfReadError("XML in XmpInformation was invalid")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider including diagnostic information contained in the ExpactError object (line, offset, kind of XML issue)/

Example:

except ExpatError as e:
    print("Got some XML error. " + str(e))

@guillaume-uH57J9
Copy link

Thanks @MartinThoma for the fix.

I added a minor comment to include more diagnostic info in the exception.
Also as @phoewass said, a non-strict mode would be nice to have.
But this change is already better than the status quo, and this is a fairly minor issue, so so the change is already a good improvement as is.

Thanks for you work maintaining PyPDF2.

@MartinThoma MartinThoma added the soon PRs that are almost ready to be merged, issues that get solved pretty soon label Jun 27, 2022
@MartinThoma
Copy link
Member Author

Adding a non-strict mode (that passes the value down from the reader) touches more things than I initially thought. I think for the moment I'll leave this out.

@MartinThoma MartinThoma merged commit 97f36bd into main Jun 30, 2022
@MartinThoma MartinThoma deleted the issue-585 branch June 30, 2022 06:42
MartinThoma added a commit that referenced this pull request Jun 30, 2022
New Features (ENH):
-  Add writer.pdf_header property (getter and setter) (#1038)

Performance Improvements (PI):
-  Remove b_ call in FloatObject.write_to_stream (#1044)
-  Check duplicate objects in writer._sweep_indirect_references (#207)

Documentation (DOC):
-  How to surppress exceptions/warnings/log messages (#1037)
-  Remove hyphen from lossless (#1041)
-  Compression of content streams (#1040)
-  Fix inconsistent variable names in add-watermark.md (#1039)
-  File size reduction
-  Add CHANGELOG to the rendered docs (#1023)

Maintenance (MAINT):
-  Handle XML error when reading XmpInformation (#1030)
-  Deduplicate Code / add mutmut config (#1022)

Code Style (STY):
-  Use unnecessary one-line function / class attribute (#1043)
-  Docstring formatting (#1033)

Full Changelog: 2.4.0...2.4.1
MartinThoma added a commit that referenced this pull request Jul 3, 2022
MartinThoma added a commit that referenced this pull request Jul 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
soon PRs that are almost ready to be merged, issues that get solved pretty soon
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unexpected xml.parsers.expat.ExpatError on malformed PDF
3 participants