Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to improve handling of missing trailer dictionaries in XRef.indexObjects (issue 18986) #19007

Merged
merged 1 commit into from
Nov 5, 2024

Conversation

Snuffleupagus
Copy link
Collaborator

The problem with the referenced PDF document has nothing to do with invalid dates, as the issue seems to suggest, but rather with the fact that it has neither an XRef table nor a trailer dictionary.
Given that crucial parts of the internal document structure is missing, you might argue that it's not really a PDF document.

In an attempt to support this kind of corruption, we'll simply iterate through all (previously found) XRef entries and pick one that might be a valid /Root dictionary.
There's obviously no guarantee that this works, and it might not be fast in larger PDF documents, but at least it cannot be any worse than immediately throwing InvalidPDFException as we previously did here.

Please note: I'm totally fine with this patch being rejected, since it's somewhat questionable if we should actually attempt to support "PDF documents" with this level of corruption.

…xObjects` (issue 18986)

The problem with the referenced PDF document has nothing to do with invalid dates, as the issue seems to suggest, but rather with the fact that it has neither an XRef table nor a trailer dictionary.
Given that crucial parts of the internal document structure is missing, you might argue that it's not really a PDF document.

In an attempt to support this kind of corruption, we'll simply iterate through all (previously found) XRef entries and pick one that *might* be a valid /Root dictionary.
There's obviously no guarantee that this works, and it might not be fast in larger PDF documents, but at least it cannot be any worse than *immediately* throwing `InvalidPDFException` as we previously did here.

*Please note:* I'm totally fine with this patch being rejected, since it's somewhat questionable if we should actually attempt to support "PDF documents" with this level of corruption.
@Snuffleupagus
Copy link
Collaborator Author

/botio test

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/de5bffa005c6846/output.txt

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.193.163.58:8877/f6f2fff954288d9/output.txt

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/de5bffa005c6846/output.txt

Total script time: 32.40 mins

  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED
  errors: 18
  different ref/snapshot: 18
  different first/second rendering: 1

Image differences available at: http://54.241.84.105:8877/de5bffa005c6846/reftest-analyzer.html#web=eq.log

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Windows)


Success

Full output at http://54.193.163.58:8877/f6f2fff954288d9/output.txt

Total script time: 46.74 mins

  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: Passed

Copy link
Contributor

@calixteman calixteman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you.

@Snuffleupagus Snuffleupagus merged commit c78eebb into mozilla:master Nov 5, 2024
9 checks passed
@Snuffleupagus Snuffleupagus deleted the issue-18986 branch November 5, 2024 20:45
@fschutt
Copy link

fschutt commented Nov 6, 2024

@Snuffleupagus I was not aware that the PDF doesn't have an XRef table, I've forwarded the issue to lopdf (the PDF library that originally generated the PDF). Thanks for looking into this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: PDF file with invalid date doesn't load
4 participants