-
Notifications
You must be signed in to change notification settings - Fork 10.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Try to improve handling of missing trailer dictionaries in XRef.indexObjects
(issue 18986)
#19007
Conversation
…xObjects` (issue 18986) The problem with the referenced PDF document has nothing to do with invalid dates, as the issue seems to suggest, but rather with the fact that it has neither an XRef table nor a trailer dictionary. Given that crucial parts of the internal document structure is missing, you might argue that it's not really a PDF document. In an attempt to support this kind of corruption, we'll simply iterate through all (previously found) XRef entries and pick one that *might* be a valid /Root dictionary. There's obviously no guarantee that this works, and it might not be fast in larger PDF documents, but at least it cannot be any worse than *immediately* throwing `InvalidPDFException` as we previously did here. *Please note:* I'm totally fine with this patch being rejected, since it's somewhat questionable if we should actually attempt to support "PDF documents" with this level of corruption.
/botio test |
From: Bot.io (Linux m4)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.241.84.105:8877/de5bffa005c6846/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.193.163.58:8877/f6f2fff954288d9/output.txt |
From: Bot.io (Linux m4)FailedFull output at http://54.241.84.105:8877/de5bffa005c6846/output.txt Total script time: 32.40 mins
Image differences available at: http://54.241.84.105:8877/de5bffa005c6846/reftest-analyzer.html#web=eq.log |
From: Bot.io (Windows)SuccessFull output at http://54.193.163.58:8877/f6f2fff954288d9/output.txt Total script time: 46.74 mins
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you.
@Snuffleupagus I was not aware that the PDF doesn't have an XRef table, I've forwarded the issue to lopdf (the PDF library that originally generated the PDF). Thanks for looking into this issue. |
The problem with the referenced PDF document has nothing to do with invalid dates, as the issue seems to suggest, but rather with the fact that it has neither an XRef table nor a trailer dictionary.
Given that crucial parts of the internal document structure is missing, you might argue that it's not really a PDF document.
In an attempt to support this kind of corruption, we'll simply iterate through all (previously found) XRef entries and pick one that might be a valid /Root dictionary.
There's obviously no guarantee that this works, and it might not be fast in larger PDF documents, but at least it cannot be any worse than immediately throwing
InvalidPDFException
as we previously did here.Please note: I'm totally fine with this patch being rejected, since it's somewhat questionable if we should actually attempt to support "PDF documents" with this level of corruption.