Refactor `getPage` (in the worker), and attempt to use the `Linearization` dictionary to lookup the first Page #9931
/botio-windows test
…ent` instead
Addresses an existing TODO, and avoids having to pass in a `pageFactory` when creating `Catalog` instances.

…irst Page
Since PDF.js already supports range requests and streaming, not to mention chunked rendering, attempting to use the `Linearization` dictionary in `PDFDocument.getPage` probably isn't going to improve performance in any noticeable way. Nonetheless, when `Linearization` data is available, it will allow looking up the first Page *directly* without having to descend into the `Pages` tree to find the correct object.
915d26f to ec3728b
/botio-linux preview

From: Bot.io (Linux m4)
Received
Command cmd_preview from @timvandermeij received. Current queue size: 0
Live output at: http://54.67.70.0:8877/77c2813403c7e04/output.txt

From: Bot.io (Linux m4)
Success
Full output at http://54.67.70.0:8877/77c2813403c7e04/output.txt
Total script time: 2.85 mins
Published

/botio test

From: Bot.io (Linux m4)
Received
Command cmd_test from @timvandermeij received. Current queue size: 0
Live output at: http://54.67.70.0:8877/f773d67175a8472/output.txt

From: Bot.io (Windows)
Received
Command cmd_test from @timvandermeij received. Current queue size: 0
Live output at: http://54.215.176.217:8877/7ef07cc460ed5a7/output.txt

From: Bot.io (Linux m4)
Success
Full output at http://54.67.70.0:8877/f773d67175a8472/output.txt
Total script time: 19.49 mins

From: Bot.io (Windows)
Success
Full output at http://54.215.176.217:8877/7ef07cc460ed5a7/output.txt
Total script time: 26.97 mins
Good to have this; thanks!
As expected, using the `Linearization` data doesn't appear to have had any (noticeable) performance impact. This can probably be attributed to `Catalog.getPageDict` only fetching the required nodes, and not the entire `Pages` tree, in combination with caching of already resolved nodes (in `pageKidsCountCache`).

Edit: To clarify the above, for Linearized files this patch does result in (at most) a handful of fewer invocations of this code since the first Page can be accessed directly.

So while the amount of data being loaded when fetching the first Page is reduced, the difference is really tiny in practice. Hence it seems that unless the server/connection is really slow, the difference would most likely not be seen/felt. (Also, keeping in mind that the default viewer will pre-render the next/previous page.)
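To make the comparison above concrete, here is a minimal toy sketch of the two lookup paths. It is not the PDF.js implementation: plain objects stand in for PDF dictionaries, everything is synchronous, and the names `makeDoc`, `fetchNode`, `firstPageRef` and `kidsCountCache` are hypothetical stand-ins chosen for illustration (only `getPageDict` and `getPage` echo names mentioned in this thread).

```javascript
// Toy model only. makeDoc, fetchNode, firstPageRef and kidsCountCache are
// hypothetical names for this sketch, not PDF.js APIs.
function makeDoc(firstPageRef = null) {
  // Object table keyed by a made-up reference id.
  const objects = new Map([
    ["root", { Type: "Pages", Count: 4, Kids: ["a", "b"] }],
    ["a", { Type: "Pages", Count: 2, Kids: ["p0", "p1"] }],
    ["b", { Type: "Pages", Count: 2, Kids: ["p2", "p3"] }],
    ["p0", { Type: "Page", label: "page 1" }],
    ["p1", { Type: "Page", label: "page 2" }],
    ["p2", { Type: "Page", label: "page 3" }],
    ["p3", { Type: "Page", label: "page 4" }],
  ]);
  let fetches = 0;
  return {
    // Every object lookup is counted, standing in for "data being loaded".
    fetchNode(ref) {
      fetches++;
      return objects.get(ref);
    },
    get fetches() {
      return fetches;
    },
    firstPageRef, // assumed to come from Linearization data when present
    kidsCountCache: new Map(), // ref -> number of pages below that node
  };
}

// Descend from the root, skipping whole subtrees by their page count and
// only fetching the nodes needed to reach the requested page.
function getPageDict(doc, pageIndex) {
  let node = doc.fetchNode("root");
  let skipped = 0; // pages in subtrees already passed over
  while (node.Type !== "Page") {
    let chosen = null;
    for (const kid of node.Kids) {
      let count = doc.kidsCountCache.get(kid);
      let kidNode = null;
      if (count === undefined) {
        kidNode = doc.fetchNode(kid);
        count = kidNode.Type === "Page" ? 1 : kidNode.Count;
        doc.kidsCountCache.set(kid, count);
      }
      if (pageIndex < skipped + count) {
        chosen = kidNode || doc.fetchNode(kid);
        break;
      }
      skipped += count;
    }
    if (chosen === null) {
      throw new RangeError(`Page index ${pageIndex} not found.`);
    }
    node = chosen;
  }
  return node;
}

// Try the direct first-page hint, falling back to the tree descent.
function getPage(doc, pageIndex) {
  if (pageIndex === 0 && doc.firstPageRef) {
    const node = doc.fetchNode(doc.firstPageRef);
    if (node && node.Type === "Page") {
      return node;
    }
    // A bad hint is ignored; fall through to the regular lookup.
  }
  return getPageDict(doc, pageIndex);
}

const plainDoc = makeDoc();
const linearDoc = makeDoc("p0");
const plainPage = getPage(plainDoc, 0);
const linearPage = getPage(linearDoc, 0);
const plainFetches = plainDoc.fetches;
const linearFetches = linearDoc.fetches;
console.log({ plainFetches, linearFetches });
```

In this toy tree the regular descent touches three objects for the first page and the hinted lookup touches one, which matches the "handful fewer invocations" described above: a real saving, but a small one.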
Fixes #9716.