Add local caching of TilingPatterns in PartialEvaluator.getOperatorList (issue 2765 and 8473) #12458
From: Bot.io (Linux m4)
Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.67.70.0:8877/41a69ebb23ef85b/output.txt
…ist` (issue 2765 and 8473)

In practice it's not uncommon for PDF documents to re-use the same TilingPatterns more than once, and parsing them is essentially equal to parsing of a (small) page since a `getOperatorList` call is required. By caching the internal TilingPattern representation we can thus avoid having to re-parse the same data over and over, and there's also *less* asynchronous parsing required for repeated TilingPatterns.

Initially I had intended to include (standard) benchmark results with this patch, however it's not entirely clear that this is actually necessary here given the preliminary results.

When testing this manually in the development viewer, using `pdfBug=Stats`, the following (approximate) reductions in rendering times were observed when comparing `master` against this patch:
- http://pubs.usgs.gov/sim/3067/pdf/sim3067sheet-2.pdf (from issue 2765): `6800 ms` -> `4100 ms`
- https://github.com/mozilla/pdf.js/files/1046131/stepped.pdf (from issue 8473): `54000 ms` -> `13000 ms`
- https://github.com/mozilla/pdf.js/files/1046130/proof.pdf (from issue 8473): `5900 ms` -> `2500 ms`

As always, whenever you're dealing with documents which are "slow", there's usually a certain level of subjectivity involved with regard to what's deemed acceptable performance. Hence it's not clear to me that we want to regard any of the referenced issues as fixed, however the improvements are significant enough to warrant caching of TilingPatterns in my opinion.
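For readers skimming the thread, the caching idea described above can be illustrated with a minimal sketch. This is not the code in this patch: `LocalPatternCache`, `handleTilingPattern`, and the `parse` callback are hypothetical names, and `parse` merely stands in for the `getOperatorList` call that a TilingPattern requires. The sketch also assumes that patterns are looked up by name within a single evaluation.

```javascript
// Hypothetical sketch of a per-evaluation pattern cache.
// None of these names are taken from the pdf.js source.
class LocalPatternCache {
  constructor() {
    this._data = new Map();
  }

  // Returns the cached entry, or null on a cache miss.
  get(name) {
    return this._data.has(name) ? this._data.get(name) : null;
  }

  // Stores the parsed representation; the first entry for a name wins.
  set(name, parsed) {
    if (!this._data.has(name)) {
      this._data.set(name, parsed);
    }
  }
}

// `parse` is an assumed async callback that plays the role of the
// expensive operator-list parsing of the pattern's content stream.
async function handleTilingPattern(name, patternData, cache, parse) {
  const cached = cache.get(name);
  if (cached !== null) {
    // Cache hit: no re-parsing and no further asynchronous work.
    return cached;
  }
  const parsed = await parse(patternData);
  cache.set(name, parsed);
  return parsed;
}
```

With a cache like this, a pattern that is painted many times on one page is parsed once, and every later use resolves to the stored representation.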
From: Bot.io (Linux m4)
Failed
Full output at http://54.67.70.0:8877/41a69ebb23ef85b/output.txt
Total script time: 25.39 mins
Image differences available at: http://54.67.70.0:8877/41a69ebb23ef85b/reftest-analyzer.html#web=eq.log
4c269e1 to 30e8d5d
From: Bot.io (Linux m4)
Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.67.70.0:8877/9472e3eb82dd9bf/output.txt
From: Bot.io (Linux m4)
Failed
Full output at http://54.67.70.0:8877/9472e3eb82dd9bf/output.txt
Total script time: 25.47 mins
Image differences available at: http://54.67.70.0:8877/9472e3eb82dd9bf/reftest-analyzer.html#web=eq.log
/botio-linux preview
From: Bot.io (Linux m4)
Received
Command cmd_preview from @timvandermeij received. Current queue size: 0
Live output at: http://54.67.70.0:8877/e156c8136cfbe96/output.txt
From: Bot.io (Linux m4)
Success
Full output at http://54.67.70.0:8877/e156c8136cfbe96/output.txt
Total script time: 3.57 mins
Published
/botio-windows test
Nice work! I'd also say they are not completely fixed yet, but it's much better now.
/botio makeref
From: Bot.io (Linux m4)
Received
Command cmd_makeref from @timvandermeij received. Current queue size: 0
Live output at: http://54.67.70.0:8877/993ed63c1ba27e9/output.txt
From: Bot.io (Linux m4)
Success
Full output at http://54.67.70.0:8877/993ed63c1ba27e9/output.txt
Total script time: 23.47 mins
/botio-windows makeref
From: Bot.io (Windows)
Received
Command cmd_makeref from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.215.176.217:8877/e5b6a91e38c4152/output.txt
From: Bot.io (Windows)
Success
Full output at http://54.215.176.217:8877/e5b6a91e38c4152/output.txt
Total script time: 26.08 mins