Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add local caching of TilingPatterns in PartialEvaluator.getOperatorList (issue 2765 and 8473) #12458

Merged

Conversation

Snuffleupagus
Copy link
Collaborator

In practice it's not uncommon for PDF documents to re-use the same TilingPatterns more than once, and parsing them is essentially equal to parsing of a (small) page since a getOperatorList call is required.

By caching the internal TilingPattern representation we can thus avoid having to re-parse the same data over and over, and there's also less asynchronous parsing required for repeated TilingPatterns.

Initially I had intended to include (standard) benchmark results with this patch, however it's not entirely clear that this is actually necessary here given the preliminary results.
When testing this manually in the development viewer, using pdfBug=Stats, the following (approximate) reduction in rendering times were observed when comparing master against this patch:

As always, whenever you're dealing with documents which are "slow", there's usually a certain level of subjectivity involved with regards to what's deemed acceptable performance.
Hence it's not clear to me that we want to regard any of the referenced issues as fixed, however the improvements are significant enough to warrant caching of TilingPatterns in my opinion.

@pdfjsbot
Copy link

pdfjsbot commented Oct 8, 2020

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/41a69ebb23ef85b/output.txt

…ist` (issue 2765 and 8473)

In practice it's not uncommon for PDF documents to re-use the same TilingPatterns more than once, and parsing them is essentially equal to parsing of a (small) page since a `getOperatorList` call is required.

By caching the internal TilingPattern representation we can thus avoid having to re-parse the same data over and over, and there's also *less* asynchronous parsing required for repeated TilingPatterns.

Initially I had intended to include (standard) benchmark results with this patch, however it's not entirely clear that this is actually necessary here given the preliminary results.
When testing this manually in the development viewer, using `pdfBug=Stats`, the following (approximate) reduction in rendering times were observed when comparing `master` against this patch:
 - http://pubs.usgs.gov/sim/3067/pdf/sim3067sheet-2.pdf (from issue 2765): `6800 ms` -> `4100 ms`.
 - https://github.com/mozilla/pdf.js/files/1046131/stepped.pdf (from issue 8473): `54000 ms` -> `13000 ms`
 - https://github.com/mozilla/pdf.js/files/1046130/proof.pdf (from issue 8473): `5900 ms` -> `2500 ms`

As always, whenever you're dealing with documents which are "slow", there's usually a certain level of subjectivity involved with regards to what's deemed acceptable performance.
Hence it's not clear to me that we want to regard any of the referenced issues as fixed, however the improvements are significant enough to warrant caching of TilingPatterns in my opinion.
@pdfjsbot
Copy link

pdfjsbot commented Oct 8, 2020

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/41a69ebb23ef85b/output.txt

Total script time: 25.39 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.67.70.0:8877/41a69ebb23ef85b/reftest-analyzer.html#web=eq.log

@Snuffleupagus Snuffleupagus force-pushed the LocalTilingPatternCache branch from 4c269e1 to 30e8d5d Compare October 8, 2020 16:55
@pdfjsbot
Copy link

pdfjsbot commented Oct 8, 2020

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/9472e3eb82dd9bf/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Oct 8, 2020

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/9472e3eb82dd9bf/output.txt

Total script time: 25.47 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.67.70.0:8877/9472e3eb82dd9bf/reftest-analyzer.html#web=eq.log

@timvandermeij
Copy link
Contributor

/botio-linux preview

@pdfjsbot
Copy link

pdfjsbot commented Oct 8, 2020

From: Bot.io (Linux m4)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/e156c8136cfbe96/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Oct 8, 2020

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/e156c8136cfbe96/output.txt

Total script time: 3.57 mins

Published

@timvandermeij
Copy link
Contributor

/botio-windows test

@timvandermeij timvandermeij merged commit e59a90d into mozilla:master Oct 8, 2020
@timvandermeij
Copy link
Contributor

Nice work! I'd also say they are not completely fixed yet, but it's much better now.

/botio makeref

@pdfjsbot
Copy link

pdfjsbot commented Oct 8, 2020

From: Bot.io (Linux m4)


Received

Command cmd_makeref from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/993ed63c1ba27e9/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Oct 8, 2020

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/993ed63c1ba27e9/output.txt

Total script time: 23.47 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@Snuffleupagus Snuffleupagus deleted the LocalTilingPatternCache branch October 9, 2020 07:11
@Snuffleupagus Snuffleupagus restored the LocalTilingPatternCache branch October 9, 2020 16:18
@Snuffleupagus
Copy link
Collaborator Author

/botio-windows makeref

@pdfjsbot
Copy link

pdfjsbot commented Oct 9, 2020

From: Bot.io (Windows)


Received

Command cmd_makeref from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/e5b6a91e38c4152/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Oct 9, 2020

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/e5b6a91e38c4152/output.txt

Total script time: 26.08 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@Snuffleupagus Snuffleupagus deleted the LocalTilingPatternCache branch October 9, 2020 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants