Improve the `Page.content` and `Page.getContentStream` methods #13375

Snuffleupagus · 2021-05-14T09:51:29Z

First of all, by using `Dict.getArray` in the `Page.content` getter we remove the need to manually iterate through and fetch the sub-streams (when they exist) in the `Page.getContentStream` method.
Secondly, we can simplify the code in `Page.{getOperatorList, extractTextContent}` by letting `Page.getContentStream` ensure that `content` is available and returning a Promise instead.
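To make the two points above concrete, here is a minimal sketch of the idea, not the actual PDF.js code. The `Ref`, `XRef`, `Dict` and `Page` classes below are hypothetical, stripped-down stand-ins, and streams are modelled as plain strings purely for illustration.

```javascript
// Hypothetical, simplified stand-ins for illustration; not the PDF.js classes.
class Ref {
  constructor(num) {
    this.num = num;
  }
}

class XRef {
  constructor(objects) {
    this.objects = objects; // Map: object number -> object
  }
  fetch(ref) {
    return this.objects.get(ref.num);
  }
}

class Dict {
  constructor(xref, map) {
    this.xref = xref;
    this.map = map;
  }
  // Returns the value for `key`, resolving it if it is a reference.
  get(key) {
    const value = this.map[key];
    return value instanceof Ref ? this.xref.fetch(value) : value;
  }
  // Same as `get`, but if the value is an array, also resolves any
  // references among its top-level elements.
  getArray(key) {
    const value = this.get(key);
    if (!Array.isArray(value)) {
      return value;
    }
    return value.map(v => (v instanceof Ref ? this.xref.fetch(v) : v));
  }
}

class Page {
  constructor(pageDict) {
    this.pageDict = pageDict;
  }
  // With `getArray`, the sub-streams arrive already fetched.
  get content() {
    return this.pageDict.getArray("Contents");
  }
  // Always returns a Promise, so callers need no separate "ensure" step.
  getContentStream() {
    return Promise.resolve(this.content).then(content => {
      if (Array.isArray(content)) {
        return content.join("\n"); // stand-in for concatenating sub-streams
      }
      return content ?? ""; // missing content becomes empty content
    });
  }
}

// Usage: streams are modelled as plain strings here (an assumption).
const xref = new XRef(new Map([[1, "q"], [2, "Q"]]));
const page = new Page(new Dict(xref, { Contents: [new Ref(1), new Ref(2)] }));
page.getContentStream().then(stream => console.log(JSON.stringify(stream)));
```

In this sketch the callers only ever deal with the Promise returned by `getContentStream`, which is the simplification the second point describes.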

Inlining the data lookup in the `Dict.getArray` method, similar to the `get`/`getAsync` methods, should be a *tiny* bit more efficient, which cannot hurt considering that `getArray` is now used a lot more than when it was initially added.

Snuffleupagus · 2021-05-14T09:57:24Z

/botio test

pdfjsbot · 2021-05-14T09:57:25Z

From: Bot.io (Linux m4)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/26aa2ab5d36d587/output.txt

pdfjsbot · 2021-05-14T09:57:25Z

From: Bot.io (Windows)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://3.101.106.178:8877/2712bb6b59e2261/output.txt

pdfjsbot · 2021-05-14T10:23:33Z

From: Bot.io (Linux m4)

Failed

Full output at http://54.67.70.0:8877/26aa2ab5d36d587/output.txt

Total script time: 26.11 mins

Font tests: Passed
Unit tests: Passed
Integration Tests: Passed
Regression tests: FAILED

Image differences available at: http://54.67.70.0:8877/26aa2ab5d36d587/reftest-analyzer.html#web=eq.log

pdfjsbot · 2021-05-14T10:27:21Z

From: Bot.io (Windows)

Failed

Full output at http://3.101.106.178:8877/2712bb6b59e2261/output.txt

Total script time: 29.91 mins

Font tests: Passed
Unit tests: Passed
Integration Tests: Passed
Regression tests: FAILED

Image differences available at: http://3.101.106.178:8877/2712bb6b59e2261/reftest-analyzer.html#web=eq.log

timvandermeij · 2021-05-14T20:16:43Z

Thanks!

Snuffleupagus added 2 commits May 14, 2021 11:24

Inline the data lookup in the `Dict.getArray` method

7011313

Similar to the `get`/`getAsync` methods, this should be a *tiny* bit more efficient, which cannot hurt considering that `getArray` is now used a lot more than when it was initially added.
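As a rough illustration of what "inlining the data lookup" could mean, the sketch below contrasts a `getArray` that delegates its first lookup to `get` with one that reads the underlying map directly. The class shape and the names `getArrayViaGet`, `getArrayInlined` and `resolveElements` are hypothetical and exist only to show the two variants side by side; the real methods differ in detail.

```javascript
// Hypothetical, simplified stand-ins; the real methods differ in detail.
class Ref {
  constructor(num) {
    this.num = num;
  }
}

class Dict {
  constructor(objects, map) {
    this.objects = objects; // Map: object number -> object
    this.map = map;
  }
  fetch(ref) {
    return this.objects.get(ref.num);
  }
  get(key) {
    const value = this.map[key];
    return value instanceof Ref ? this.fetch(value) : value;
  }
  // Before: the first lookup goes through `get` (one extra method call).
  getArrayViaGet(key) {
    return this.resolveElements(this.get(key));
  }
  // After: the lookup is done inline, without calling `get`.
  getArrayInlined(key) {
    let value = this.map[key];
    if (value instanceof Ref) {
      value = this.fetch(value);
    }
    return this.resolveElements(value);
  }
  // Resolves references among the top-level elements of an array.
  resolveElements(value) {
    if (!Array.isArray(value)) {
      return value;
    }
    return value.map(v => (v instanceof Ref ? this.fetch(v) : v));
  }
}

const dict = new Dict(
  new Map([[1, ["a", new Ref(2)]], [2, "b"]]),
  { Contents: new Ref(1) }
);
console.log(dict.getArrayInlined("Contents"));
```

Both variants return the same result; the inlined one merely skips a method call, which is why any gain is expected to be small.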

timvandermeij added the core label May 14, 2021

timvandermeij approved these changes May 14, 2021

timvandermeij merged commit c9892be into mozilla:master May 14, 2021

Snuffleupagus deleted the refactor-getContentStream branch May 14, 2021 20:42
