A small memory-usage improvement for PDF documents opened from TypedArray-data #14968

Snuffleupagus · 2022-05-29T14:44:45Z

This patch contains a small optimization specifically for the case when `getDocument` is called with TypedArray-data. In that case the API currently keeps holding onto that data, which could obviously be large, even after the "GetDocRequest"-message has been sent to the worker-thread.

In practice this will most likely not affect memory usage in any noticeable way, since the application calling `getDocument` will probably also be keeping a reference to the TypedArray-data. However, it seems like a good idea to ensure that the PDF.js API *itself* won't unnecessarily keep this data alive.
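The idea can be illustrated with a minimal sketch. It is not the actual `api.js` code: `sendGetDocRequest` and `transport.send` are hypothetical names, and only the `source.data = null` step mirrors what the patch does. The data is handed to the worker first, and afterwards the API drops its own reference, while any reference held by the caller is unaffected.

```javascript
// Hypothetical sketch: `sendGetDocRequest` and `transport.send` are
// illustrative names, not the PDF.js API.
function sendGetDocRequest(transport, source) {
  // Hand the data over to the worker. With a real worker, postMessage
  // structured-clones (copies) the bytes at this point.
  transport.send("GetDocRequest", { data: source.data });

  // Drop the API's own reference so that it alone can't keep a possibly
  // large TypedArray alive; the caller's reference is left untouched.
  if (source.data) {
    source.data = null;
  }
}

// Usage with a stand-in transport that only records message names:
const callerData = new Uint8Array([0x25, 0x50, 0x44, 0x46]); // "%PDF"
const source = { data: callerData };
const log = [];
sendGetDocRequest({ send: (name) => log.push(name) }, source);
console.log(source.data); // null
console.log(callerData.length); // 4
```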


Snuffleupagus · 2022-05-29T14:46:27Z

src/display/api.js

```diff
+   * @returns {Promise<Uint8Array>} A promise that is resolved with a
+   *   {Uint8Array} that has the raw data from the PDF.
```


While unrelated to the rest of the patch, this method should always return a Uint8Array, hence it cannot hurt to be more specific in the JSDocs.
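With the return type documented as Uint8Array, callers can work with the bytes directly. The following is a minimal sketch only: `hasPdfHeader` is a hypothetical helper, and a stand-in object replaces a real `PDFDocumentProxy` so the example runs without PDF.js.

```javascript
// Hypothetical helper: `pdfDocument` is any object exposing getData(),
// such as a PDF.js PDFDocumentProxy.
async function hasPdfHeader(pdfDocument) {
  const data = await pdfDocument.getData();
  if (!(data instanceof Uint8Array)) {
    throw new TypeError("expected getData() to resolve with a Uint8Array");
  }
  // PDF files normally begin with the ASCII bytes "%PDF-".
  const header = String.fromCharCode(...data.subarray(0, 5));
  return header === "%PDF-";
}

// Usage with a stand-in object instead of a real document:
const fakeDoc = { getData: async () => new TextEncoder().encode("%PDF-1.7\n") };
hasPdfHeader(fakeDoc).then((ok) => console.log(ok)); // true
```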

Snuffleupagus · 2022-05-29T14:48:57Z

/botio unittest

pdfjsbot · 2022-05-29T14:48:58Z

From: Bot.io (Linux m4)

Received

Command cmd_unittest from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/61fa8e8c496ff6f/output.txt

pdfjsbot · 2022-05-29T14:48:59Z

From: Bot.io (Windows)

Received

Command cmd_unittest from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.193.163.58:8877/c9b91cc4d3b6bdd/output.txt

pdfjsbot · 2022-05-29T14:52:17Z

From: Bot.io (Linux m4)

Failed

Full output at http://54.241.84.105:8877/61fa8e8c496ff6f/output.txt

Total script time: 3.30 mins

Unit Tests: FAILED

pdfjsbot · 2022-05-29T14:56:10Z

From: Bot.io (Windows)

Success

Full output at http://54.193.163.58:8877/c9b91cc4d3b6bdd/output.txt

Total script time: 7.17 mins

Unit Tests: Passed

calixteman · 2022-05-29T15:53:00Z

src/display/api.js

```diff
+  if (source.data) {
+    source.data = null;
+  }
+
```


Could it make sense to transfer the data to the worker?

Snuffleupagus

I thought about that, but decided against doing it since that may end up breaking someone's code. Given that we can't know how third-party users are invoking `getDocument`, it's possible that a user wants to do something with the TypedArray after calling `getDocument`, and it'd be surprising (and a breaking API-change) if we just transferred this data.

(And note that none of this applies to the Firefox PDF viewer either.)
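The copy-versus-transfer trade-off can be seen with `structuredClone`, which uses the same structured-clone algorithm as `postMessage`. This is a generic illustration, not PDF.js code; `sendCopy` and `sendTransfer` are hypothetical names, and it assumes a runtime that provides `structuredClone` (modern browsers, Node.js 17+). Transferring avoids a copy, but it detaches the caller's buffer, which is exactly the breaking behaviour described above.

```javascript
// Generic illustration (not PDF.js code) of copying vs transferring a
// TypedArray's underlying buffer, as a worker postMessage would.
function sendCopy(data) {
  // Without a transfer list the bytes are copied, so the caller's
  // array stays fully usable.
  return structuredClone(data);
}

function sendTransfer(data) {
  // Listing the buffer in `transfer` moves ownership: the original
  // ArrayBuffer is detached and views on it report a length of 0.
  return structuredClone(data, { transfer: [data.buffer] });
}

const a = new Uint8Array([1, 2, 3, 4]);
const copied = sendCopy(a);
console.log(a.length, copied.length); // 4 4

const b = new Uint8Array([1, 2, 3, 4]);
const moved = sendTransfer(b);
console.log(b.length, moved.length); // 0 4
```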

calixteman

LGTM

Snuffleupagus added the core label May 29, 2022

Snuffleupagus commented May 29, 2022


calixteman reviewed May 29, 2022


calixteman approved these changes May 29, 2022


Snuffleupagus merged commit 1ac33c9 into mozilla:master May 29, 2022

Snuffleupagus deleted the api-release-data branch May 29, 2022 16:35

ZeroXClem mentioned this pull request Aug 12, 2024

[Snyk] Upgrade pdfjs-dist from 2.9.359 to 2.16.105 ZeroXClem/metamesa#3

Closed

earthywh mentioned this pull request Sep 24, 2024

[Snyk] Upgrade pdfjs-dist from 2.6.347 to 2.16.105 earthywh/filestash#1

Open
