Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A small memory-usage improvement for PDF documents opened from TypedArray-data #14968

Merged
merged 1 commit into from
May 29, 2022

Conversation

Snuffleupagus
Copy link
Collaborator

This patch contains a small optimization specifically for the case when getDocument is called with TypedArray-data. In that case we'll still hold onto that data, which could obviously be large, even after the "GetDocRequest"-message has been sent to the worker-thread.

In practice this will most likely not affect memory usage in any noticeable way, since the application calling getDocument will probably also be keeping a reference to the TypedArray-data. However, it seems like a good idea to ensure that the PDF.js API itself won't unnecessarily keep this data alive.

…rray-data

This patch contains a small optimization specifically for the case when `getDocument` is called with TypedArray-data. In that case we'll still hold onto that data, which could obviously be large, even after the "GetDocRequest"-message has been sent to the worker-thread.

In practice this will most likely not affect memory usage in any noticeable way, since the application calling `getDocument` will probably also be keeping a reference to the TypedArray-data. However, it seems like a good idea to ensure that the PDF.js API *itself* won't unnecessarily keep this data alive.
Comment on lines +962 to +963
* @returns {Promise<Uint8Array>} A promise that is resolved with a
* {Uint8Array} that has the raw data from the PDF.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While unrelated to the rest of the patch, this method should always be returning a Uint8Array hence it cannot hurt to be more specific in the JSDocs.

@Snuffleupagus
Copy link
Collaborator Author

/botio unittest

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_unittest from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/61fa8e8c496ff6f/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_unittest from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.193.163.58:8877/c9b91cc4d3b6bdd/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/61fa8e8c496ff6f/output.txt

Total script time: 3.30 mins

  • Unit Tests: FAILED

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.193.163.58:8877/c9b91cc4d3b6bdd/output.txt

Total script time: 7.17 mins

  • Unit Tests: Passed

if (source.data) {
source.data = null;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it make sense to transfer the data in the worker ?

Copy link
Collaborator Author

@Snuffleupagus Snuffleupagus May 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I though about that, but decided against doing it since that may end up breaking someone's code. Given that can't know how third-party users are invoking getDocument, it's thus possible that a user want to do something with the TypedArray after calling getDocument and it'd be surprising (and a breaking API-change) if we just transferred this data.

(And note that none of this applies to the Firefox PDF viewer either.)

Copy link
Contributor

@calixteman calixteman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants