-
Notifications
You must be signed in to change notification settings - Fork 10.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A small memory-usage improvement for PDF documents opened from TypedArray-data #14968
Conversation
…rray-data This patch contains a small optimization specifically for the case when `getDocument` is called with TypedArray-data. In that case we'll still hold onto that data, which could obviously be large, even after the "GetDocRequest"-message has been sent to the worker-thread. In practice this will most likely not affect memory usage in any noticeable way, since the application calling `getDocument` will probably also be keeping a reference to the TypedArray-data. However, it seems like a good idea to ensure that the PDF.js API *itself* won't unnecessarily keep this data alive.
* @returns {Promise<Uint8Array>} A promise that is resolved with a | ||
* {Uint8Array} that has the raw data from the PDF. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While unrelated to the rest of the patch, this method should always be returning a Uint8Array
hence it cannot hurt to be more specific in the JSDocs.
/botio unittest |
From: Bot.io (Linux m4)ReceivedCommand cmd_unittest from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.241.84.105:8877/61fa8e8c496ff6f/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_unittest from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.193.163.58:8877/c9b91cc4d3b6bdd/output.txt |
From: Bot.io (Linux m4)FailedFull output at http://54.241.84.105:8877/61fa8e8c496ff6f/output.txt Total script time: 3.30 mins
|
From: Bot.io (Windows)SuccessFull output at http://54.193.163.58:8877/c9b91cc4d3b6bdd/output.txt Total script time: 7.17 mins
|
if (source.data) { | ||
source.data = null; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could it make sense to transfer the data in the worker ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I though about that, but decided against doing it since that may end up breaking someone's code. Given that can't know how third-party users are invoking getDocument
, it's thus possible that a user want to do something with the TypedArray after calling getDocument
and it'd be surprising (and a breaking API-change) if we just transferred this data.
(And note that none of this applies to the Firefox PDF viewer either.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This patch contains a small optimization specifically for the case when
getDocument
is called with TypedArray-data. In that case we'll still hold onto that data, which could obviously be large, even after the "GetDocRequest"-message has been sent to the worker-thread.In practice this will most likely not affect memory usage in any noticeable way, since the application calling
getDocument
will probably also be keeping a reference to the TypedArray-data. However, it seems like a good idea to ensure that the PDF.js API itself won't unnecessarily keep this data alive.