-
Notifications
You must be signed in to change notification settings - Fork 10.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve memory usage around the BasePdfManager.docBaseUrl
parameter (PR 7689 follow-up)
#13105
Improve memory usage around the BasePdfManager.docBaseUrl
parameter (PR 7689 follow-up)
#13105
Conversation
From: Bot.io (Windows)ReceivedCommand cmd_unittest from @Snuffleupagus received. Current queue size: 0 Live output at: http://3.101.106.178:8877/09ea6c4c5e0f289/output.txt |
From: Bot.io (Linux m4)ReceivedCommand cmd_unittest from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.67.70.0:8877/afb7e8661eff4a8/output.txt |
From: Bot.io (Linux m4)SuccessFull output at http://54.67.70.0:8877/afb7e8661eff4a8/output.txt Total script time: 3.53 mins
|
From: Bot.io (Windows)SuccessFull output at http://3.101.106.178:8877/09ea6c4c5e0f289/output.txt Total script time: 5.88 mins
|
27f4489
to
7756e93
Compare
From: Bot.io (Linux m4)ReceivedCommand cmd_unittest from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.67.70.0:8877/b2d44514e5627ab/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_unittest from @Snuffleupagus received. Current queue size: 0 Live output at: http://3.101.106.178:8877/7941b2c8a0fedd4/output.txt |
From: Bot.io (Linux m4)SuccessFull output at http://54.67.70.0:8877/b2d44514e5627ab/output.txt Total script time: 3.58 mins
|
From: Bot.io (Windows)SuccessFull output at http://3.101.106.178:8877/7941b2c8a0fedd4/output.txt Total script time: 5.92 mins
|
b8b0ea8
to
6515cff
Compare
From: Bot.io (Windows)ReceivedCommand cmd_unittest from @Snuffleupagus received. Current queue size: 0 Live output at: http://3.101.106.178:8877/6335077df911e03/output.txt |
From: Bot.io (Linux m4)ReceivedCommand cmd_unittest from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.67.70.0:8877/e7eccbdcccfe022/output.txt |
From: Bot.io (Linux m4)SuccessFull output at http://54.67.70.0:8877/e7eccbdcccfe022/output.txt Total script time: 3.60 mins
|
From: Bot.io (Windows)SuccessFull output at http://3.101.106.178:8877/6335077df911e03/output.txt Total script time: 5.19 mins
|
6515cff
to
acc953f
Compare
…s` and into `src/display/display_utils.js` It seems reasonable to place this alongside the *similar* `getFilenameFromUrl` helper function. This way, with the changes in the next patch, we also avoid having to expose the `isDataScheme` function in the API itself and we instead expose `getPdfFilenameFromUrl` in the API (which feels overall more appropriate).
… (PR 7689 follow-up) While there is nothing *outright* wrong with the existing implementation, it can however lead to increased memory usage in one particular case (that I completely overlooked when implementing this): For "data:"-URLs, which by definition contains the entire PDF document and can thus be arbitrarily large, we obviously want to avoid sending, storing, and/or logging the "raw" docBaseUrl in that case. To address this, this patch makes the following changes: - Ignore any non-string in the `docBaseUrl` option passed to `getDocument`, since those are unsupported anyway, already on the main-thread. - Ignore "data:"-URLs in the `docBaseUrl` option passed to `getDocument`, to avoid having to send what could potentially be a *very* long string to the worker-thread. - Parse the `docBaseUrl` option *directly* in the `BasePdfManager`-constructors, on the worker-thread, to avoid having to store the "raw" docBaseUrl in the first place.
acc953f
to
c4c7216
Compare
From: Bot.io (Windows)ReceivedCommand cmd_unittest from @Snuffleupagus received. Current queue size: 0 Live output at: http://3.101.106.178:8877/ea9dc7c0af755bc/output.txt |
From: Bot.io (Linux m4)ReceivedCommand cmd_unittest from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.67.70.0:8877/a95ebfb267277e9/output.txt |
From: Bot.io (Linux m4)SuccessFull output at http://54.67.70.0:8877/a95ebfb267277e9/output.txt Total script time: 3.52 mins
|
From: Bot.io (Windows)SuccessFull output at http://3.101.106.178:8877/ea9dc7c0af755bc/output.txt Total script time: 5.58 mins
|
Thank you for improving this! |
While there is nothing outright wrong with the existing implementation, it can however lead to increased memory usage in one particular case (that I completely overlooked when implementing this):
For "data:"-URLs, which by definition contains the entire PDF document and can thus be arbitrarily large, we obviously want to avoid sending, storing, and/or logging the "raw" docBaseUrl in that case.
To address this, this patch makes the following changes:
Ignore any non-string in the
docBaseUrl
option passed togetDocument
, since those are unsupported anyway, already on the main-thread.Ignore "data:"-URLs in the
docBaseUrl
option passed togetDocument
, to avoid having to send what could potentially be a very long string to the worker-thread.Parse the
docBaseUrl
option directly in theBasePdfManager
-constructors, on the worker-thread, to avoid having to store the "raw" docBaseUrl in the first place.