Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[api-major] Remove the SINGLE_FILE build target and the PDFJS.disableWorker option #9385

Merged
merged 4 commits into from
Feb 2, 2018
Merged

[api-major] Remove the SINGLE_FILE build target and the PDFJS.disableWorker option #9385

merged 4 commits into from
Feb 2, 2018

Conversation

Snuffleupagus
Copy link
Collaborator

@Snuffleupagus Snuffleupagus commented Jan 20, 2018

Please refer to the commit messages for additional details.

This PR is an initial attempt at implementing https://mozilla.logbot.info/pdfjs/20180102#c14068811-c14068833; feedback welcome!

  • Update the examples. Given the performance implications of loading the worker file in the main-thread, do we actually want to mention this in the examples or should we simply not advertise it at all?

Copy link
Contributor

@timvandermeij timvandermeij left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This patch looks really good! Aside from the two minor comments below, I think that we should indeed not advertise disabling workers in the examples.

web/app.js Outdated
};
(document.getElementsByTagName('head')[0] || document.body).
appendChild(script);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this bracket is misindented.

Copy link
Collaborator Author

@Snuffleupagus Snuffleupagus Jan 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, too much copy-pasting of existing code here (and above as well); thank you!

web/app.js Outdated
script.onload = function() {
resolve();
};
script.onerror = function () {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: remove space after function to be consistent with the one above.

Copy link
Member

@Rob--W Rob--W left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question, how am I supposed to test PDF.js without workers in the Chrome extension build target?

web/app.js Outdated
script.onerror = function() {
reject(new Error(`Cannot load fake worker at: ${script.src}`));
};
(document.getElementsByTagName('head')[0] || document.body).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be simplified to document.head || document.body. Only use document.head is only undefined in IE8-, and supported by all other browsers: https://developer.mozilla.org/en-US/docs/Web/API/Document/head

I usually do document.head || document.documentElement (since <head> should exist; if it does not, then document.body may also be empty, but the root element should always exist, so that's a safe fallback).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just copying the existing pattern used in https://github.com/mozilla/pdf.js/blob/master/web/app.js#L1515-L1534; should we simply change both them?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do that in a separate commit (to keep the refactoring separate from the actual feature).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually do document.head || document.documentElement

This suggestion has now been implemented (only in the newly added code); thank you!

web/chromecom.js Outdated
@@ -65,7 +64,9 @@ let ChromeCom = {
file = file.replace(/^drive:/i,
'filesystem:' + location.origin + '/external/');

if (/^filesystem:/.test(file) && !PDFJS.disableWorker) {
const disableWorker = (window.pdfjsDistBuildPdfWorker &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under which circumstances is window.pdfjsDistBuildPdfWorker unset? When the worker script is loaded in the main thread, right? Is it possible for the code to reach this place before the worker script has had a chance to initiate?

In either case, please add a brief comment here to explain what this line is doing..

Copy link
Collaborator Author

@Snuffleupagus Snuffleupagus Jan 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under which circumstances is window.pdfjsDistBuildPdfWorker unset? When the worker script is loaded in the main thread, right?

It's unset by default, i.e. when workers are being used. Hence it should only ever be defined when the pdf.worker.js file is loaded on the main-thread.

Is it possible for the code to reach this place before the worker script has had a chance to initiate?

That shouldn't happen, since the resolvePDFFile function is being called from

pdf.js/web/chromecom.js

Lines 353 to 355 in fe5102a

ChromeExternalServices.initPassiveLoading = function(callbacks) {
let { appConfig, overlayManager, } = PDFViewerApplication;
ChromeCom.resolvePDFFile(appConfig.defaultUrl, overlayManager,

which in turn is called from

pdf.js/web/app.js

Lines 591 to 596 in fe5102a

initPassiveLoading() {
if (typeof PDFJSDev === 'undefined' ||
!PDFJSDev.test('FIREFOX || MOZCENTRAL || CHROME')) {
throw new Error('Not implemented: initPassiveLoading');
}
this.externalServices.initPassiveLoading({

which is called from

pdf.js/web/app.js

Lines 1652 to 1654 in fe5102a

webViewerOpenFileViaURL = function webViewerOpenFileViaURL(file) {
PDFViewerApplication.setTitleUsingUrl(file);
PDFViewerApplication.initPassiveLoading();

webViewerOpenFileViaURL is then called from

pdf.js/web/app.js

Line 1536 in fe5102a

function webViewerInitialized() {

which is only after the viewer initialization code has run, see

pdf.js/web/app.js

Lines 502 to 504 in fe5102a

run(config) {
this.initialize(config).then(webViewerInitialized);
},

Finally, note that we always wait on the hash parameters (among other things) to be initialized in PDFViewerApplication.initialize

pdf.js/web/app.js

Lines 166 to 172 in fe5102a

return this._readPreferences().then(() => {
return this._parseHashParameters();
}).then(() => {
return this._initializeL10n();
}).then(() => {
return this._initializeViewerComponents();
}).then(() => {


In either case, please add a brief comment here to explain what this line is doing..

Sure, will do.

Edit: Actually, do we still need this code-path?
This code seem to have been originally added in PR #4598, but the referenced upstream bug http://crbug.com/362061 seem to have been fixed close to 4 years ago.
I'd thus assume that the affected versions of Chrome are no longer supported since some time, so can we (in a separate PR) remove this code-path and thus not have to worry about it here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try to not break backcompat unless necessary. I should take a look at my telemetry (https://github.com/Rob--W/pdfjs-telemetry) to see whether it is viable to rip out code that supports old Chrome versions (there is lots of other code besides this line).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment has now been added here.

@Snuffleupagus
Copy link
Collaborator Author

Snuffleupagus commented Jan 21, 2018

how am I supposed to test PDF.js without workers in the Chrome extension build target?

I don't know exactly how you're currently testing this configuration, so the following might be too simple, but wouldn't it be possible to do something like this:
Set the pdfBugEnabled preference, and then append #disableWorker=true to the viewer (or PDF) URL.

@Rob--W
Copy link
Member

Rob--W commented Jan 21, 2018

Set the pdfBugEnabled preference, and then append #disableWorker=true to the viewer (or PDF) URL.

I used to use the #disableWorker=true parameter, but when that was not supported by default I just put a breakpoint at the end of the library and then set PDFJS.disableWorker=true . Your suggestion looks a bit neater, thanks for that.

@mozilla mozilla deleted a comment from pdfjsbot Jan 21, 2018
@mozilla mozilla deleted a comment from pdfjsbot Jan 21, 2018
@Snuffleupagus
Copy link
Collaborator Author

@yurydelendik Since this PR attempts to implement the idea you proposed at https://mozilla.logbot.info/pdfjs/20180102#c14068811-c14068833, do you have time to review (or provide feedback) here?

@wojtekmaj
Copy link
Contributor

I'm an author of React-PDF and I'm concerned about this change.

How does it impact users using Webpack, who don't want to use worker? Previously, I just set up disableWorker=true option and pdf.js requested pdf.worker.js via require() and created a fake worker. How that procedure will change? Will I need to require the worker file myself to disable worker?

@Rob--W
Copy link
Member

Rob--W commented Jan 24, 2018

@wojtekmaj Your current use case of PDFJS.disableWorker has a bad impact on performance/UX. You traded performance (something that is important to users) for ease of code distribution (which is only convenient for developers).

I guess that it's technically possible to serialize the worker script and then use URL.createObjectURL to load a worker from a single file, but that may be broken by overly aggressive code optimizers (e.g. by having moved a function outside of the scope of the serialized function). It may also break when a strict Content Security Policy is enforced.

You should keep the PDF.js worker in a separate file, and assign the location of it to PDFJS.workerSrc (if it is at a non-standard location):

/**
* Path and filename of the worker file. Required when the worker is enabled
* in development mode. If unspecified in the production build, the worker
* will be loaded based on the location of the pdf.js file. It is recommended
* that the workerSrc is set in a custom application to prevent issues caused
* by third-party frameworks and libraries.
* @var {string}
*/
PDFJS.workerSrc = (PDFJS.workerSrc === undefined ? null : PDFJS.workerSrc);

The logic of worker script resolution is here:

var pdfjsFilePath =
typeof PDFJSDev !== 'undefined' &&
PDFJSDev.test('PRODUCTION && !(MOZCENTRAL || FIREFOX)') &&
typeof document !== 'undefined' && document.currentScript ?
document.currentScript.src : null;

pdf.js/src/display/api.js

Lines 1205 to 1218 in b6c57d9

function getWorkerSrc() {
if (typeof workerSrc !== 'undefined') {
return workerSrc;
}
if (getDefaultSetting('workerSrc')) {
return getDefaultSetting('workerSrc');
}
if (typeof PDFJSDev !== 'undefined' &&
PDFJSDev.test('PRODUCTION && !(MOZCENTRAL || FIREFOX)') &&
pdfjsFilePath) {
return pdfjsFilePath.replace(/(\.(?:min\.)?js)(\?.*)?$/i, '.worker$1$2');
}
throw new Error('No PDFJS.workerSrc specified');
}

@wojtekmaj
Copy link
Contributor

Thanks for information @Rob--W. I actually realize the benefits of service worker here and I discourage my users from using a version without worker, but I do provide them with such ability as for many beginners setting it up is an huge problem. I did struggle a lot with this myself.

Also, sometimes bundle size matters. Currently how PDF.js plays with Webpack is a mess. Sadly, unless provided with additional configuration (which I failed to figure out), Webpack includes pdf.worker.js code twice because of using two separate loaders on them - it's required via worker-loader in webpack.js entry file, and then again without using worker-loader in pdf.js itself. Because of using separate loaders Webpack is not smart enough to dedupe them.

@mozilla mozilla deleted a comment from pdfjsbot Jan 26, 2018
@mozilla mozilla deleted a comment from pdfjsbot Jan 26, 2018
@mozilla mozilla deleted a comment from pdfjsbot Jan 26, 2018
@mozilla mozilla deleted a comment from pdfjsbot Jan 26, 2018
@mozilla mozilla deleted a comment from pdfjsbot Jan 26, 2018
@mozilla mozilla deleted a comment from pdfjsbot Jan 26, 2018
@Rob--W
Copy link
Member

Rob--W commented Jan 27, 2018

The code changes LGTM.

The code path for fake workers seems to not be covered by unit tests. Wouldn't it make sense to add a single test that customized PDFJS.workerSrc and test whether the fake worker can be loaded? Or will we rely on manual tests only for this case?

@Snuffleupagus
Copy link
Collaborator Author

Snuffleupagus commented Jan 28, 2018

The code path for fake workers seems to not be covered by unit tests.

Given that workers are always disabled in Node.js, we should have implicit fake worker test-coverage thanks to the (subset of) API unit-tests running on Travis:

// node.js - disable worker and set require.ensure.
isWorkerDisabled = true;

Slightly off topic for this PR, but we should see if some of the currently pending API tests could be enabled on Node.js/Travis since we now have node_stream.js available.

Wouldn't it make sense to add a single test that customized PDFJS.workerSrc and test whether the fake worker can be loaded?

While we might be able to add a Node.js/Travis-only explicit test for that, I'm not sure that we'd be able to easily test this when the unit-tests run normally (i.e. in browser) without risk of negatively impacting subsequent tests.
The reason being that once we've setup the fake workers, we'll no longer attempt to use real workers any more; refer to

pdf.js/src/display/api.js

Lines 1436 to 1439 in fd242ad

if (!isWorkerDisabled && !getDefaultSetting('disableWorker')) {
warn('Setting up fake worker.');
isWorkerDisabled = true;
}

and

pdf.js/src/display/api.js

Lines 1323 to 1325 in fd242ad

if ((typeof PDFJSDev === 'undefined' || !PDFJSDev.test('SINGLE_FILE')) &&
!isWorkerDisabled && !getDefaultSetting('disableWorker') &&
typeof Worker !== 'undefined') {

Or will we rely on manual tests only for this case?

Given the above, I'm slightly on the fence about trying to add a fake worker test here! Opinions?

@Rob--W
Copy link
Member

Rob--W commented Jan 28, 2018

The code path for fake workers seems to not be covered by unit tests.

Given that workers are always disabled in Node.js, we should have implicit fake worker test-coverage thanks to the (subset of) API unit-tests running on Travis:

I am referring to the injected script tag in your new loadFakeWorker function from this PR. There is probably no test coverage for that.

Given the above, I'm slightly on the fence about trying to add a fake worker test here! Opinions?

How about a custom parameter to disable workers in the unit test? It doesn't even need to be a PDF.js API, it could also be something that does window.Worker = null.

@Snuffleupagus
Copy link
Collaborator Author

How about a custom parameter to disable workers in the unit test? It doesn't even need to be a PDF.js API, it could also be something that does window.Worker = null.

Sorry, but I really don't understand how this would help. As mentioned above, as soon as PDFWorker._setupFakeWorker has executed once then workers become permanently disabled.

@Rob--W
Copy link
Member

Rob--W commented Jan 29, 2018

How about a custom parameter to disable workers in the unit test? It doesn't even need to be a PDF.js API, it could also be something that does window.Worker = null.

Sorry, but I really don't understand how this would help. As mentioned above, as soon as PDFWorker._setupFakeWorker has executed once then workers become permanently disabled.

When I wrote that, I had in mind that it would not be run by default until someone manually edits the URL and adds something like &disableworker=true. But I guess that since it is already a manual test, one can as well add a breakpoint with a debugger and manually do window.Worker = null for testing. Or enable debugging in the viewer and add &disableworker=true and see if the viewer works. So let's ignore my previous request.

One last thing: The Worker code path uses getWorkerSrc() whereas your newly introduced <script> tag uses PDFJS.workerSrc only. This could potentially result in an unexpected behavioral difference, where the Worker loads successfully, but the non-worker fails. I'll defer to your judgement on whether this should be fixed.

Please note that this build target, and the resulting `build/pdf.combined.js` file, is equivalent to setting the `PDFJS.disableWorker` option to `true` which is a performance footgun.
@Snuffleupagus
Copy link
Collaborator Author

Snuffleupagus commented Jan 29, 2018

The Worker code path uses getWorkerSrc() whereas your newly introduced <script> tag uses PDFJS.workerSrc only. This could potentially result in an unexpected behavioral difference, where the Worker loads successfully, but the non-worker fails.

That's a very good catch, thanks for spotting this!
I agree that this inconsistency is bad, so I've attempted to address this by adding a new method on PDFWorker that allows (indirect) access to the internal getWorkerSrc helper function.

@mozilla mozilla deleted a comment from pdfjsbot Jan 30, 2018
@mozilla mozilla deleted a comment from pdfjsbot Jan 30, 2018
@Snuffleupagus
Copy link
Collaborator Author

/botio-linux preview

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_preview from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/d51f666d03385e3/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/d51f666d03385e3/output.txt

Total script time: 8.15 mins

Published

@@ -1491,6 +1502,14 @@ var PDFWorker = (function PDFWorkerClosure() {
return new PDFWorker(null, port);
};

PDFWorker.getWorkerSrc = function() {
try {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why try-catch? getWorkerSrc is not expected to throw (unless there is no way to determine the worker source.

Copy link
Collaborator Author

@Snuffleupagus Snuffleupagus Jan 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you're correct that it shouldn't normally throw, that could still happen given this code.
Hence it seemed nice to try and avoid surprises for an API consumer that use this function, not expecting an Error to be thrown. Are you saying that you think this's a completely unwarranted concern, and that we should just return the internal getWorkerSrc value as-is?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I suggest to not catch the error. workerSrc doesn't make sense in Node.js, and when it does, we have to provide a suitable implementation instead of silently returning nothing.

Copy link
Collaborator Author

@Snuffleupagus Snuffleupagus Jan 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made the requested changes; thanks for the review comments!

let workerSrc = PDFWorker.getWorkerSrc();
expect(typeof workerSrc).toEqual('string');
expect(workerSrc).toEqual(PDFJS.workerSrc);
done();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the test is synchronous, remove done(); and done in the function(done){. done is only needed for async tests.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done :-)

…ns the current `workerSrc`

This method returns the currently used `workerSrc`, which thus allows obtaining the fallback `workerSrc` value (e.g. when the option wasn't set by the user).
Despite this patch removing the `disableWorker` option itself, please note that we'll still fallback to loading the worker file(s) on the main-thread when running in environments without proper Web Worker support.

Furthermore it's still possible, even with this patch, to force the use of fake workers by manually loading the necessary file using a `<script>` tag on the main-thread.[1]
That way, the functionality of the now removed `SINGLE_FILE` build target and the resulting `build/pdf.combined.js` file can still be achieved simply by adding e.g. `<script src="build/pdf.worker.js"></script>` to the HTML (obviously with the path adjusted as needed).

Finally note that the `disableWorker` option is a performance footgun, and unfortunately many existing third-party examples actually use it without providing any sort of warning/justification.

---

[1] This approach is used in the default viewer, since certain kind of debugging may be easier if the code is running directly on the main-thread.
…tupFakeWorkerGlobal()` function in the `src/display/api.js` file
@Snuffleupagus
Copy link
Collaborator Author

@yurydelendik Ping; since this PR implements your idea from https://mozilla.logbot.info/pdfjs/20180102#c14068811-c14068833, did you want to look at it before we land this?

@Kerumen
Copy link

Kerumen commented Feb 5, 2018

@Snuffleupagus This PR was released on pdfjs-dist as a patch update (v2.0.305 to v2.0.310).
Unfortunately, it broke the lib react-pdf (wojtekmaj/react-pdf#149) and maybe others. It was tagged as api-major, why did you release this as a patch update?

@Snuffleupagus
Copy link
Collaborator Author

Snuffleupagus commented Feb 5, 2018

Regarding #9385 (comment):

We're currently working towards the release of PDF.js version 2.0; that work is being tracked in https://github.com/mozilla/pdf.js/projects/5. As can be seen, there's still a few outstanding things to address, which is why it's not yet officially released; the official releases are always available at https://github.com/mozilla/pdf.js/releases.

Since we wanted to be able to refactor/cleanup old code, the major version number was thus increased to signal that there's changes that are by design not backwards compatibility.
Hence upgrading from PDF.js version 1.x.y to version 2.x.y cannot be expected to be a drop-in replacement, without suitable adjustments to the code using the PDF.js library.

In closing, the problem seem to be twofold:
First of all, users of pdfjs-dist didn't pin the version number to 1.x.y until PDF.js version 2.0 has been officially released. Secondly, some users (might have) overlooked the significance of the increased major version number.

Finally, the actual breaking change in pdfjs-dist was thus a lot earlier than stated above, more specifically at https://github.com/mozilla/pdfjs-dist/tree/v2.0.87.

@Kerumen
Copy link

Kerumen commented Feb 5, 2018

@Snuffleupagus I don't understand. The breaking change (pdf.combined.js removed) was from 2.0.305 to 2.0.310. I was already using the branch 2.x without problems.

@wojtekmaj
Copy link
Contributor

I think the misunderstanding is that usually on GitHub if something is pre-release it's marked as so, e.g. 2.0.305-beta or something like that. Mozilla just throws pdfjs-dist on npm "as is" with every merge, without e.g. releasing using a tag "next" or similar. So, if you take pdfjs-dist@latest you are actually getting the beta without knowing about it.

@Snuffleupagus
Copy link
Collaborator Author

Snuffleupagus commented Feb 5, 2018

I was already using the branch 2.x without problems.

That can probably be attributed to pure luck, given how you're currently using/integrating PDF.js in your application, rather than anything else.
As mentioned, PDF.js version 2.0 will be a breaking change w.r.t. the API, and shouldn't be considered final until released officially at https://github.com/mozilla/pdf.js/releases.

@Rob--W
Copy link
Member

Rob--W commented Feb 5, 2018

@wojtekmaj Your analysis is correct, we should improve the documentation. I filed #9440 for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants