Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[api-minor] Refactor fetching of built-in CMaps to utilize a factory on the display side instead, to allow users of the API to provide a custom CMap loading factory (e.g. for use with Node.js) #8064

Merged
merged 2 commits into from
Feb 17, 2017

Conversation

Snuffleupagus
Copy link
Collaborator

@Snuffleupagus Snuffleupagus commented Feb 12, 2017

Currently the built-in CMap files are loaded in src/core/cmap.js using XMLHttpRequest directly. For some environments that might be a problem, hence this patch refactors that to instead use a factory to load built-in CMaps on the main thread and message the data to the worker thread.

This is inspired by other recent work, e.g. the addition of the CanvasFactory, and to a large extent on the IRC discussion starting at http://logs.glob.uno/?c=mozilla%23pdfjs&s=12+Oct+2016&e=12+Oct+2016#c53010.

Please note: While it certainly may be possible to improve the patch, it does work just fine locally as-is, i.e. the viewer still works and all unit/font/reference tests pass.

@yurydelendik Since this PR attempts to implement your idea from http://logs.glob.uno/?c=mozilla%23pdfjs&s=12+Oct+2016&e=12+Oct+2016#c53018, I'd appreciate if you could provide feedback on the implementation when you've got time.

Edit: Based on this PR, we should e.g. be able to get the CMap unit-tests running on Travis, by providing a custom Node.js factory for the unit-test.

Edit 2: Also fixes #4794, courtesy of the second commit.


This change is Reviewable

@Snuffleupagus
Copy link
Collaborator Author

Snuffleupagus commented Feb 13, 2017

Note that based on this patch, it would also be quite easy to add a simple cache to avoid having to load the same CMap files over and over, i.e. it could help with addressing issue #4794.

Although, I'm not sure if the intent of that issue was that we should somehow cache the parsed CMaps, or if it'd suffice if the raw CMap files are cached. Anyway, I've pushed an additional commit which at least reduces the number of file loads considerably for certain PDF files, without the need for any larger refactoring.

Copy link
Contributor

@yurydelendik yurydelendik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good with the stringToBytes change.

if (this.binary && request.response) {
data = new Uint8Array(request.response);
} else if (!this.binary && request.responseText) {
var arr = Array.prototype.map.call(request.responseText,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use stringToBytes here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, fixed now.

@yurydelendik
Copy link
Contributor

yurydelendik commented Feb 14, 2017

I don't think anything changes here, but we also need to be sure that permission restrictions stay the same when worker is loaded from foreign origin.

@yurydelendik
Copy link
Contributor

For future, we can opened an issue to create a separate factory and set of js-files for each map to be loaded cross-origin (e.g. these js files will have binary encoded array which will be decoded when loaded via script tag, the files can be generated during pdfjs-dist build and use any PDF compression, e.g. have it as ASCII85 and DEFLATE'd; maybe we shall change bool 'isBinary' field to enum)

@Snuffleupagus
Copy link
Collaborator Author

Snuffleupagus commented Feb 14, 2017

[...] maybe we shall change bool 'isBinary' field to enum

Good idea, I've attempted to implement this; an interdiff is available at https://gist.github.com/Snuffleupagus/46262b3cd0d63c1e7d52ac97d19ffad3.
Thank you for the review, and please let me know if this is what you had in mind!

…on the `display` side instead, to allow users of the API to provide a custom CMap loading factory (e.g. for use with Node.js)

Currently the built-in CMap files are loaded in `src/core/cmap.js` using `XMLHttpRequest` directly. For some environments that might be a problem, hence this patch refactors that to instead use a factory to load built-in CMaps on the main thread and message the data to the worker thread.

This is inspired by other recent work, e.g. the addition of the `CanvasFactory`, and to a large extent on the IRC discussion starting at http://logs.glob.uno/?c=mozilla%23pdfjs&s=12+Oct+2016&e=12+Oct+2016#c53010.
@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.21.233.14:8877/9972b61cc0c5ce9/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/ffa390df14e6d1a/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/ffa390df14e6d1a/output.txt

Total script time: 21.47 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/9972b61cc0c5ce9/output.txt

Total script time: 25.50 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@Snuffleupagus
Copy link
Collaborator Author

Snuffleupagus commented Feb 17, 2017

@yurydelendik Even though you already approved this PR, would you mind quickly looking over it again before we land it?
Please note that based on your review comments, I've made a couple of changes as outlined in #8064 (comment).

@yurydelendik
Copy link
Contributor

@yurydelendik Even though you already approved this PR, would you mind quickly looking over it again before we land it?

lgtm

@yurydelendik yurydelendik merged commit cfaa621 into mozilla:master Feb 17, 2017
@yurydelendik
Copy link
Contributor

Thank you for the patch.

@timvandermeij
Copy link
Contributor

timvandermeij commented Feb 18, 2017

Nice work! This has made a massive difference for me with loading test/pdfs/mao.pdf. Rendering went down from over 4 seconds to approximately 1 second. It's also visible in the network tab that the CMaps are only requested once.

Snuffleupagus added a commit to Snuffleupagus/pdf.js that referenced this pull request Oct 24, 2024
When the binary CMap format had been added there were also some ideas about *maybe* providing formats, see mozilla#8064 (comment), however that's a decade ago and we still only use binary CMaps.
Hence it now seems reasonable to simplify the relevant code by removing `CMapCompressionType` and instead just use a boolean to indicate the type of the built-in CMaps.
Snuffleupagus added a commit to Snuffleupagus/pdf.js that referenced this pull request Oct 24, 2024
After the binary CMap format had been added there were also some ideas about *maybe* providing formats, see mozilla#8064 (comment), however that was over seven years ago and we still only use binary CMaps.
Hence it now seems reasonable to simplify the relevant code by removing `CMapCompressionType` and instead just use a boolean to indicate the type of the built-in CMaps.
Snuffleupagus added a commit to Snuffleupagus/pdf.js that referenced this pull request Oct 24, 2024
After the binary CMap format had been added there were also some ideas about *maybe* providing formats, see [here](mozilla#8064 (comment)), however that was over seven years ago and we still only use binary CMaps.
Hence it now seems reasonable to simplify the relevant code by removing `CMapCompressionType` and instead just use a boolean to indicate the type of the built-in CMaps.
Snuffleupagus added a commit to Snuffleupagus/pdf.js that referenced this pull request Oct 24, 2024
After the binary CMap format had been added there were also some ideas about *maybe* providing other formats, see [here](mozilla#8064 (comment)), however that was over seven years ago and we still only use binary CMaps.
Hence it now seems reasonable to simplify the relevant code by removing `CMapCompressionType` and instead just use a boolean to indicate the type of the built-in CMaps.
ryzokuken pushed a commit to ryzokuken/pdf.js that referenced this pull request Nov 4, 2024
After the binary CMap format had been added there were also some ideas about *maybe* providing other formats, see [here](mozilla#8064 (comment)), however that was over seven years ago and we still only use binary CMaps.
Hence it now seems reasonable to simplify the relevant code by removing `CMapCompressionType` and instead just use a boolean to indicate the type of the built-in CMaps.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cmap files are loaded multiple times from the server
4 participants