Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[api-minor] Refactor the annotation code to be asynchronous #9822

Merged
merged 2 commits into from
Aug 11, 2018

Conversation

timvandermeij
Copy link
Contributor

This commit is the first step towards implementing parsing for the appearance streams of annotations.

Supersedes #9417.

@timvandermeij
Copy link
Contributor Author

/botio-linux preview

@timvandermeij
Copy link
Contributor Author

/botio test

1 similar comment
@timvandermeij
Copy link
Contributor Author

/botio test

@timvandermeij
Copy link
Contributor Author

What gives, Chrome? First it fails on both bots, then only on Linux. Firefox thinks everything is fine.

@yurydelendik @brendandahl Can you see why Chrome hangs, or reboot the bots?

@timvandermeij
Copy link
Contributor Author

/botio test

@timvandermeij
Copy link
Contributor Author

timvandermeij commented Jun 20, 2018

Chrome still failed in the logs :(

@brendandahl
Copy link
Contributor

Hmm..seems to fail in the same spot after bug850854. Is this happening on other branches or just here?

@timvandermeij
Copy link
Contributor Author

It looks like it only happens for this commit and only in Chrome. I'll check if I can add more diagnostic information to the test runner to find out exactly which test fails and then I can hopefully try to reproduce it locally in Chrome. Even if I can't, it will be useful to let the test runner output which test actually caused the timeout for future failures too.

@timvandermeij
Copy link
Contributor Author

/botio-linux test

@timvandermeij
Copy link
Contributor Author

@brendandahl I made the test runner output the name of the failing test:

TEST-UNEXPECTED-FAIL | test wdsg_fitc failed; chrome has not responded in 120s

However, I don't have issues with this test locally. It only seems to happen on the Linux bot, which has gotten almost 10 minutes slower since some time ago and which also shows other weird logs (see the message directly above). Could you perhaps reboot the machine, since I have no idea what else it could be?

@brendandahl
Copy link
Contributor

Running that pdf locally with your patch I get:

"MissingDataException
    at MissingDataExceptionClosure (http://127.0.0.1:58450/build/generic/build/pdf.worker.js:1138:36)
    at Object._typeof (http://127.0.0.1:58450/build/generic/build/pdf.worker.js:1142:2)
    at __w_pdfjs_require__ (http://127.0.0.1:58450/build/generic/build/pdf.worker.js:52:30)
    at Object.defineProperty.value (http://127.0.0.1:58450/build/generic/build/pdf.worker.js:147:13)
    at __w_pdfjs_require__ (http://127.0.0.1:58450/build/generic/build/pdf.worker.js:52:30)
    at Object.defineProperty.value (http://127.0.0.1:58450/build/generic/build/pdf.worker.js:128:23)
    at __w_pdfjs_require__ (http://127.0.0.1:58450/build/generic/build/pdf.worker.js:52:30)
    at pdfjsVersion (http://127.0.0.1:58450/build/generic/build/pdf.worker.js:116:18)
    at http://127.0.0.1:58450/build/generic/build/pdf.worker.js:119:10
    at webpackUniversalModuleDefinition (http://127.0.0.1:58450/build/generic/build/pdf.worker.js:31:60)"

@brendandahl
Copy link
Contributor

Uncaught (in promise) MissingDataException: Missing data [3741030, 3741031)

@dmitryskey
Copy link
Contributor

dmitryskey commented Jul 10, 2018

After debugging I found that the problem happens when PDF.js parses link annotations in this document, the stack is the following:

ChunkedStream_ensureByte (chunked_stream.js:110)
ChunkedStream_getByte (chunked_stream.js:163)
Lexer_nextChar (parser.js:714)
Lexer (parser.js:658)
XRef_fetchUncompressed (obj.js:1414)
XRef_fetch (obj.js:1393)
XRef_fetchIfRef (obj.js:1367)
create (annotation.js:40)
getAnnotations (document.js:316)
getOperatorList (document.js:182)
(anonymous) (worker.js:743)
Promise.then (async)
wphSetupRenderPage (worker.js:736)
MessageHandler._onComObjOnMessage (message_handler.js:128)
Worker.postMessage (async)
postMessage (message_handler.js:435)
send (message_handler.js:158)
PDFPageProxy_render (api.js:944)
paintOnCanvas (pdf_page_view.js:580)
draw (pdf_page_view.js:466)
(anonymous) (acroforms.js:46)
Promise.then (async)
(anonymous) (acroforms.js:33)
Promise.then (async)
(anonymous) (acroforms.js:32)
Promise.then (async)
PDFDocumentLoadingTask_then (api.js:495)
(anonymous) (acroforms.js:27)

It only happens in Chrome, the exception was thrown in the code below, by some reason 'this.loadedChunks' is undefined

ensureByte: function ChunkedStream_ensureByte(pos) {
  var chunk = Math.floor(pos / this.chunkSize);
  if (chunk === this.lastSuccessfulEnsureByteChunk) {
    return;
  }

  if (!this.loadedChunks[chunk]) {
    throw new MissingDataException(pos, pos + 1);
  }
  this.lastSuccessfulEnsureByteChunk = chunk;
},

I attached the document that causes this issue
wdsg_fitc.pdf

My first impression that there is something wrong with link annotation structure in this PDF which was exposed when we started parsing annotations asynchronously.

@dmitryskey
Copy link
Contributor

So after debugging I think I found the explanation. It seems like that the ChunkedStream initiates a lot of workers for every stream chunk with default size from the api.js:37 = 65536 bytes. In particular for this document and the Link annotation there were 65 chunks and therefore 65 workers. Not all of them completed the associated chunk initialization, error happened for the chunk #57. Also if I put a breakpoint in the proper place document was loaded fine, in my understanding I let all workers to be completed.

The solution I tried seems safe and straightforward. I doubled the default chunk size in the api.js:37 as

var DEFAULT_RANGE_CHUNK_SIZE = 2 * 65536;

It automatically reduced the number of workers and Chrome was able to load document without any console errors. Could you try this change and see if it helps?

@brendandahl
Copy link
Contributor

Usually if there's a MissingDataException that's unhandled it means that we have some code path that needs to handle it and retry once there is more data.

See the BasePdfManager's ensure function and the various things that call it.

@dmitryskey
Copy link
Contributor

Yes, it's definitely possible as well, one can add the try/catch block in the document.js:316 and create an annotation promise after the MissingDataException once again.

But is it possible to check also that 128K chunk range buffer helps for this Linux bot in Chrome? The 64K buffer size exists from the very beginning in 2013 and I think it's safe to double that, also it's configurable since 2015. The total amount of memory consumption should be the same but the number of workers will be reduced. I'm not familiar with V8 engine details, but I can imagine that when this patch was added, the additional annotation workers exceeded some limit for the slow Linux bot.

@brendandahl
Copy link
Contributor

I'm not sure I follow what you mean by workers, in pdf.js we only spawn one worker ever. Do you mean HTTP requests?

Increasing the limit just hides the issue, we need to address the real issue of not handling the exception correctly.

@dmitryskey
Copy link
Contributor

dmitryskey commented Jul 11, 2018

Yea, I was talking about worker.js:435 with 'parseStartXRef', my interpretation of this code was that we create a new worker for parsing XRef reference element. Sorry for the misunderstanding.

I think that just handling an exception hides the issue too, this is a definitely reproducible regression for this change.

@Snuffleupagus
Copy link
Collaborator

Snuffleupagus commented Jul 14, 2018

Looking at the patch, there's a couple of (more-or-less obvious) problems with the implementation:

  • First and foremost, when investigating the bug causing the failures in Chrome, you'll notice that this patch doesn't actually make any attempt at loading missing Annotation data (which is evident from the MissingDataExceptions being thrown). Instead, the new code will happily assume that all Annotation data was already loaded, and if/when that's not the case trouble ensues.

    In the old (synchronous) code, loading of Annotation data was done via PDFManager.ensure(), but that appears to have been completely removed in this patch. However, simply adding that back isn't sufficient on its own, now that the code is supposed to be asynchronous, without also making sure that AnnotationFactory.create is able to load any missing Annotation data.

  • Secondly, almost all of the Annotation unit-tests are now completely broken.
    Running the tests (manually) in a browser, via http://localhost:8888/test/unit/unit_test.html, there's now loads of SPEC HAS NO EXPECTATIONS warnings all over the place. (Unfortunately the unit-test framework, i.e. Jasmine, apparently cannot be configured to simply fail in that case.)

    All the unit-tests touched in this patch thus needed to be modified to actually wait for the now asynchronous Annotation code to complete. Furthermore, by chaining Promises, the "should only accept comb fields when the flags are valid" test-case can be restored to slightly saner looking code.


The following patch is an attempt at addressing the most severe (known) issues, as outlined above: https://gist.githubusercontent.com/Snuffleupagus/1856b76ba23fe633e155ddf9037b2db1/raw/9f54c8715ed67c467df1a6e16eac69012c991bc2/0001-Ensure-that-all-Annots-resources-are-actually-loaded.patch

@mozilla mozilla deleted a comment from pdfjsbot Aug 11, 2018
@mozilla mozilla deleted a comment from pdfjsbot Aug 11, 2018
@mozilla mozilla deleted a comment from pdfjsbot Aug 11, 2018
@mozilla mozilla deleted a comment from pdfjsbot Aug 11, 2018
@mozilla mozilla deleted a comment from pdfjsbot Aug 11, 2018
@mozilla mozilla deleted a comment from pdfjsbot Aug 11, 2018
@mozilla mozilla deleted a comment from pdfjsbot Aug 11, 2018
@mozilla mozilla deleted a comment from pdfjsbot Aug 11, 2018
@mozilla mozilla deleted a comment from pdfjsbot Aug 11, 2018
@mozilla mozilla deleted a comment from pdfjsbot Aug 11, 2018
@mozilla mozilla deleted a comment from pdfjsbot Aug 11, 2018
@mozilla mozilla deleted a comment from pdfjsbot Aug 11, 2018
@mozilla mozilla deleted a comment from pdfjsbot Aug 11, 2018
@mozilla mozilla deleted a comment from pdfjsbot Aug 11, 2018
@mozilla mozilla deleted a comment from pdfjsbot Aug 11, 2018
@mozilla mozilla deleted a comment from pdfjsbot Aug 11, 2018
@timvandermeij timvandermeij force-pushed the annotations branch 2 times, most recently from c49764c to 5a88bbe Compare August 11, 2018 16:59
dmitryskey and others added 2 commits August 11, 2018 19:00
This commit is the first step towards implementing parsing for the
appearance streams of annotations.

Co-authored-by: Jonas Jenwald <[email protected]>
Co-authored-by: Tim van der Meij <[email protected]>
@timvandermeij
Copy link
Contributor Author

/botio test

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/155e8ad2330f5ea/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.215.176.217:8877/af0dbe52aa2ff2e/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/af0dbe52aa2ff2e/output.txt

Total script time: 29.27 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/155e8ad2330f5ea/output.txt

Total script time: 36.60 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@timvandermeij
Copy link
Contributor Author

/botio-linux preview

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/7f8a9533a049315/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/7f8a9533a049315/output.txt

Total script time: 7.05 mins

Published

@timvandermeij timvandermeij merged commit af19ed6 into mozilla:master Aug 11, 2018
@timvandermeij timvandermeij deleted the annotations branch August 11, 2018 18:39
@timvandermeij
Copy link
Contributor Author

Thank you @dmitryskey and @Snuffleupagus for making this change possible!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants