[api-minor] Refactor the annotation code to be asynchronous #9822

timvandermeij · 2018-06-17T20:22:54Z

This commit is the first step towards implementing parsing for the appearance streams of annotations.

Supersedes #9417.

timvandermeij · 2018-06-17T20:23:59Z

/botio-linux preview

timvandermeij · 2018-06-17T20:32:22Z

/botio test

timvandermeij · 2018-06-17T21:09:06Z

/botio test

timvandermeij · 2018-06-17T21:45:26Z

What gives, Chrome? First it fails on both bots, then only on Linux. Firefox thinks everything is fine.

@yurydelendik @brendandahl Can you see why Chrome hangs, or reboot the bots?

timvandermeij · 2018-06-19T20:57:12Z

/botio test

timvandermeij · 2018-06-20T20:52:08Z

Chrome still failed in the logs :(

brendandahl · 2018-06-26T23:14:02Z

Hmm..seems to fail in the same spot after bug850854. Is this happening on other branches or just here?

timvandermeij · 2018-06-27T20:36:05Z

It looks like it only happens for this commit and only in Chrome. I'll check if I can add more diagnostic information to the test runner to find out exactly which test fails and then I can hopefully try to reproduce it locally in Chrome. Even if I can't, it will be useful to let the test runner output which test actually caused the timeout for future failures too.

timvandermeij · 2018-07-01T12:33:06Z

/botio-linux test

timvandermeij · 2018-07-01T13:30:22Z

@brendandahl I made the test runner output the name of the failing test:

TEST-UNEXPECTED-FAIL | test wdsg_fitc failed; chrome has not responded in 120s

However, I don't have issues with this test locally. It only seems to happen on the Linux bot, which has gotten almost 10 minutes slower since some time ago and which also shows other weird logs (see the message directly above). Could you perhaps reboot the machine, since I have no idea what else it could be?

brendandahl · 2018-07-06T21:25:37Z

Running that pdf locally with your patch I get:

"MissingDataException
    at MissingDataExceptionClosure (http://127.0.0.1:58450/build/generic/build/pdf.worker.js:1138:36)
    at Object._typeof (http://127.0.0.1:58450/build/generic/build/pdf.worker.js:1142:2)
    at __w_pdfjs_require__ (http://127.0.0.1:58450/build/generic/build/pdf.worker.js:52:30)
    at Object.defineProperty.value (http://127.0.0.1:58450/build/generic/build/pdf.worker.js:147:13)
    at __w_pdfjs_require__ (http://127.0.0.1:58450/build/generic/build/pdf.worker.js:52:30)
    at Object.defineProperty.value (http://127.0.0.1:58450/build/generic/build/pdf.worker.js:128:23)
    at __w_pdfjs_require__ (http://127.0.0.1:58450/build/generic/build/pdf.worker.js:52:30)
    at pdfjsVersion (http://127.0.0.1:58450/build/generic/build/pdf.worker.js:116:18)
    at http://127.0.0.1:58450/build/generic/build/pdf.worker.js:119:10
    at webpackUniversalModuleDefinition (http://127.0.0.1:58450/build/generic/build/pdf.worker.js:31:60)"

brendandahl · 2018-07-06T21:27:10Z

Uncaught (in promise) MissingDataException: Missing data [3741030, 3741031)

dmitryskey · 2018-07-10T02:20:39Z

After debugging I found that the problem happens when PDF.js parses link annotations in this document, the stack is the following:

ChunkedStream_ensureByte (chunked_stream.js:110)
ChunkedStream_getByte (chunked_stream.js:163)
Lexer_nextChar (parser.js:714)
Lexer (parser.js:658)
XRef_fetchUncompressed (obj.js:1414)
XRef_fetch (obj.js:1393)
XRef_fetchIfRef (obj.js:1367)
create (annotation.js:40)
getAnnotations (document.js:316)
getOperatorList (document.js:182)
(anonymous) (worker.js:743)
Promise.then (async)
wphSetupRenderPage (worker.js:736)
MessageHandler._onComObjOnMessage (message_handler.js:128)
Worker.postMessage (async)
postMessage (message_handler.js:435)
send (message_handler.js:158)
PDFPageProxy_render (api.js:944)
paintOnCanvas (pdf_page_view.js:580)
draw (pdf_page_view.js:466)
(anonymous) (acroforms.js:46)
Promise.then (async)
(anonymous) (acroforms.js:33)
Promise.then (async)
(anonymous) (acroforms.js:32)
Promise.then (async)
PDFDocumentLoadingTask_then (api.js:495)
(anonymous) (acroforms.js:27)

It only happens in Chrome, the exception was thrown in the code below, by some reason 'this.loadedChunks' is undefined

ensureByte: function ChunkedStream_ensureByte(pos) {
  var chunk = Math.floor(pos / this.chunkSize);
  if (chunk === this.lastSuccessfulEnsureByteChunk) {
    return;
  }

  if (!this.loadedChunks[chunk]) {
    throw new MissingDataException(pos, pos + 1);
  }
  this.lastSuccessfulEnsureByteChunk = chunk;
},

I attached the document that causes this issue
wdsg_fitc.pdf

My first impression that there is something wrong with link annotation structure in this PDF which was exposed when we started parsing annotations asynchronously.

dmitryskey · 2018-07-11T02:26:16Z

So after debugging I think I found the explanation. It seems like that the ChunkedStream initiates a lot of workers for every stream chunk with default size from the api.js:37 = 65536 bytes. In particular for this document and the Link annotation there were 65 chunks and therefore 65 workers. Not all of them completed the associated chunk initialization, error happened for the chunk #57. Also if I put a breakpoint in the proper place document was loaded fine, in my understanding I let all workers to be completed.

The solution I tried seems safe and straightforward. I doubled the default chunk size in the api.js:37 as

var DEFAULT_RANGE_CHUNK_SIZE = 2 * 65536;

It automatically reduced the number of workers and Chrome was able to load document without any console errors. Could you try this change and see if it helps?

brendandahl · 2018-07-11T16:50:58Z

Usually if there's a MissingDataException that's unhandled it means that we have some code path that needs to handle it and retry once there is more data.

See the BasePdfManager's ensure function and the various things that call it.

dmitryskey · 2018-07-11T17:16:24Z

Yes, it's definitely possible as well, one can add the try/catch block in the document.js:316 and create an annotation promise after the MissingDataException once again.

But is it possible to check also that 128K chunk range buffer helps for this Linux bot in Chrome? The 64K buffer size exists from the very beginning in 2013 and I think it's safe to double that, also it's configurable since 2015. The total amount of memory consumption should be the same but the number of workers will be reduced. I'm not familiar with V8 engine details, but I can imagine that when this patch was added, the additional annotation workers exceeded some limit for the slow Linux bot.

brendandahl · 2018-07-11T18:16:50Z

I'm not sure I follow what you mean by workers, in pdf.js we only spawn one worker ever. Do you mean HTTP requests?

Increasing the limit just hides the issue, we need to address the real issue of not handling the exception correctly.

dmitryskey · 2018-07-11T19:31:54Z

Yea, I was talking about worker.js:435 with 'parseStartXRef', my interpretation of this code was that we create a new worker for parsing XRef reference element. Sorry for the misunderstanding.

I think that just handling an exception hides the issue too, this is a definitely reproducible regression for this change.

Snuffleupagus · 2018-07-14T12:59:38Z

Looking at the patch, there's a couple of (more-or-less obvious) problems with the implementation:

First and foremost, when investigating the bug causing the failures in Chrome, you'll notice that this patch doesn't actually make any attempt at loading missing Annotation data (which is evident from the MissingDataExceptions being thrown). Instead, the new code will happily assume that all Annotation data was already loaded, and if/when that's not the case trouble ensues.

In the old (synchronous) code, loading of Annotation data was done via PDFManager.ensure(), but that appears to have been completely removed in this patch. However, simply adding that back isn't sufficient on its own, now that the code is supposed to be asynchronous, without also making sure that AnnotationFactory.create is able to load any missing Annotation data.
Secondly, almost all of the Annotation unit-tests are now completely broken.
Running the tests (manually) in a browser, via http://localhost:8888/test/unit/unit_test.html, there's now loads of SPEC HAS NO EXPECTATIONS warnings all over the place. (Unfortunately the unit-test framework, i.e. Jasmine, apparently cannot be configured to simply fail in that case.)

All the unit-tests touched in this patch thus needed to be modified to actually wait for the now asynchronous Annotation code to complete. Furthermore, by chaining Promises, the "should only accept comb fields when the flags are valid" test-case can be restored to slightly saner looking code.

The following patch is an attempt at addressing the most severe (known) issues, as outlined above: https://gist.githubusercontent.com/Snuffleupagus/1856b76ba23fe633e155ddf9037b2db1/raw/9f54c8715ed67c467df1a6e16eac69012c991bc2/0001-Ensure-that-all-Annots-resources-are-actually-loaded.patch

This commit is the first step towards implementing parsing for the appearance streams of annotations. Co-authored-by: Jonas Jenwald <[email protected]> Co-authored-by: Tim van der Meij <[email protected]>

timvandermeij · 2018-08-11T17:12:59Z

/botio test

pdfjsbot · 2018-08-11T17:13:00Z

From: Bot.io (Linux m4)

Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/155e8ad2330f5ea/output.txt

pdfjsbot · 2018-08-11T17:13:00Z

From: Bot.io (Windows)

Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.215.176.217:8877/af0dbe52aa2ff2e/output.txt

pdfjsbot · 2018-08-11T17:42:17Z

From: Bot.io (Windows)

Success

Full output at http://54.215.176.217:8877/af0dbe52aa2ff2e/output.txt

Total script time: 29.27 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

pdfjsbot · 2018-08-11T17:49:37Z

From: Bot.io (Linux m4)

Success

Full output at http://54.67.70.0:8877/155e8ad2330f5ea/output.txt

Total script time: 36.60 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

timvandermeij · 2018-08-11T18:15:12Z

/botio-linux preview

pdfjsbot · 2018-08-11T18:15:13Z

From: Bot.io (Linux m4)

Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/7f8a9533a049315/output.txt

pdfjsbot · 2018-08-11T18:22:17Z

From: Bot.io (Linux m4)

Success

Full output at http://54.67.70.0:8877/7f8a9533a049315/output.txt

Total script time: 7.05 mins

Published

Viewer: http://54.67.70.0:8877/7f8a9533a049315/web/viewer.html

timvandermeij · 2018-08-11T18:40:32Z

Thank you @dmitryskey and @Snuffleupagus for making this change possible!

timvandermeij added the annotations label Jun 17, 2018

timvandermeij force-pushed the annotations branch from ccd9587 to ca2b151 Compare July 1, 2018 13:28

timvandermeij force-pushed the annotations branch from ca2b151 to a519054 Compare August 11, 2018 16:51

mozilla deleted a comment from pdfjsbot Aug 11, 2018

timvandermeij force-pushed the annotations branch 2 times, most recently from c49764c to 5a88bbe Compare August 11, 2018 16:59

dmitryskey and others added 2 commits August 11, 2018 19:00

[api-minor] Refactor the annotation code to be asynchronous

3741bec

This commit is the first step towards implementing parsing for the appearance streams of annotations. Co-authored-by: Jonas Jenwald <[email protected]> Co-authored-by: Tim van der Meij <[email protected]>

Convert test/unit/annotation_spec.js to ES6 syntax

bbc769c

timvandermeij force-pushed the annotations branch from 5a88bbe to bbc769c Compare August 11, 2018 17:00

timvandermeij merged commit af19ed6 into mozilla:master Aug 11, 2018

timvandermeij deleted the annotations branch August 11, 2018 18:39

timvandermeij mentioned this pull request Aug 12, 2018

Interactive form (AcroForm) support #7613

Closed

23 tasks

escapewindow mentioned this pull request Jul 18, 2020

Issue 7613, Parse AcroForm Dictionary #8625

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[api-minor] Refactor the annotation code to be asynchronous #9822

[api-minor] Refactor the annotation code to be asynchronous #9822

timvandermeij commented Jun 17, 2018

timvandermeij commented Jun 17, 2018

timvandermeij commented Jun 17, 2018

timvandermeij commented Jun 17, 2018

timvandermeij commented Jun 17, 2018

timvandermeij commented Jun 19, 2018

timvandermeij commented Jun 20, 2018 •

edited

Loading

brendandahl commented Jun 26, 2018

timvandermeij commented Jun 27, 2018

timvandermeij commented Jul 1, 2018

timvandermeij commented Jul 1, 2018

brendandahl commented Jul 6, 2018

brendandahl commented Jul 6, 2018

dmitryskey commented Jul 10, 2018 •

edited

Loading

dmitryskey commented Jul 11, 2018

brendandahl commented Jul 11, 2018

dmitryskey commented Jul 11, 2018

brendandahl commented Jul 11, 2018

dmitryskey commented Jul 11, 2018 •

edited

Loading

Snuffleupagus commented Jul 14, 2018 •

edited

Loading

timvandermeij commented Aug 11, 2018

pdfjsbot commented Aug 11, 2018

pdfjsbot commented Aug 11, 2018

pdfjsbot commented Aug 11, 2018

pdfjsbot commented Aug 11, 2018

timvandermeij commented Aug 11, 2018

pdfjsbot commented Aug 11, 2018

pdfjsbot commented Aug 11, 2018

timvandermeij commented Aug 11, 2018

[api-minor] Refactor the annotation code to be asynchronous #9822

[api-minor] Refactor the annotation code to be asynchronous #9822

Conversation

timvandermeij commented Jun 17, 2018

timvandermeij commented Jun 17, 2018

timvandermeij commented Jun 17, 2018

timvandermeij commented Jun 17, 2018

timvandermeij commented Jun 17, 2018

timvandermeij commented Jun 19, 2018

timvandermeij commented Jun 20, 2018 • edited Loading

brendandahl commented Jun 26, 2018

timvandermeij commented Jun 27, 2018

timvandermeij commented Jul 1, 2018

timvandermeij commented Jul 1, 2018

brendandahl commented Jul 6, 2018

brendandahl commented Jul 6, 2018

dmitryskey commented Jul 10, 2018 • edited Loading

dmitryskey commented Jul 11, 2018

brendandahl commented Jul 11, 2018

dmitryskey commented Jul 11, 2018

brendandahl commented Jul 11, 2018

dmitryskey commented Jul 11, 2018 • edited Loading

Snuffleupagus commented Jul 14, 2018 • edited Loading

timvandermeij commented Aug 11, 2018

pdfjsbot commented Aug 11, 2018

From: Bot.io (Linux m4)

Received

pdfjsbot commented Aug 11, 2018

From: Bot.io (Windows)

Received

pdfjsbot commented Aug 11, 2018

From: Bot.io (Windows)

Success

pdfjsbot commented Aug 11, 2018

From: Bot.io (Linux m4)

Success

timvandermeij commented Aug 11, 2018

pdfjsbot commented Aug 11, 2018

From: Bot.io (Linux m4)

Received

pdfjsbot commented Aug 11, 2018

From: Bot.io (Linux m4)

Success

Published

timvandermeij commented Aug 11, 2018

timvandermeij commented Jun 20, 2018 •

edited

Loading

dmitryskey commented Jul 10, 2018 •

edited

Loading

dmitryskey commented Jul 11, 2018 •

edited

Loading

Snuffleupagus commented Jul 14, 2018 •

edited

Loading