[api-minor] Refactor the annotation code to be asynchronous #9822
/botio-linux preview
/botio test
/botio test
What gives, Chrome? First it fails on both bots, then only on Linux. Firefox thinks everything is fine. @yurydelendik @brendandahl Can you see why Chrome hangs, or reboot the bots?
/botio test
Chrome still failed in the logs :(
Hmm, it seems to fail in the same spot after bug850854. Is this happening on other branches or just here?
It looks like it only happens for this commit and only in Chrome. I'll check if I can add more diagnostic information to the test runner to find out exactly which test fails, and then I can hopefully try to reproduce it locally in Chrome. Even if I can't, it will be useful to let the test runner output which test actually caused the timeout for future failures too.
/botio-linux test
ccd9587 to ca2b151
@brendandahl I made the test runner output the name of the failing test:
However, I don't have issues with this test locally. It only seems to happen on the Linux bot, which has become almost 10 minutes slower at some point and which also shows other weird logs (see the message directly above). Could you perhaps reboot the machine, since I have no idea what else it could be?
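As an illustration of the idea of having the runner name the test that caused a timeout, here is a minimal sketch that assumes a promise-based runner. `withTimeout` and its parameters are hypothetical names chosen for this example; they are not taken from the PDF.js test runner.

```javascript
// Hypothetical helper, not part of the PDF.js test runner.
// Rejects with an error that names the test if `promise` does not
// settle within `ms` milliseconds.
function withTimeout(promise, ms, testName) {
  let timer;
  const timeout = new Promise((resolve, reject) => {
    timer = setTimeout(() => {
      reject(new Error(`Test "${testName}" timed out after ${ms} ms`));
    }, ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards
  // so that it cannot keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Wrapping each test's promise this way means a hang surfaces as an error message containing the test name rather than as an anonymous runner timeout.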
Running that PDF locally with your patch, I get:
Uncaught (in promise) MissingDataException: Missing data [3741030, 3741031)
After debugging, I found that the problem happens when PDF.js parses the link annotations in this document. The stack is the following: ChunkedStream_ensureByte (chunked_stream.js:110). It only happens in Chrome. The exception was thrown in the code below; for some reason 'this.loadedChunks' is undefined.
I attached the document that causes this issue. My first impression is that there is something wrong with the link annotation structure in this PDF, which was exposed when we started parsing annotations asynchronously.
So after debugging, I think I found the explanation. It seems that the ChunkedStream initiates a worker for every stream chunk, with the default chunk size from api.js:37 being 65536 bytes. For this document and the Link annotation in particular there were 65 chunks and therefore 65 workers. Not all of them completed the associated chunk initialization; the error happened for chunk #57. Also, if I put a breakpoint in the proper place, the document loaded fine; my understanding is that this let all workers complete. The solution I tried seems safe and straightforward: I doubled the default chunk size in api.js:37 to var DEFAULT_RANGE_CHUNK_SIZE = 2 * 65536; This automatically reduced the number of workers, and Chrome was able to load the document without any console errors. Could you try this change and see if it helps?
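The relationship between chunk size and chunk numbers described above can be checked with a little arithmetic. The sketch below assumes chunks are fixed-size and indexed from zero; `chunkIndex`, `chunkCount`, and the example byte length are hypothetical and only serve the illustration. The offset 3741030 is the one from the "Missing data" message earlier in the thread.

```javascript
// Chunk size quoted in the thread (api.js:37) and the doubled value that was tried.
const DEFAULT_RANGE_CHUNK_SIZE = 65536;
const DOUBLED_RANGE_CHUNK_SIZE = 2 * 65536;

// Hypothetical helper: zero-based index of the chunk containing `offset`.
function chunkIndex(offset, chunkSize) {
  return Math.floor(offset / chunkSize);
}

// Hypothetical helper: number of chunks needed to cover `byteLength` bytes.
function chunkCount(byteLength, chunkSize) {
  return Math.ceil(byteLength / chunkSize);
}

// Offset taken from "Missing data [3741030, 3741031)".
const missingOffset = 3741030;
console.log(chunkIndex(missingOffset, DEFAULT_RANGE_CHUNK_SIZE)); // 57
console.log(chunkIndex(missingOffset, DOUBLED_RANGE_CHUNK_SIZE)); // 28

// Assumed byte length, chosen only so that it spans 65 default-size chunks.
const exampleLength = 65 * DEFAULT_RANGE_CHUNK_SIZE;
console.log(chunkCount(exampleLength, DEFAULT_RANGE_CHUNK_SIZE)); // 65
console.log(chunkCount(exampleLength, DOUBLED_RANGE_CHUNK_SIZE)); // 33
```

With the default size, the missing byte falls into chunk 57, which is consistent with the chunk number reported above. Doubling the size roughly halves the number of chunks but does not change which bytes are missing.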
Usually if there's a MissingDataException that's unhandled it means that we have some code path that needs to handle it and retry once there is more data. See the BasePdfManager's ensure function and the various things that call it.
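The "handle it and retry once there is more data" pattern can be sketched roughly as follows. This is not the BasePdfManager implementation: the `MissingDataException` class here is a local stand-in, and `FakeChunkedStream`, `requestRange`, and this `ensure` signature are assumptions made up for a self-contained example.

```javascript
// Local stand-in for the exception type named in the thread.
class MissingDataException extends Error {
  constructor(begin, end) {
    super(`Missing data [${begin}, ${end})`);
    this.begin = begin;
    this.end = end;
  }
}

// Hypothetical stream that only knows the bytes loaded so far.
class FakeChunkedStream {
  constructor() {
    this.loaded = new Map(); // byte offset -> byte value
    this.requests = [];      // record of requested [begin, end) ranges
  }

  getByte(offset) {
    if (!this.loaded.has(offset)) {
      throw new MissingDataException(offset, offset + 1);
    }
    return this.loaded.get(offset);
  }

  // Simulates an asynchronous range request; fills the range with a dummy value.
  requestRange(begin, end) {
    this.requests.push([begin, end]);
    return Promise.resolve().then(() => {
      for (let i = begin; i < end; i++) {
        this.loaded.set(i, 42);
      }
    });
  }
}

// Calls `read`; on MissingDataException, loads the missing range and retries.
// Any other error rejects the returned promise.
function ensure(stream, read) {
  return new Promise((resolve, reject) => {
    const attempt = () => {
      try {
        resolve(read());
      } catch (ex) {
        if (!(ex instanceof MissingDataException)) {
          reject(ex);
          return;
        }
        stream.requestRange(ex.begin, ex.end).then(attempt, reject);
      }
    };
    attempt();
  });
}
```

A caller would write something like `ensure(stream, () => stream.getByte(3741030))` and receive a promise that resolves once the data is available, instead of an uncaught exception.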
Yes, that's definitely possible as well: one could add a try/catch block in document.js:316 and create the annotation promise once again after the MissingDataException. But would it also be possible to check whether a 128K chunk range buffer helps for this Linux bot in Chrome? The 64K buffer size has existed from the very beginning in 2013 and I think it's safe to double it; it has also been configurable since 2015. The total memory consumption should stay the same, but the number of workers will be reduced. I'm not familiar with the details of the V8 engine, but I can imagine that when this patch was added, the additional annotation workers exceeded some limit on the slow Linux bot.
I'm not sure I follow what you mean by workers; in pdf.js we only ever spawn one worker. Do you mean HTTP requests? Increasing the limit just hides the issue; we need to address the real issue of not handling the exception correctly.
Yeah, I was talking about worker.js:435 with 'parseStartXRef'; my interpretation of this code was that we create a new worker for parsing the XRef reference element. Sorry for the misunderstanding. I think that just handling the exception hides the issue too; this is a definitely reproducible regression for this change.
Looking at the patch, there's a couple of (more-or-less obvious) problems with the implementation:
The following patch is an attempt at addressing the most severe (known) issues, as outlined above: https://gist.githubusercontent.com/Snuffleupagus/1856b76ba23fe633e155ddf9037b2db1/raw/9f54c8715ed67c467df1a6e16eac69012c991bc2/0001-Ensure-that-all-Annots-resources-are-actually-loaded.patch
ca2b151 to a519054
c49764c to 5a88bbe
This commit is the first step towards implementing parsing for the appearance streams of annotations.
Co-authored-by: Jonas Jenwald <[email protected]>
Co-authored-by: Tim van der Meij <[email protected]>
5a88bbe to bbc769c
/botio test
From: Bot.io (Linux m4). Received: Command cmd_test from @timvandermeij received. Current queue size: 0. Live output at: http://54.67.70.0:8877/155e8ad2330f5ea/output.txt
From: Bot.io (Windows). Received: Command cmd_test from @timvandermeij received. Current queue size: 0. Live output at: http://54.215.176.217:8877/af0dbe52aa2ff2e/output.txt
From: Bot.io (Windows). Success: Full output at http://54.215.176.217:8877/af0dbe52aa2ff2e/output.txt Total script time: 29.27 mins
From: Bot.io (Linux m4). Success: Full output at http://54.67.70.0:8877/155e8ad2330f5ea/output.txt Total script time: 36.60 mins
/botio-linux preview
From: Bot.io (Linux m4). Received: Command cmd_preview from @timvandermeij received. Current queue size: 0. Live output at: http://54.67.70.0:8877/7f8a9533a049315/output.txt
From: Bot.io (Linux m4). Success: Full output at http://54.67.70.0:8877/7f8a9533a049315/output.txt Total script time: 7.05 mins. Published
Thank you @dmitryskey and @Snuffleupagus for making this change possible!
This commit is the first step towards implementing parsing for the appearance streams of annotations.
Supersedes #9417.