-
Notifications
You must be signed in to change notification settings - Fork 10.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement progressive loading of PDFs #2719
Conversation
Does this supersede PR #1923 (HTTP range request support)? |
That's the goal. We cannot go forward w/ #1923 because it makes synchronous xhr requests, which will not work with the extension. This implementation uses asynchronous xhr requests. |
How does this support search? If someone opens the find bar, does it notify them & then download the rest of the PDF? |
@waddlesplash I suppose it would search the portions of the pdf that have been loaded so far. |
/botio-windows preview |
From: Bot.io (Windows)ReceivedCommand cmd_preview from @mduan received. Current queue size: 0 Live output at: http://107.22.172.223:8877/0d6791a73c05149/output.txt |
From: Bot.io (Windows)FailedFull output at http://107.22.172.223:8877/0d6791a73c05149/output.txt Total script time: 0.14 mins |
This PR is fairly stable now, but is still in need of some real-world testing. I've uploaded a packaged XPI of this PR at: http://dl.dropbox.com/u/5961585/pdf.js.xpi. If anyone has some time, it'd be appreciated if you could try it out and let me know if you encounter any issues or have any feedback. |
Just installed. Will use it normally and I"ll tell you what happens :) |
@waddlesplash: sweet, thanks :) |
I just tried it, and it doesn't work for me. I tried both your XPI file and checking out your repo and building the extension myself, but I just get the following error: Edit: My configuration: Windows 7 Professional SP1 (64-bit), Nightly 22 (Mozilla/5.0 (Windows NT 6.1; WOW64; rv:22.0) Gecko/20130301 Firefox/22.0 ID:20130301030909 CSet: 993d7aff3109) |
@Snuffleupagus I get the same issue on Nightly, but it works for me on Aurora and earlier. Could you try with an earlier version of FF while I try to figure out the issue wtih Nightly? |
@mduan It works in Aurora 21 for me too, and I must say that I'm very impressed by the performance improvement! |
} | ||
|
||
messageHandler.on('RequestDataRange', function transportDataRange(args) { | ||
FirefoxCom.request('requestDataRange', args); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move FirefoxCom and window.addEventListener('message') out of api.js? Provide service/facade for this functionality, e.g. PdfDataTransportService with requestDataRange and onDataRange members. That will be useful if we decide implement one for web viewer.
setUserData is fixed by 07491f5, I think can cherry-pick this commit |
Just encountered a wierd bug. When opening a PDF from a file:/// URL (which I've done on plenty of other PDFs with no problems), I got the "displayed incorrectly" message (which I had previously gotten on that PDF, see #2637) but this time, only the first and second pages (p2 is the one with the problem) were rendered. I could click the next/prev page buttons, use the scrollbar, etc. - just blank pages. Not even the spinning progress thingy was there. After about ~5s, the spinning progress GIF appeared and all pages and text selection were rendered normally. The only evidence that it ever happened was that the page previews in the sidebar of the pages I had scrolled past remained blank (white, not black with dashed outline). The other pages in the preview that I had not scrolled past appeared OK. |
@waddlesplash: Do you have a link to the PDF? I don't see it in the issue you referenced. |
Here's a similar one with the same problem as mentioned in #2637: http://www.rosettastone.com/us_assets/documentation/RSV2_UG_Level_1_2_English_%28US%29.pdf It has a different problem as well: on all pages except page 1, there are just two graphics objects (a "+" and a line). No spinning icon indicating loading. |
@waddlesplash: I've fixed the problem with the "+" & line showing. In regards to there being a blank page with no spinning icon while it's rendering, the same thing happens without my progressive loading changes. |
Hm. Then that PDF has different behavior than the other PDF I have on my hard drive, because it works fine with the stable version of pdf.js. |
while Firefox makes range requests, Chrome makes one request for the full pdf and while that downloads the progress does not update. once the full pdf has been loaded another pending request for Range:bytes=0-0 is made and comes from the cache. |
Another related issue #1375? |
@bit: The behaviour I saw from Chrome's PDF viewer is that it first requests the full PDF but cancels that request if it finds the server supports range requests. It then enough of the PDF to determine if it is linearized. If the PDF is linearized, it will fetch the PDF in sequential chunks. Otherwise, it will issue a full request to fetch the PDF. Linearized PDF: http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf |
@Snuffleupagus: This PR will currently continue to download the rest of the PDF in sequential chunks while it's rendering. |
@mduan , sorry my last comment might not have been clear, I was talking about pdf.js in Firefox and pdf.js in Chrome. Both with your patch. Was testing with pdf.js and a larger pdf on the same server, server supports range request. But for some reason pdf.js does not use range requests in Chrome while Firefox works as expected. |
@bit: Is the server public? If so could you share a link? The way that range requests work right now is that a request is made for the full pdf and once the headers for the full request are returned, we issue a range request. If the range request completes successfully, we cancel the full request and start issuing range requests unless the full request is already complete. What I think might be happening is that the server is not returning the headers separately from the body. So when headers are received, the full request is already complete and so range requests are not issued. |
server is nginx, here an example: or the version loading above url into an iframe (https://indiancine.ma/texts/j:Indian%20Cinema%20in%20the%20Time%20of%20Celluloid/150) |
@bit: Can you try with the latest changes? I think I may have fixed the issue. |
From: Bot.io (Windows)SuccessFull output at http://107.22.172.223:8877/0a8868c275ad385/output.txt Total script time: 25.08 mins
|
Implement progressive loading of PDFs
I just wanted to say that I am very impressed by this great patch, @mduan! This looks like a fantastic feature to me. |
What's the status of this ticket? It looks like all the sub tasks have been finished (check boxes are checked) still ticket says it is NOT complete! |
This PR has the status 'merged', so it is indeed done. This has already been implemented for a very long time in PDF.js. Every other issue related to this has been closed, so I see no problems here. |
This implements progressive loading of the PDF using an implementation similar to that described in #1108.
Note: This is still a work in progress, so it's incomplete and a bit messy. It is NOT ready to be merged.
Some of the things that still have to be done are:
Possibly more optimizations specifically for linearized PDFs