How to cancel or destroy a getPage request with disableAutoFetch set #11453

Closed
arelaxend opened this issue Dec 27, 2019 · 9 comments

@arelaxend

arelaxend commented Dec 27, 2019

Dear pdf.js contributors,

With disableAutoFetch set, is there a way to cancel the fetching triggered by getPage(), the same way one can call destroy() on the loading task returned by getDocument?

It looks possible, but I only found internal functions.

Best, A.

@Snuffleupagus
Collaborator

With disableAutoFetch set, is there a way to cancel fetching on getPage() ?

Huh, calling getPage is what causes data to be requested (there's no cancelling involved); it's quite frankly difficult to understand what you're trying to ask here.

@arelaxend
Author

arelaxend commented Dec 27, 2019

it's quite frankly difficult to understand what you're trying to ask here.

Oops. With disableAutoFetch and disableStreaming both off, calling getDocument starts fetching the entire file. One can cancel that fetch by calling destroy() on the loading task that getDocument returns, which stops the GET request.

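let task = pdfjs.getDocument(...);
...
if (task !== undefined) {
  // destroy() aborts the in-flight request and closes the document.
  await task.destroy();
  // `delete` cannot remove a variable binding, so drop the reference instead.
  task = undefined;
}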

With disableAutoFetch set, fetching occurs just after getPage, but there is no destroy() to cancel the GET 206 range request in case one wants to. For example, one may need to cancel pages that are still being fetched because the user has moved on to other pages before they finished loading.

Still, it looks like it is possible to cancel the _transport, but if one does that it is going to cancel all future requests.

pdf.js/src/display/api.js

Lines 423 to 424 in c3a1c67

const transportDestroyed = !this._transport ? Promise.resolve() :
  this._transport.destroy();

calling getPage is what causes data to be requested (there's no cancelling involved)

Absolutely, the best way is not to call getPage if one should not. Still, this is not the point here 💯 and it is also better not to call getDocument if one should not.

Is the following a workaround ?

cancelAllRequests(reason) {
  if (this._fullRequestReader) {
    this._fullRequestReader.cancel(reason);
  }
  const readers = this._rangeReaders.slice(0);
  readers.forEach(function(rangeReader) {
    rangeReader.cancel(reason);
  });
  this._pdfDataRangeTransport.abort();
}

@Snuffleupagus
Collaborator

Snuffleupagus commented Dec 27, 2019

[...] but there is no destroy() to cancel the GET 206 request in case one wants to.

There's no way of doing what you're asking, short of destroying the loadingTask itself (and thus closing the entire document).

For example, one may need to cancel pages that are still being fetched because the user has moved on to other pages before they finished loading.

First of all, note that there's a couple of different ways that data could be loaded (using Fetch, XMLHttpRequest, or a PDFDataRangeTransport implementation). Secondly, there's generally speaking nothing that says that different pages wouldn't need data from the same byte range (and aborting a request could thus break other getPage calls).

Hence what you're asking for isn't possible, nor will it be supported either unfortunately (as outlined above, and the use-case seems fairly specialized anyway).
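
The point about shared byte ranges can be illustrated with a toy model. This is only a sketch and does not mirror pdf.js internals: the `ChunkTracker` class, its method names, and the chunk numbers are all made up for illustration. It records which pages are waiting on which chunks of the file, then reports which other pages would lose data if every chunk needed by one page were aborted.

```javascript
// Toy model, NOT pdf.js internals: pages wait on numbered chunks of the file.
class ChunkTracker {
  constructor() {
    // chunk index -> Set of page numbers waiting on that chunk
    this.waiters = new Map();
  }

  // Record that `pageNumber` needs every chunk in `chunks`.
  request(pageNumber, chunks) {
    for (const chunk of chunks) {
      if (!this.waiters.has(chunk)) {
        this.waiters.set(chunk, new Set());
      }
      this.waiters.get(chunk).add(pageNumber);
    }
  }

  // Other pages that share at least one chunk with `pageNumber`, i.e. the
  // pages that would break if all of this page's chunk requests were aborted.
  collateralOfAborting(pageNumber) {
    const affected = new Set();
    for (const pages of this.waiters.values()) {
      if (!pages.has(pageNumber)) {
        continue;
      }
      for (const other of pages) {
        if (other !== pageNumber) {
          affected.add(other);
        }
      }
    }
    return [...affected].sort((a, b) => a - b);
  }
}

const tracker = new ChunkTracker();
tracker.request(1, [0, 1]); // page 1 needs chunks 0 and 1
tracker.request(2, [1, 2]); // page 2 shares chunk 1 with page 1
tracker.request(3, [4]);    // page 3 shares nothing

console.log(tracker.collateralOfAborting(1)); // [ 2 ]
```

In this model, aborting page 1's requests would also starve page 2, which is the kind of breakage described above.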

@arelaxend
Author

arelaxend commented Dec 27, 2019

Secondly, there's generally speaking nothing that says that different pages wouldn't need data from the same byte range (and aborting a request could thus break other getPage calls).

Ok. In my use case, I fetch the pages [current - 1, current, current + 1] whenever the user moves to a page. If the user moves quickly to another page, I am going to cancelAllRequests().

cancelAllRequests(reason) {
  if (this._fullRequestReader) {
    this._fullRequestReader.cancel(reason);
  }
  const readers = this._rangeReaders.slice(0);
  readers.forEach(function(rangeReader) {
    rangeReader.cancel(reason);
  });
  this._pdfDataRangeTransport.abort();
}

Then I wait until all the requests are cancelled and fetch the new [current - 1, current, current + 1] pages.
My question is: is cancelAllRequests() the best option for such a scenario?

First of all, note that there's a couple of different ways that data could be loaded (using Fetch, XMLHttpRequest, or a PDFDataRangeTransport implementation).

@Snuffleupagus
Collaborator

Snuffleupagus commented Dec 27, 2019

I am going to cancelAllRequests().

As explained in #11453 (comment) that will easily lead to all kinds of breakage, and isn't something that you should be calling manually (it's being used from WorkerTransport.destroy).

I am currently using PDFDataRangeTransport implementation for range requests.

Please note that the default range request functionality in PDF.js isn't in any way connected with PDFDataRangeTransport, so unless you're using the API along the lines below then you're not actually using PDFDataRangeTransport.

const loadingTask = getDocument({
  range: /* custom PDFDataRangeTransport here */,
  //  more parameters here
});

@arelaxend
Author

arelaxend commented Dec 27, 2019

OK. Following your first comment, I am going to defer each call with setTimeout(() => getPage(), 200) and clearTimeout() any pending timeouts when the user moves to another page.

calling getPage is what causes data to be requested (there's no cancelling involved)
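
The deferred-call idea could be sketched roughly as follows. This is a minimal sketch, not an API that pdf.js provides: `createDeferredPageLoader`, `request`, and `cancel` are hypothetical names, and `getPage` is assumed to be any function that takes a page number and returns a promise (for example a wrapper around a document's getPage). It only avoids starting requests for pages the user has already left; a request that has already started is not aborted.

```javascript
// Hypothetical helper: delays getPage() so that rapid page changes only
// trigger a request for the page the user finally settles on.
function createDeferredPageLoader(getPage, delayMs = 200) {
  let timer = null;

  return {
    // Schedule a load, replacing any load that has not started yet.
    request(pageNumber, onPage, onError = () => {}) {
      if (timer !== null) {
        clearTimeout(timer);
      }
      timer = setTimeout(() => {
        timer = null;
        getPage(pageNumber).then(onPage, onError);
      }, delayMs);
    },

    // Drop the pending load, if any. Requests already issued keep running.
    cancel() {
      if (timer !== null) {
        clearTimeout(timer);
        timer = null;
      }
    },
  };
}

// Usage sketch (pdfDocument is assumed to be an already loaded document):
//   const loader = createDeferredPageLoader((n) => pdfDocument.getPage(n));
//   loader.request(5, (page) => { /* render the page */ });
```

The 200 ms delay is taken from the comment above; a suitable value depends on how quickly users flip through pages.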

What is the purpose of PDFDataRangeTransport ? Extending the range capabilities ? I found no examples or use cases out there

Thank you for all your tips @Snuffleupagus 👍

@Snuffleupagus
Collaborator

Snuffleupagus commented Dec 27, 2019

What is the purpose of PDFDataRangeTransport ?

It allows completely custom data delivery, which you can implement in whatever way you want or need (it's being used in the PDF Viewer that's built into the Firefox browser).

While it does allow a great deal of flexibility, it's consequently a fair bit more complex than just providing a URL when calling getDocument :-)
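
As a rough illustration of what such custom delivery might look like, here is a minimal sketch that serves range requests with fetch. It rests on several assumptions that should be checked against the pdf.js version in use: that the base class is constructed as `(length, initialData)`, that pdf.js calls `requestDataRange(begin, end)` with an exclusive `end`, and that data is handed back through `onDataRange(begin, chunk)`. The factory name `createFetchRangeTransport` and its parameters are invented for this example, and the total file length must be known in advance.

```javascript
// Sketch only. `BaseTransport` is expected to be pdf.js's
// PDFDataRangeTransport class; `fetchFn` defaults to the global fetch.
function createFetchRangeTransport(BaseTransport, url, length, fetchFn = fetch) {
  class FetchRangeTransport extends BaseTransport {
    constructor() {
      super(length, /* initialData = */ null);
      this.controllers = new Set(); // one AbortController per open request
    }

    // Called when a byte range is needed; `end` is assumed to be exclusive.
    requestDataRange(begin, end) {
      const controller = new AbortController();
      this.controllers.add(controller);
      fetchFn(url, {
        headers: { Range: `bytes=${begin}-${end - 1}` },
        signal: controller.signal,
      })
        .then((response) => {
          if (response.status !== 206) {
            throw new Error(`Expected 206 Partial Content, got ${response.status}`);
          }
          return response.arrayBuffer();
        })
        .then((buffer) => this.onDataRange(begin, new Uint8Array(buffer)))
        .catch((error) => {
          // Aborted requests are expected; other failures are only logged here.
          if (error.name !== "AbortError") {
            console.warn("Range request failed:", error.message);
          }
        })
        .finally(() => this.controllers.delete(controller));
    }

    // Cancels every open request. This affects the whole document, not one page.
    abort() {
      for (const controller of this.controllers) {
        controller.abort();
      }
      this.controllers.clear();
    }
  }

  return new FetchRangeTransport();
}

// Usage sketch (pdfjsLib, url and length are assumed to exist):
//   const range = createFetchRangeTransport(pdfjsLib.PDFDataRangeTransport, url, length);
//   const loadingTask = pdfjsLib.getDocument({ range });
```

Passing the base class and the fetch function in as arguments keeps the sketch independent of how pdf.js is loaded; a real implementation would also want proper error reporting instead of a warning.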

I found no examples or use cases out there

There's the API unit-tests and also the default viewer usages here, here, here and finally here and here.

@timvandermeij
Contributor

Closing as answered by the comments above.

@legistek

I had a similar issue with very large non-linearized PDFs where getPage was taking a very long time for later pages because most if not the whole PDF had to be downloaded. Specifically if the user closed the document but not the app it would nonetheless keep downloading. For this PDFDocumentLoadingTask.destroy works well, thank you for the advice @Snuffleupagus!

I would quickly point out that it would be nice to be able to cancel an individual page load too though because, of course, page loading can sometimes take WAY longer than rendering. Hunting around the code it seems like during the while loop here it'd be pretty straightforward to add a simple cancellation check on each iteration, using something akin to an optional CancellationToken that could be passed into getPage and getPageDict. Would that not do the trick and would you entertain a PR that did that?
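
The per-iteration cancellation check proposed above might look something like this. It is a toy sketch under stated assumptions, not pdf.js code: getPage does not accept a cancellation token or signal, and `findPageNode`, `loadKids`, and the tree shape are hypothetical stand-ins for a lookup loop that loads data asynchronously on each step. A standard AbortSignal plays the role of the CancellationToken.

```javascript
// Toy tree walk: finds the leaf at `pageIndex` (0-based, depth-first order),
// checking for cancellation before each potentially slow load.
async function findPageNode(root, pageIndex, loadKids, signal) {
  const stack = [root];
  let visited = 0;

  while (stack.length > 0) {
    // The cancellation check: stop before doing any more work.
    if (signal && signal.aborted) {
      throw new Error("Page lookup cancelled");
    }

    const node = stack.pop();
    const kids = await loadKids(node); // may trigger a network request

    if (kids.length === 0) {
      // A node without children is treated as a page.
      if (visited === pageIndex) {
        return node;
      }
      visited++;
      continue;
    }

    // Push children in reverse so they are visited left to right.
    for (let i = kids.length - 1; i >= 0; i--) {
      stack.push(kids[i]);
    }
  }
  throw new Error("Page index out of range");
}

// Usage sketch:
//   const controller = new AbortController();
//   findPageNode(root, 2, loadKids, controller.signal).catch(console.warn);
//   controller.abort(); // e.g. when the user closes the page
```

Note that the check only stops further iterations; a load that is already awaiting data still runs to completion, which is consistent with the earlier point that individual requests may be shared.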
