[Feature] Stop loading of page #3238
Comments
@RebliNk17 there's a await page.evaluate(() => window.stop()); |
Correct me if I'm wrong, but I think that the anyway. it's not working. something like this is partially working: |
Don't know how and why this code now works: It stops loading the page and returns all the data from A few days ago it didn't return any data and throw |
Sorry, not working as I thought. When using 'domcontentloaded' or 'load' I'm not getting all the data from some websites, but then I'm not getting @aslushnikov Any thoughts on how to do it? I've tried this: But I'm still missing something... |
@RebliNk17 what do you expect to see when you "stop" loading? If you just want the navigation promise to not hang, I'd implement stopping somehow like this:
let stopCallback = null;
const stopPromise = new Promise(x => stopCallback = x);
const navigationPromise = Promise.race([
page.goto(url).catch(e => void e),
stopPromise
]);
// Do something; once you want to "stop" navigation, call `stopCallback`.
stopCallback(); |
@aslushnikov What I want is to receive the website HTML content and HTTP requests from the Currently, if the page did not finish loading a Expected result: Is it clearer now? |
@RebliNk17 I'm still not sure what's not working.
So the following approach should yield the expected result:
await page.goto(url).catch(e => void e); // catch and ignore exception
So what's not working? |
Maybe |
That's not loading all the javascript in the page.
When using
this will still hang until what I found to be working is something like this:
async goto(url, options = {}) {
......
const pageLoadingStoppedFunc = pageLoadingStopped.bind(this);
let ensureNewDocumentNavigation = false;
let error = await Promise.race([
navigate(this._client, url, referrer),
watcher.timeoutOrTerminationPromise(),
pageLoadingStoppedFunc()
]);
if (!error) {
error = await Promise.race([
watcher.timeoutOrTerminationPromise(),
ensureNewDocumentNavigation ? watcher.newDocumentNavigationPromise() : watcher.sameDocumentNavigationPromise(),
pageLoadingStoppedFunc(),
]);
}
watcher.dispose();
helper.removeEventListeners(eventListeners);
if (error)
throw error;
const request = requests.get(mainFrame._navigationURL);
this._finished = true;
return request ? request.response() : null;
...
/* Not sure if this is the right approach for this function... */
async function pageLoadingStopped() {
const _this = this;
return new Promise(function(resolve, reject) {
const interval = setInterval(() => {
if (_this._stopped || _this._finished) {
clearInterval(interval);
resolve();
}
}, 100);
});
}
}
async stopPageLoading() {
await this._client.send('Page.stopLoading');
this._stopped = true;
}
This now waits for page loading to finish or for loading to be stopped, and does not hang at all. Is it possible to add it to the official |
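For readers who want the same effect without patching Puppeteer's internals, here is a minimal sketch that sends `Page.stopLoading` over a DevTools protocol session from user code. The helper name `stopLoading` is hypothetical and not part of Puppeteer. The sketch assumes that `page.createCDPSession()` is available on newer Puppeteer versions and falls back to `page.target().createCDPSession()` on older ones. It only stops loading; it does not make a pending `page.goto()` promise settle.

```javascript
// Hypothetical helper (not a Puppeteer API): stop an in-progress page load
// by sending the DevTools protocol command Page.stopLoading.
async function stopLoading(page) {
  // Assumption: newer Puppeteer versions expose page.createCDPSession();
  // older ones only offer page.target().createCDPSession().
  const client = typeof page.createCDPSession === 'function'
    ? await page.createCDPSession()
    : await page.target().createCDPSession();
  try {
    // Ask the browser to stop all navigations and pending resource fetches.
    await client.send('Page.stopLoading');
  } finally {
    // Release the temporary session whether or not the command succeeded.
    await client.detach();
  }
}
```

A caller would typically start `page.goto(url)` without awaiting it, call `await stopLoading(page)` when its own condition is met, and then read `await page.content()`.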
@aslushnikov Any thought on the code I shared above? |
@RebliNk17 sorry for the delay, I was busy with other stuff.
Can we step back and reiterate, since I still don't understand what's not working? If I understand correctly, there's a website that takes a lot of time to load. We want to constrain the wait time to a certain amount and get content from the page after this time. Is this correct? If yes, why's the following not working for you?
const puppeteer = require('puppeteer');
(async() => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
try {
// Constrain loading time to 30 seconds
await page.goto('https://bestmodelsbrasil.blogspot.co.il', {waitUntil: 'networkidle0', timeout: 30000});
} catch (e) {
// Ignore the navigation timeout; whatever has loaded so far is still readable.
}
console.log(await page.content());
await browser.close();
})(); |
Sorry, I did not get any notification about your comment. Your code will work, but sometimes the stopping condition is not a fixed time; it can also depend on other code running in the background, as in my situation. Adding this "stopPageLoading", which exists in the Chromium API, will make it possible... |
@aslushnikov Any thoughts? |
Below is my TypeScript code sample:
page.goto('https://some site slow...');
await new Promise(resolve => setTimeout(resolve, 3000));
await page.evaluate(_ => window.stop());
await browser.close().catch(reason => console.error(reason));
There is actually no unhandled promise rejection from the |
Wouldn't the solution in #3238 (comment) work in this case? |
No, it wouldn't work... Again, this is something that exists in the Chromium API... all that is needed is the implementation in Puppeteer... |
Can you explain why?
It's very important to understand what we add and why; otherwise we risk bloating the API with one-off solutions. |
This code is not returning the |
Well, that's very easy to address:
let stopCallback = null;
const stopPromise = new Promise(x => stopCallback = x);
const navigationPromise = Promise.race([
page.goto(url).catch(e => void e),
stopPromise,
]).then(() => page);
// Do something; once you want to "stop" navigation, call `stopCallback`.
stopCallback(); |
When trying to use |
@aslushnikov any update? |
@RebliNk17 There's no I guess what you want is a navigation response, not the page object.
let stopCallback = null;
const stopPromise = new Promise(x => stopCallback = x);
let navigationRequest = null;
const onRequest = r => {
if (r.isNavigationRequest())
navigationRequest = r;
};
page.on('request', onRequest);
const navigationPromise = Promise.race([
page.goto(url).catch(e => void e),
stopPromise,
]).then(() => {
page.removeListener('request', onRequest);
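// navigationRequest is still null if we stopped before any navigation request was seen.
return navigationRequest ? navigationRequest.response() : null;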
});
// Do something; once you want to "stop" navigation, call `stopCallback`.
stopCallback(); |
Sorry, I meant the status of the page response. It exists... |
I'll close this for now - let me know if we can be helpful. |
Hi, Sorry for the delay, I've been on a vacation... I've tested your code, it's not what I need... |
@aslushnikov I am running into some difficulties here too. I think part of the problem is just that there are a lot of different ways a page load can fail. I'm building an archiving tool, and I'd like to give If the page is partially or mostly loaded, but the browser gets stuck "Connecting..." to the server for a resource in the main rendering path, then Currently I am getting the best results from the combination of the One improvement in Puppeteer would be to add a |
Same issue here, am in need of a "page.stopLoading()" |
They don't care, |
@nylen can you please help us understand what the problem is? Our previous discussion on the subject with @RebliNk17 has stalled. You suggest adding a new method:
What would the method do? Why doesn't the workaround work for you? |
@aslushnikov I think The As said above, I'd also suggest that |
@nylen that's a good point. I think the biggest problem I have with Could it be that you rely on some specific behavior or |
My specific use case: I was building a web archiving tool that (ideally) should work with arbitrary pages, and I found there are certain kinds of navigation timeouts that can be avoided or shortened, like when a page is stuck I agree there are other things that could cause navigations after a page is "stopped". I am assuming that "aborting all current in-flight requests" is good enough for my use case, and so far it seems to be working. For this part So I am mostly just looking for potential ways to improve the code of puppeteer users here. Hence the suggestion to make I don't think any of this is particularly urgent. Thanks for all of your work on Puppeteer. |
@aslushnikov Thank you for this
I am using pyppeteer and had the same problem (I couldn't think of a way to get the DOM and cookies after a timeout). This solved my problem. I can access the DOM with
and cookies by
I don't understand what everyone else is complaining about. Again, thank you so much. Saved me a couple of hours. |
I tried your solution and it works well, but when I try to take a screenshot I get this error:
Error: Protocol error (Page.captureScreenshot): Unable to capture screenshot
at Promise (/node_modules/puppeteer/lib/Connection.js:183:56)
at new Promise (<anonymous>)
at CDPSession.send (/node_modules/puppeteer/lib/Connection.js:182:12)
at Page._screenshotTask (/node_modules/puppeteer/lib/Page.js:951:39)
at process._tickCallback (internal/process/next_tick.js:68:7)
-- ASYNC --
at Page.<anonymous> (/node_modules/puppeteer/lib/helper.js:111:15)
at htmlBrowser (/dist/apps/botminds-browser/main.js:1079:45)
at process._tickCallback (internal/process/next_tick.js:68:7) |
Stop page loading and/or something else, this can also close the
await page.keyboard.press('Escape')
If it doesn't work, then duplicate the line several times:
await page.keyboard.press('Escape')
await page.keyboard.press('Escape')
await page.keyboard.press('Escape') |
Can someone help me stop this page from loading continuously?
Not working, puppeteer just stuck. |
This worked for me:
Thanks! |
@aslushnikov The problem is when setting a timeout with Whenever time is out and we reach the timeout, it would be great or even awesome to have a way to stop the page immediately. I agree stopping the ongoing requests won't help much most likely but that'd be better than nothing. Here's an example:
You can see |
Our product crawls our customers' websites as part of our overall solution. We are using Puppeteer for this and, overall, it works great. But we have the same problem discussed here. We can't know a priori what the appropriate timeout behavior needs to be on any given page or site. When page.goto throws a TimeoutError, it doesn't necessarily mean that the page is unusable, but after catching the error we can't access the HttpResponse that is returned when there is no exception. If a new method, page.response(), for example, returned the response object when it is available, we'd be happy. I realize that in some timeout scenarios the response will not be available (such as when the timeout is at the network layer). It may also be a good idea for Puppeteer to emulate a "stop" when it throws an error, but I don't see that I need to be part of that. So something like the following would be desirable:
|
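The `page.response()` method requested above does not exist in Puppeteer. As a rough approximation under the current API, the sketch below records the main frame's navigation response through the `response` event, so it is still at hand when `page.goto()` rejects. The helper name `gotoKeepingResponse` and its `{response, error}` return shape are assumptions made for illustration.

```javascript
// Hypothetical helper (not a Puppeteer API): navigate, but keep the main
// document's response even if page.goto() rejects, e.g. with a TimeoutError.
async function gotoKeepingResponse(page, url, options = {}) {
  let lastResponse = null;
  const onResponse = response => {
    const request = response.request();
    // Remember only responses to navigation requests of the main frame.
    if (request.isNavigationRequest() && request.frame() === page.mainFrame())
      lastResponse = response;
  };
  page.on('response', onResponse);
  try {
    const response = await page.goto(url, options);
    return { response: response || lastResponse, error: null };
  } catch (error) {
    // `response` stays null if the failure happened before any response arrived.
    return { response: lastResponse, error };
  } finally {
    page.off('response', onResponse);
  }
}
```

After a timeout, `result.error` holds the thrown error and `result.response`, when non-null, can still be queried with `status()` or `headers()`.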
I discovered there is no way to cancel `page.goto` unless you take care about it. Long issue with several workarounds: puppeteer/puppeteer#3238
* fix(goto): abort page.goto after timeout I discovered there is no way to cancel `page.goto` unless you take care about it. Long issue with several workarounds: puppeteer/puppeteer#3238 * v10.5.0-alpha.0
For some reason this works for me:
await page
.goto(url, { waitUntil: "domcontentloaded", timeout: 3000 })
.catch((e) => void e);
await new Promise((resolve) => setTimeout(resolve, 3000));
await page.evaluate((_) => window.stop());
Thanks to @chigix |
I used .preventDefault() for this:
|
In Chrome, there is an option to cancel loading of a page by clicking the X which is replaced by the refresh button when the page is loading.
Some websites keep loading indefinitely; even after 90s I keep getting timeout errors.
If there were an option to stop loading the page (as there is in Chrome), I would get the content that was already loaded and prevent Puppeteer from throwing a timeout error.
I tried using
page.keyboard.press('Escape');
but with no luck. Another solution would be to stop loading the page after X ms with something like this:
page.setPageLimitLoadingTime(30000);
which will stop the page from continuing the loading process and return all the data it already got...
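`page.setPageLimitLoadingTime` is only a proposal and is not a Puppeteer method. A minimal sketch of how the same behavior might be approximated today: race `page.goto()` against a timer, call `window.stop()` in the page if the timer wins, then read the HTML. The helper name `contentWithLoadLimit` and its return shape are assumptions. Note that `page.evaluate()` can itself fail if the page is in the middle of a navigation, and that `page.content()` returns the initial blank document if nothing has been committed yet.

```javascript
// Hypothetical helper approximating the proposed setPageLimitLoadingTime():
// give the navigation `limitMs` milliseconds, then stop loading and return
// whatever HTML the page has at that point.
async function contentWithLoadLimit(page, url, limitMs) {
  let timer = null;
  const limit = new Promise(resolve => {
    timer = setTimeout(() => resolve('limit'), limitMs);
  });
  // timeout: 0 disables Puppeteer's own navigation timeout, so only our limit applies.
  const navigation = page.goto(url, { timeout: 0 })
    .then(() => 'loaded', () => 'failed');
  const outcome = await Promise.race([navigation, limit]);
  clearTimeout(timer);
  if (outcome === 'limit') {
    // window.stop() halts further resource loading, like the browser's stop button.
    await page.evaluate(() => window.stop());
  }
  return { outcome, html: await page.content() };
}
```

For example, `const { outcome, html } = await contentWithLoadLimit(page, url, 30000);` yields `outcome === 'limit'` when the 30 second limit was hit before the page finished loading.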
Chromium API reference:
https://chromedevtools.github.io/devtools-protocol/tot/Page#method-stopLoading
Tell us about your environment:
A website with more than 90s of loading time
Thank you.
** If there is already an option for what I'm proposing, I'm sorry; I just couldn't find anything...