-
Notifications
You must be signed in to change notification settings - Fork 9.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cooperative request intercepts #6735
Conversation
There's no reason to pull these breaking changes when a valid workaround already exists, as you referenced ( #3853 ). |
@whalderman Thanks for looking at this! #3853 doesn't solve the problems I mentioned above, it only defers one handler's intercept resolution until all synchronous handlers have run. But you might be right that the plugin community could agree to install ONE handler that does the same thing my PR proposes. Let me take a shot at that and get back to you. |
I have published a community package (https://www.npmjs.com/package/enchant-puppeteer) that monkey-patches IMHO this ought to be a core feature. It can be done with no breaking changes and would significantly improve interception capabilities. I would like to hear an official thumbs-up/down from a Puppeteer maintainer before closing this issue. @whalderman forgive me if you are maintainer, I didn't see your name in the contributor list. |
I +1 on @benallfree his comment, this is a crucial feature that I have mentioned as well already in #6567 |
@benallfree thanks for your efforts here and for kicking off the discussion. I've read through your PR and I think I follow, but what I'm struggling with in my head is fully understanding a situation where the limitation of having one request handler causes problems. Is there a small code example that you could share? I just don't think that's landed in my head yet so I'm struggling to form an opinion and would like to make sure I understand fully - thanks. |
@jackfranklin Thanks for writing back and for giving this feature honest consideration 👍 Please consider this use case of two request handlers needing to work cooperatively: The package @cliqz/adblocker contains a Puppeteer binding that intercepts Puppeteer requests and conditionally blocks them by calling AdBlocker makes for a nice plug-in, except using it precludes any other request intercept handlers from further manipulating the request. Once Suppose I wanted to use AdBlocker, but I also wanted to further intercept requests not blocked by AdBlocker and return an alternate response stored in Redis. This is my real-world use case and how I found this problem. const page = await browser.newPage()
page.setInterception(true)
// Initialize ad blocker
// Internally, this registers a handler with page.on('request') and calls a req.abort() or req.continue()
initializeAdBocker(page)
// Now I would like to add my own caching intercept that that either calls respond()
// or continue(). The problem is, by the time my handler runs, AdBlocker has already
// sealed our fate. There is no way to call respond() from here.
//
// Moreover, because this is an async function, the continue() will not
// be called until the next event loop. Any synchronous handler will "win" this race.
//
// By adding a second `true` to Page.setInterception(true,true), we can enable
// Cooperative Intercept Mode and this use case will work with no other changes.
page.on('request', async req=>{
const cachedResponse = await lookUpCachedResponseInRedis(req.url())
if(cachedResponse) {
req.respond(cachedResponse)
return
}
req.continue()
})
await page.goto('...') The modifications I propose in this PR create no breaking changes. Legacy behavior is preserved by default, and the new "Cooperative Intercept Mode" can be enabled for those who want to use it. In cooperative mode, it possible for any number of handlers to call |
@benallfree thank you, that's made it super clear and I think I understand the problem now. I like the fact we could land such a change without needing to introduce breaking changes to the API. @mathiasbynens what do you think about this? |
@jackfranklin Just checking back, thank you 👍 |
@jschfflr and @sadym-chromium, could you PTAL? |
I also see a lot of value in this approach to handling request interception. Especially in the context of third party code that might be intercepting requests too. |
@jschfflr Thanks for the feedback! I have considered the same things, great discussion. Synchronous handlers run in series because the On the other hand, there could be significant performance penalties associated with serializing async handlers. We also don't currently have a way of prioritizing or introspecting Well-written handlers shouldn't assume anything about what other handlers might be doing, or how state may have changed during an Overall, I'm leaning toward allowing parallel, non-deterministic execution, but the express community may have gotten this one right. Let me know what you think! 👍 |
@jschfflr Any further thoughts? I've been thinking more about your idea of deterministic execution, I might have an idea. Do you think it's worthwhile pursuing? |
I've been thinking a lot about this problem because I'm not sure how to handle the situation best. But I came up with the following requirements:
Having the requests be handled in a deterministic order sounds like the best first step in that direction. From the top of my head we could maybe work with an api that looks like the following: const page = await browser.newPage();
// Adding the first interception handler will automatically enable interception
// Request handlers will be run in the order they have been registered
// request.continue() will call the next request handler
// request.abort() and request.respond() will prevent further request handlers
// from being called
page.addRequestInterceptionHandler((request) => {
// Do something
request.abort()
})
page.addRequestInterceptionHandler((request) => {
// Do something
request.respond()
})
page.on('request', (request) => {
// Deprecate abort and respond in the on handler
// But keep current behaviour for now
// Event handlers will be triggered after the interception handlers
// even if abort or respond have already been called to make sure
// this is backwards compatible
})
await page.goto('...') Let me know what you think! |
@sadym-chromium Was this closed in error? |
I'm terribly sorry! I closed it by mistake. |
@benallfree Whohooo, this change is finally merged 🎉 🎉 🎉 |
Fantastic!!! 🎉🎉🥳 |
Thank you so much for your contribution! |
@Androbin I’m the main author of this PR, can you please provide a little more explanation? Do you mean that this feature introduces a regression even when it is not enabled? |
Tagging |
Hello, I wanted share my fork with the Puppeteer team to see if there is anything here you'd like to bring into the core.
This PR introduces request intercept handlers that play well with others. I needed this for a recent project and thought it might help to share my fork because it seems that there is a lot of community confusion around the way request intercepts are handled.
The Problem: Puppeteer's core expects only one
page.on('request')
handler to callabort/respond/continue
Puppeteer anticipates only one request intercept handler (
page.on('request', ...)
) to performcontinue()
,abort()
, orrespond()
. There are valid use cases where multiple request handlers may callabort/respond/continue
, unaware of each other. For example,puppeteer-extra
attempts a plugin architecture, yet enabling any plugin that utilizes request intercepts (such as ad blocker) means that I cannot further alter request interceptions because callingabort/respond/continue
a second time will throw an error.The current core design makes it impossible for multiple request handlers to cooperatively determine what to do with a request.
Here is a sample of issues for background:
berstend/puppeteer-extra#364
berstend/puppeteer-extra#156
argos-ci/jest-puppeteer#308
#5334
#3853 (comment)
#2687
#4524
The solution: Cooperative Request Interception mode
This PR proposes a cooperative interception strategy which allows multiple request intercept handlers to call
abort/respond/continue
in any order and multiplicity, while also makingNetworkManager
wait for async handlers to be fulfilled before finalizing the request interception.Breaking changes
None. Legacy intercept mode is still the default. A new "Cooperative Intercept Mode" is introduced by a second
Page.setRequestInterception
parameter.Todo