A library that currently implements rule-based Request Middlewares for Puppeteer page events.
You can conditionally proxify, block, retry, or just override request options!
npm i @teocns/puppeteer-middlewares
git clone https://github.com/teocns/puppeteer-middlewares/
cd puppeteer-middlewares
npm install && npm run build
npm run test
This runs Jest over all tests placed under /test.
const puppeteer = require('puppeteer');
const { RequestMiddleware, ConditionRuleMatchType } = require('@teocns/puppeteer-middlewares');

// Resolves after `ms` milliseconds
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
  });
  const page = await browser.newPage();

  // Route every request whose URL contains "google.com" through a local proxy
  new RequestMiddleware({
    conditions: { match: 'google.com', type: ConditionRuleMatchType.URL_CONTAINS },
    effects: {
      proxy: 'http://localhost:8899',
    },
  }).bind(page);

  await page.goto('https://google.com');

  // Keep the browser open for 10 seconds so the result can be inspected
  await sleep(10000);
  await browser.close();
})();
Each rule is an object made of conditions and effects. For example:
{
  conditions: { match: 'google.com', type: ConditionRuleMatchType.URL_CONTAINS },
  effects: {
    proxy: 'http://localhost:8899',
    setHeaders: {
      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36'
    }
  }
}
Conditions support the following match types (ConditionRuleMatchType values):
- ENTIRE_URL
- URL_REGEXP
- URL_CONTAINS
- RESPONSE_STATUS_CODE
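The following minimal sketch shows how each match type might be tested against a request. It is not the library's implementation. The function `matchesOne`, the plain-string type keys and the `{ url, status }` request shape are all hypothetical. Treating RESPONSE_STATUS_CODE patterns as regular expressions is also an assumption, based on the `'^5\\d{2}$'` pattern used in the retry example.

```javascript
// Hypothetical sketch: how each ConditionRuleMatchType *might* test a request.
// String keys mirror the enum member names; the real library may differ.
function matchesOne(type, pattern, { url, status }) {
  switch (type) {
    case 'ENTIRE_URL':
      // Exact, case-sensitive comparison with the full request URL
      return url === pattern;
    case 'URL_CONTAINS':
      // Substring search anywhere in the URL
      return url.includes(pattern);
    case 'URL_REGEXP':
      // Pattern is compiled as a JavaScript RegExp and tested against the URL
      return new RegExp(pattern).test(url);
    case 'RESPONSE_STATUS_CODE':
      // Assumption: pattern and status are stringified and tested as a RegExp,
      // so both 200 and '^5\\d{2}$' work as patterns
      return status !== undefined && new RegExp(String(pattern)).test(String(status));
    default:
      throw new Error(`Unknown match type: ${type}`);
  }
}

const req = { url: 'https://www.google.com/search?q=x', status: 503 };
console.log(matchesOne('URL_CONTAINS', 'google.com', req));            // true
console.log(matchesOne('ENTIRE_URL', 'https://www.google.com/', req)); // false
console.log(matchesOne('RESPONSE_STATUS_CODE', '^5\\d{2}$', req));     // true
```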
A condition can take a logical operator property that combines the results of its match patterns. Its value can be one of "OR" | "AND" | "NOR" | "XOR" | "NAND" | "NXOR" | "XNOR".
The default operator is OR.
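Here is one plausible reading of these operators, written as a small sketch. Each pattern in match yields a boolean, and the operator folds those booleans into a single result. The function `combine` is hypothetical. The exact semantics are not specified here, so two behaviours are assumptions: XOR with several operands means "an odd number of patterns matched", and NXOR/XNOR are its negation.

```javascript
// Hypothetical sketch: fold per-pattern match results into one boolean.
// Assumption: XOR = odd number of matches; NXOR/XNOR = even number.
function combine(results, operator = 'OR') {
  const any = results.some(Boolean);
  const all = results.every(Boolean);
  const odd = results.filter(Boolean).length % 2 === 1;
  switch (operator) {
    case 'OR': return any;
    case 'AND': return all;
    case 'NOR': return !any;
    case 'NAND': return !all;
    case 'XOR': return odd;
    case 'NXOR':
    case 'XNOR': return !odd;
    default: throw new Error(`Unknown operator: ${operator}`);
  }
}

console.log(combine([false, true]));        // true  (default OR)
console.log(combine([false, true], 'AND')); // false
console.log(combine([true], 'NOR'));        // false
console.log(combine([true, true], 'XOR'));  // false
```

With NOR and a single pattern, the condition holds only when that pattern does not match. This is how "everything except X" rules can be expressed.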
❗ Note: URL_REGEXP patterns must be valid JavaScript RegExp strings.
The following condition matches every URL that starts with https:// and ends with either .com or .net:
conditions: {
  operator: 'OR',
  // Backslashes are doubled inside a JS string: '\\.' is a literal dot
  match: ['^https://.+\\.com$', '^https://.+\\.net$'],
  type: ConditionRuleMatchType.URL_REGEXP,
}
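The snippet below tests the two URL patterns with plain `RegExp`, with the dot escaped (`\\.` in the string literal) so that `.com` means a literal dot. It needs neither Puppeteer nor the library, and `matchesAny` is a hypothetical helper that applies OR semantics. It also shows a subtle point. Browser-reported URLs are typically normalized, so a bare host usually carries a trailing slash (https://example.com/). Such a URL does not end in `.com` and is therefore not matched by these anchored patterns.

```javascript
// URL patterns with the dot escaped; in a JS string '\\.' becomes the regex \.
const patterns = ['^https://.+\\.com$', '^https://.+\\.net$'];

// OR semantics: the URL matches if any pattern matches
const matchesAny = (url) => patterns.some((p) => new RegExp(p).test(url));

console.log(matchesAny('https://example.com'));  // true
console.log(matchesAny('https://example.net'));  // true
console.log(matchesAny('http://example.com'));   // false (not https)
console.log(matchesAny('https://example.com/')); // false (ends with "/")

// Without the escape, "." matches any character, so this slips through:
console.log(new RegExp('^https://.+.com$').test('https://examplexcom')); // true
```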
Match status codes other than 200:
conditions: {
  operator: 'NOR',
  match: 200,
  type: ConditionRuleMatchType.RESPONSE_STATUS_CODE,
}
The following effects are available:
- block: blocks the request
- proxy: routes the request through a proxy; provide the proxy URL as a string
- setHeaders: overrides the request headers
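This minimal sketch shows how the three effects could translate into a decision for one intercepted request. It is illustrative only. `planRequest` and its `{ action, headers, proxy }` return shape are hypothetical and not part of the library. The action names merely echo Puppeteer's `HTTPRequest.abort()` and `HTTPRequest.continue()`. The header merge here is case-sensitive. Puppeteer reports request header names in lowercase, so a real implementation would likely need a case-insensitive merge to honour a key such as 'User-Agent'.

```javascript
// Hypothetical sketch: turn a rule's effects into a plan for one request.
// Effect names (block, proxy, setHeaders) follow the README; the returned
// shape is invented for illustration.
function planRequest(requestHeaders, effects = {}) {
  if (effects.block) {
    // A blocked request is never sent, so the other effects are irrelevant
    return { action: 'abort' };
  }
  return {
    action: 'continue',
    // setHeaders overrides same-named headers (case-sensitive) and keeps the rest
    headers: { ...requestHeaders, ...(effects.setHeaders || {}) },
    // undefined means "send directly, without a proxy"
    proxy: effects.proxy,
  };
}

const plan = planRequest(
  { accept: '*/*', 'user-agent': 'HeadlessChrome' },
  { proxy: 'http://localhost:8899', setHeaders: { 'user-agent': 'Mozilla/5.0' } },
);
console.log(plan);
```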
Override headers and use a proxy for all requests containing google.com in the URL:
{
  conditions: { match: 'google.com', type: ConditionRuleMatchType.URL_CONTAINS },
  effects: {
    proxy: 'http://localhost:8899',
    setHeaders: {
      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36'
    }
  }
}
Block all requests matching https://undesired.com:
{
  conditions: { match: 'https://undesired.com', type: ConditionRuleMatchType.ENTIRE_URL },
  effects: {
    block: true
  }
}
You can conditionally retry requests with specific effects.
In this example, the flow is:
- Retry 3 times through cheap-proxy if the status code is 5xx
- Then retry 1 time through expensive-proxy
{
  retryRule: {
    conditions: { match: '^5\\d{2}$', type: ConditionRuleMatchType.RESPONSE_STATUS_CODE },
    effects: {
      proxy: 'http://cheap-proxy:8899',
    },
    retryCount: 3,
    retryRule: {
      effects: {
        proxy: 'http://expensive-proxy:8899',
      },
    },
  },
}
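A retry chain can be pictured as an ordered list of attempts. The sketch below flattens a nested retryRule into the proxy each retry would use. `expandRetryPlan` is a hypothetical helper, not part of the library. It ignores conditions and only shows the ordering once a retry has been triggered. Its default of one attempt for a rule without retryCount is an assumption drawn from the flow described above. The match type is written as a plain string so the snippet runs without the package.

```javascript
// Hypothetical sketch: flatten a nested retryRule chain into the ordered list
// of proxies used by successive retries. Conditions are not evaluated here.
// Assumption: a rule without retryCount is attempted once.
function expandRetryPlan(retryRule) {
  const attempts = [];
  for (let rule = retryRule; rule; rule = rule.retryRule) {
    const count = rule.retryCount ?? 1;
    for (let i = 0; i < count; i++) {
      // null means "no proxy effect" for that attempt
      attempts.push(rule.effects?.proxy ?? null);
    }
  }
  return attempts;
}

const retryPlan = expandRetryPlan({
  conditions: { match: '^5\\d{2}$', type: 'RESPONSE_STATUS_CODE' }, // plain string in place of the enum
  effects: { proxy: 'http://cheap-proxy:8899' },
  retryCount: 3,
  retryRule: { effects: { proxy: 'http://expensive-proxy:8899' } },
});
console.log(retryPlan);
```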