
Puppeteer Middlewares

A library that currently implements rule-based Request Middlewares for Puppeteer page events.

You can conditionally proxy, block, or retry requests, or simply override request options.

Installation

  • Using NPM

    npm i @teocns/puppeteer-middlewares
  • From source

    git clone https://github.com/teocns/puppeteer-middlewares/
    cd puppeteer-middlewares
    npm install && npm run build
    

Running tests

  • npm run test uses Jest to run all tests under /test

Request Middleware

⚡ Usage

const puppeteer = require('puppeteer');
const { RequestMiddleware, ConditionRuleMatchType } = require('@teocns/puppeteer-middlewares');

(async () => {
    const browser = await puppeteer.launch({
        headless: false,
    });
    const page = await browser.newPage();

    // Route every request whose URL contains "google.com" through a local proxy
    new RequestMiddleware({
        conditions: { match: 'google.com', type: ConditionRuleMatchType.URL_CONTAINS },
        effects: {
            proxy: 'http://localhost:8899',
        },
    }).bind(page);

    await page.goto('https://google.com');

    // Keep the browser open for 10 seconds before closing it
    const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
    await sleep(10000);
    await browser.close();
})();

❓ How to build rules

Each rule is a dictionary object made of conditions and effects.

Example
{
  conditions: { match: 'google.com', type: ConditionRuleMatchType.URL_CONTAINS },
  effects: {
      proxy: 'http://localhost:8899',
      setHeaders: {
          'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36'
      }
  }
}

Available Condition types

ENTIRE_URL
URL_REGEXP
URL_CONTAINS
RESPONSE_STATUS_CODE
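
To make the four condition types concrete, here is a minimal sketch of how each one could be evaluated against a request URL or response status. The `ConditionType` object and the `testCondition` helper are hypothetical names written for illustration; they are not part of the library's API, and treating the status-code match as a regular expression is an assumption based on the retry example in this README.

```javascript
// Hypothetical stand-in for the library's ConditionRuleMatchType enum.
const ConditionType = Object.freeze({
  ENTIRE_URL: 'ENTIRE_URL',
  URL_REGEXP: 'URL_REGEXP',
  URL_CONTAINS: 'URL_CONTAINS',
  RESPONSE_STATUS_CODE: 'RESPONSE_STATUS_CODE',
});

// Hypothetical helper: evaluates one match value against a URL or status code.
function testCondition(type, match, { url, status }) {
  switch (type) {
    case ConditionType.ENTIRE_URL:
      // The whole URL must be identical to the match string.
      return url === match;
    case ConditionType.URL_REGEXP:
      // The match string is compiled as a JavaScript regular expression.
      return new RegExp(match).test(url);
    case ConditionType.URL_CONTAINS:
      // Plain substring check.
      return url.includes(match);
    case ConditionType.RESPONSE_STATUS_CODE:
      // Assumption: the status code is compared as text against a RegExp.
      return new RegExp(match).test(String(status));
    default:
      throw new Error(`Unknown condition type: ${type}`);
  }
}
```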

Logical Operators

  • Conditions accept an operator property whose value can be one of "OR" | "AND" | "NOR" | "XOR" | "NAND" | "NXOR" | "XNOR". It controls how multiple match values are combined.

  • The default operator is OR.

  • ❗Note: when a condition matches by regular expression, match must be a valid JavaScript RegExp string
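
As a rough illustration of what these operators mean when `match` holds several values, the sketch below combines a list of per-value boolean results. The `combine` function is hypothetical and not taken from the library. In particular, treating XOR over more than two values as "an odd number of matches" is an assumption; the library may define it differently.

```javascript
// Hypothetical helper: folds per-value match results with a logical operator.
function combine(operator, results) {
  const trueCount = results.filter(Boolean).length;
  const any = trueCount > 0;                 // at least one value matched
  const all = trueCount === results.length;  // every value matched
  const odd = trueCount % 2 === 1;           // assumed meaning of multi-input XOR

  switch (operator) {
    case 'OR':   return any;
    case 'AND':  return all;
    case 'NOR':  return !any;
    case 'NAND': return !all;
    case 'XOR':  return odd;
    case 'NXOR': // treated here as a synonym of XNOR
    case 'XNOR': return !odd;
    default:
      throw new Error(`Unknown operator: ${operator}`);
  }
}

// Example: a URL that matched the first pattern but not the second.
console.log(combine('OR', [true, false]));  // true
console.log(combine('AND', [true, false])); // false
console.log(combine('NOR', [true, false])); // false
```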

Usage

Matches everything starting with https:// and ending with either .com or .net

conditions: {
    operator: 'OR',
    // Dots are escaped so that they match a literal "."
    match: ['^https://.+\\.com$', '^https://.+\\.net$'],
    type: ConditionRuleMatchType.URL_REGEXP,
}
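
One detail worth checking with patterns like these is dot escaping: in a JavaScript RegExp an unescaped `.` matches any character, so the dot before `com` or `net` needs a backslash (doubled inside a string literal) to match a literal dot. The snippet below is a standalone check using only the built-in `RegExp`; the `matchesAny` helper and the sample URLs are made up for illustration.

```javascript
// Patterns with the dot escaped, written as JavaScript string literals.
const patterns = ['^https://.+\\.com$', '^https://.+\\.net$'];

// Hypothetical helper mimicking an OR over all patterns.
const matchesAny = (url) => patterns.some((p) => new RegExp(p).test(url));

console.log(matchesAny('https://example.com'));      // true
console.log(matchesAny('https://example.net'));      // true
console.log(matchesAny('http://example.com'));       // false (not https)
console.log(matchesAny('https://example.com/path')); // false (does not end in .com)

// With an unescaped dot, a host without any "." before "com" still matches.
console.log(new RegExp('^https://.+.com$').test('https://telecom'));   // true
console.log(new RegExp('^https://.+\\.com$').test('https://telecom')); // false
```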

Match status codes other than 200

conditions: {
    operator: 'NOR',
    match: 200,
    type: ConditionRuleMatchType.RESPONSE_STATUS_CODE,
}

Effects

  • block: blocks the request
  • proxy: sends the request through a proxy. Provide a URL string
  • setHeaders: overrides request headers

Examples

Override headers and use a proxy for all requests containing google.com in the URL

{
  conditions: { match: 'google.com', type: ConditionRuleMatchType.URL_CONTAINS },
  effects: {
      proxy: 'http://localhost:8899',
      setHeaders: {
          'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36'
      }
  }
}

Block all requests matching https://undesired.com

{
  conditions: { match: 'https://undesired.com', type: ConditionRuleMatchType.ENTIRE_URL },
  effects: {
      block: true
  }
}
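
For readers curious how `block` and `setHeaders` relate to Puppeteer's own request interception, the sketch below applies an effects object to an intercepted request using Puppeteer's `abort()`, `continue()` and `headers()` request methods. The `applyEffects` function is hypothetical and is not the library's code. The `proxy` effect is left out because this README does not describe how the library implements it.

```javascript
// Hypothetical handler: applies README-style effects to an intercepted request.
// `request` is expected to expose abort(), continue() and headers(),
// as Puppeteer's HTTPRequest does.
function applyEffects(request, effects) {
  if (effects.block) {
    // Cancel the request entirely.
    return request.abort();
  }
  if (effects.setHeaders) {
    // Puppeteer reports header names in lower case, so normalise the
    // overrides before merging to avoid duplicate keys.
    const overrides = Object.fromEntries(
      Object.entries(effects.setHeaders).map(([name, value]) => [name.toLowerCase(), value])
    );
    return request.continue({ headers: { ...request.headers(), ...overrides } });
  }
  // No effect applies: let the request through unchanged.
  return request.continue();
}

// Wiring it to a page (requires Puppeteer, shown as comments only):
//   await page.setRequestInterception(true);
//   page.on('request', (request) => applyEffects(request, { block: true }));
```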

♻️ Retry rules

You can conditionally retry requests and apply specific effects to each retry.

Usage

In this example, the flow is:

  • Retries up to 3 times through cheap-proxy if the status code is 5xx
  • If those retries fail, retries 1 time through expensive-proxy

{
  retryRule:{
      conditions: { match: '^5\\d{2}$', type: ConditionRuleMatchType.RESPONSE_STATUS_CODE },
      effects:{
          proxy: 'http://cheap-proxy:8899',
      }, 
      retryCount:3,
      retryRule: {
          effects:{
              proxy: 'http://expensive-proxy:8899'
          }
      }
  }
}
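
The nesting above can be read as an ordered list of attempts. As a sketch of that reading, the hypothetical `flattenRetryRule` helper below walks the nested `retryRule` chain and returns the proxy used for each retry in order. It is not part of the library, and it assumes that a rule without `retryCount` is tried once.

```javascript
// Hypothetical helper: lists the proxy used for each retry, in order.
function flattenRetryRule(retryRule) {
  const attempts = [];
  // Follow the chain of nested retryRule objects until it ends.
  for (let rule = retryRule; rule; rule = rule.retryRule) {
    const count = rule.retryCount ?? 1; // assumed default of one retry
    for (let i = 0; i < count; i++) {
      attempts.push(rule.effects?.proxy ?? null);
    }
  }
  return attempts;
}

const plan = flattenRetryRule({
  conditions: { match: '^5\\d{2}$', type: 'RESPONSE_STATUS_CODE' },
  effects: { proxy: 'http://cheap-proxy:8899' },
  retryCount: 3,
  retryRule: {
    effects: { proxy: 'http://expensive-proxy:8899' },
  },
});

console.log(plan);
// [ 'http://cheap-proxy:8899', 'http://cheap-proxy:8899',
//   'http://cheap-proxy:8899', 'http://expensive-proxy:8899' ]
```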

About

Control the PageEvent lifecycle using dictionary rules
