> This repository has been archived by the owner on Jan 17, 2023. It is now read-only.

# Session Scraper

A super simple web scraper that mimics a browsing session by saving cookies from previous requests.

## Install

```sh
$ npm install session-scraper
```

## Usage

```javascript
var Scraper = require('session-scraper');

var scraper = new Scraper();
scraper.get('https://github.com/roryf?tab=repositories').then(function($) {
  var repoUrl = $('.repolist li:first-child h3 a').attr('href');
  var name = $('.repolist li:first-child h3 a').text();
  console.log('Fetching readme for ' + name);
  scraper.get('http://github.com' + repoUrl).then(function($) {
    console.log($('.entry-content').text());
  });
});
```

## Options

Scraper supports options to customise its behaviour.

### userAgent

Provide a custom `User-Agent` header string. Omitting this will use a random user agent string.
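To pin a specific user agent rather than a random one, an options object might look like this. This is a sketch: the UA string is an example value, and passing the options to the constructor is an assumption, since the README does not show the constructor signature.

```javascript
// Hypothetical options object; the UA string is an example value.
var options = {
  userAgent: 'Mozilla/5.0 (compatible; ExampleBot/1.0)'
};

// Assumed usage (constructor accepting options is not confirmed by the README):
// var scraper = new Scraper(options);
```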

### logLevel

Level of logging. All output is written to `console.log`. Levels:

- `VERBOSE`: log everything
- `SILENT`: log nothing
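The two levels above might be selected like this. Treating the level names as plain strings, and passing the options to the constructor, are both assumptions not confirmed by the README.

```javascript
// Example logging configurations; level names taken from this README.
var verboseOptions = { logLevel: 'VERBOSE' }; // log everything via console.log
var silentOptions = { logLevel: 'SILENT' };   // log nothing
```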

### output

Save all requests to a given relative directory. One file is saved per request, which allows the scraper to load this pre-scraped fixture data later via the `input` option.

### input

Load a relative directory of fixture data representing requests. The scraper will only serve responses from this data.
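Together, `output` and `input` suggest a record-then-replay workflow: record real responses once, then run against the saved fixtures without touching the network. A sketch of the two configurations, where `./fixtures` is an example path and passing options to the constructor is an assumption:

```javascript
// 1. Record: save one file per request into ./fixtures (example path).
var recordOptions = { output: './fixtures' };

// 2. Replay: respond only from the saved fixture data.
var replayOptions = { input: './fixtures' };
```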