Session Scraper

A super simple web scraper that mimics a browsing session by saving cookies from previous requests and sending them on subsequent ones.

Install

$ npm install session-scraper

Usage

var Scraper = require('session-scraper');

var scraper = new Scraper();

// The resolved $ supports jQuery-style selectors over the response body
scraper.get('https://github.com/roryf?tab=repositories').then(function($) {
  var repoUrl = $('.repolist li:first-child h3 a').attr('href');
  var name = $('.repolist li:first-child h3 a').text();
  console.log('Fetching readme for ' + name);
  // Cookies saved from the first request are sent automatically here
  scraper.get('https://github.com' + repoUrl).then(function($) {
    console.log($('.entry-content').text());
  });
});

Options

Scraper supports options to customise its behaviour.

userAgent

Provide a custom User-Agent header string. Omitting this will use a random user agent string.
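For illustration, a minimal sketch of passing this option to the constructor. The option name comes from this section; the User-Agent value is a placeholder, not a real browser string:

```javascript
// Options object for the Scraper constructor; the userAgent value
// below is an illustrative placeholder.
var options = {
  userAgent: 'MySessionScraper/1.0 (+https://example.com/bot-info)'
};

// Usage would be: var scraper = new Scraper(options);
console.log(options.userAgent);
```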

logLevel

Level of logging. All output is written to console.log. Levels:

  • VERBOSE log everything
  • SILENT log nothing
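A sketch of the two documented levels, assuming they are passed as plain strings (the exact value format is not shown in the docs, so treat this as an assumption):

```javascript
// Assumed string values matching the documented level names.
var verbose = { logLevel: 'VERBOSE' }; // log everything to console.log
var silent  = { logLevel: 'SILENT' };  // log nothing

// e.g. var scraper = new Scraper(silent);
console.log(verbose.logLevel, silent.logLevel);
```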

output

Save all requests to a given relative directory. One file is saved per request; these files can later be loaded as pre-scraped fixture data via the input option.

input

Load a relative directory of fixture data representing previous requests. The scraper will serve responses only from this data.
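The output and input options combine into a record/replay workflow. A minimal sketch, assuming both options take a relative directory path as a string (the directory name here is illustrative):

```javascript
// Record run: scrape live and save one fixture file per request
// into the 'fixtures' directory (name chosen for this example).
var recordOptions = { output: 'fixtures' };

// Replay run: serve responses only from the saved fixtures,
// e.g. for repeatable tests without hitting the network.
var replayOptions = { input: 'fixtures' };

// Usage would be: new Scraper(recordOptions) then new Scraper(replayOptions)
console.log(recordOptions.output, replayOptions.input);
```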
