A super simple web scraper that mimics a browsing session by saving cookies from previous requests.
```
$ npm install session-scraper
```
```js
var Scraper = require('session-scraper');
var scraper = new Scraper();

scraper.get('https://github.com/roryf?tab=repositories').then(function($) {
  var repoUrl = $('.repolist li:first-child h3 a').attr('href');
  var name = $('.repolist li:first-child h3 a').text();
  console.log('Fetching readme for ' + name);
  // The second request reuses the cookies saved from the first one.
  scraper.get('https://github.com' + repoUrl).then(function($) {
    console.log($('.entry-content').text());
  });
});
```
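Since `get` returns a promise, the nested callbacks above can also be flattened into a single chain. This is a minimal sketch using only the `scraper.get` call shown above:

```js
var Scraper = require('session-scraper');
var scraper = new Scraper();

scraper.get('https://github.com/roryf?tab=repositories')
  .then(function($) {
    var link = $('.repolist li:first-child h3 a');
    console.log('Fetching readme for ' + link.text());
    // Returning the next promise keeps the chain flat; the scraper
    // instance carries the session cookies into this second request.
    return scraper.get('https://github.com' + link.attr('href'));
  })
  .then(function($) {
    console.log($('.entry-content').text());
  });
```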
Scraper supports options to customise its behaviour (a usage sketch follows the list):

- **User agent**: provide a custom `User-Agent` header string. Omitting this will use a random user agent string.
- **Log level**: level of logging. All output is written to `console.log`. Levels: `VERBOSE` logs everything, `SILENT` logs nothing.
- **Output**: save all requests to a given relative directory. This saves one file per request and allows the scraper to use the input option to load pre-scraped fixture data.
- **Input**: load a relative directory of fixture data representing requests. The scraper will only provide responses from this data.
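The exact option keys and how they are passed are not spelled out above; the sketch below assumes they are named `userAgent`, `logLevel`, `output` and `input` (only `input` is named in the text) and are given to the constructor, with log levels as plain strings:

```js
var Scraper = require('session-scraper');

// Record a session. The option names here are assumptions for
// illustration; only `input` is confirmed by the docs above.
var recorder = new Scraper({
  userAgent: 'Mozilla/5.0 (compatible; MyScraper/1.0)', // custom User-Agent header
  logLevel: 'VERBOSE',                                  // log everything
  output: 'fixtures'                                    // save one file per request
});

recorder.get('https://github.com/roryf?tab=repositories').then(function($) {
  // Requests made here are written to ./fixtures for later replay.
});

// Replay the session offline: responses come only from the saved fixtures.
var replayer = new Scraper({
  logLevel: 'SILENT', // log nothing
  input: 'fixtures'
});
```

Pairing `output` and `input` this way makes scrapers testable: record real responses once, then run the same scraping code against the fixtures without hitting the network.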