paulsp94/TripadvisorCrawler

Tripadvisor Crawler

This README is a placeholder: the data retrieved by the crawler will be presented here later. It will be published after anonymisation and aggregation.

The crawler launched multiple headless Chrome instances via Puppeteer to render the Tripadvisor website. Puppeteer extracted the relevant data from each rendered page, and the crawler packaged it as a restaurant, review, or user object. Each object was then passed to the database handler, implemented with Mongoose, which wrote it to MongoDB.

Crawler data flow
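
The data flow described above could be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names (`packageRecord`, `createDatabaseHandler`, `crawlPage`, `crawl`), the record shape, and the `extract` callback are all hypothetical. The browser launcher and the models are passed in as parameters, so the sketch itself imports neither Puppeteer nor Mongoose; it relies only on `browser.newPage()`, `page.goto()`, `page.evaluate()`, `page.close()`, `browser.close()`, and a `create(doc)` method on each model.

```javascript
// Hypothetical sketch: names, record shapes, and the extract callback are
// illustrative assumptions, not taken from the repository.

// Package raw extracted fields as one of the three object kinds.
function packageRecord(kind, raw) {
  const allowed = ["restaurant", "review", "user"];
  if (!allowed.includes(kind)) {
    throw new Error(`Unknown record kind: ${kind}`);
  }
  return { kind, data: { ...raw } };
}

// Database handler: routes each packaged object to the model for its kind.
// `models` maps kind -> an object with a create(doc) method, which is the
// interface Mongoose models expose via Model.create.
function createDatabaseHandler(models) {
  return {
    async save(record) {
      return models[record.kind].create(record.data);
    },
  };
}

// Render one URL in its own page, run `extract` in the page context,
// then package and store every record it returns.
async function crawlPage(browser, url, extract, db) {
  const page = await browser.newPage();
  try {
    await page.goto(url, { waitUntil: "networkidle2" });
    const rawRecords = await page.evaluate(extract);
    for (const { kind, ...raw } of rawRecords) {
      await db.save(packageRecord(kind, raw));
    }
    return rawRecords.length;
  } finally {
    await page.close();
  }
}

// Launch one browser instance per URL batch so batches render in parallel.
// Returns the total number of records stored.
async function crawl(urlBatches, { launchBrowser, extract, models }) {
  const db = createDatabaseHandler(models);
  const counts = await Promise.all(
    urlBatches.map(async (batch) => {
      const browser = await launchBrowser();
      try {
        let stored = 0;
        for (const url of batch) {
          stored += await crawlPage(browser, url, extract, db);
        }
        return stored;
      } finally {
        await browser.close();
      }
    })
  );
  return counts.reduce((a, b) => a + b, 0);
}
```

In a real setup, `launchBrowser` might be `() => puppeteer.launch({ headless: true })` and `models` might hold three Mongoose models, one per object kind. Injecting both keeps the packaging and routing logic testable without a browser or a database.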

About

A crawler for Tripadvisor restaurants and their respective reviews
