Data scraping and analysis for the Spanish blog finofilipino

sgamezrdo/fino-selenium

fino-selenium

Data scraping for the Spanish blog FinoFilipino

Current functionality:

  • Capture main content fields:
    • Title
    • Content (currently only text and links to images, GIFs, and videos)
    • Number of views
    • Number of comments
    • Tags and categories
    • Publish date
  • User comments:
    • Content
    • Comment author
    • Comment publish date
    • Parent/child hierarchy for replies to comments
  • Automatic interaction with cookies pop-up
  • Logging
  • AWS compatibility (enabled with the -a or --aws command-line flag)
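
The fields listed above could be held in a structure along the following lines. This is only a minimal sketch, not the repository's actual code: the class names `Post` and `Comment`, their attribute names, and the helper `parse_args` are hypothetical. Only the `-a` / `--aws` flag comes from this README. Dates are kept as raw strings because the date format used by the blog is not assumed here.

```python
import argparse
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Comment:
    """One user comment; replies are nested to keep the parent/child hierarchy."""
    author: str
    content: str
    published: str  # raw text as scraped (format not assumed)
    replies: List["Comment"] = field(default_factory=list)


@dataclass
class Post:
    """Main content fields of a single blog post (hypothetical layout)."""
    title: str
    content: str  # text, or links to images/GIFs/videos
    views: int
    published: str
    tags: List[str] = field(default_factory=list)
    categories: List[str] = field(default_factory=list)
    comments: List[Comment] = field(default_factory=list)

    def count_comments(self) -> int:
        """Count every comment, including nested replies."""
        def walk(nodes: List[Comment]) -> int:
            return sum(1 + walk(node.replies) for node in nodes)
        return walk(self.comments)


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the command line; -a / --aws switches on AWS compatibility."""
    parser = argparse.ArgumentParser(description="Scrape FinoFilipino posts")
    parser.add_argument("-a", "--aws", action="store_true",
                        help="run with AWS compatibility enabled")
    return parser.parse_args(argv)
```

Nesting replies inside their parent comment keeps the hierarchy explicit. A flat list with a parent identifier per comment would work as well and may be easier to export to a table.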

The scraped data will be used for an analysis that tries to answer the following questions:

  • Which posts are the most visited?
  • How strongly do the number of comments and the number of views correlate?
  • What kind of content generates the most views and comments?
  • What are the most frequent words and topics in titles, content, and comments?
  • Which users interact the most?
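
Several of the questions above reduce to simple aggregations. The sketch below uses only the Python standard library and assumes each scraped post is a dictionary with the hypothetical keys `title`, `views`, and `comments`. The function names are illustrative and do not come from the repository.

```python
import re
from collections import Counter
from math import sqrt


def most_viewed(posts, n=10):
    """Return the n posts with the highest view count."""
    return sorted(posts, key=lambda p: p["views"], reverse=True)[:n]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def top_words(texts, n=10):
    """Most frequent lowercase words across texts (no stop-word filtering)."""
    words = (w for text in texts for w in re.findall(r"\w+", text.lower()))
    return Counter(words).most_common(n)


def top_commenters(authors, n=10):
    """Most active users, given one author name per comment."""
    return Counter(authors).most_common(n)
```

For example, `pearson([p["views"] for p in posts], [p["comments"] for p in posts])` gives the correlation between views and comments. A real word-frequency analysis would probably also filter Spanish stop words, which this sketch leaves out.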
