fino-selenium

Data scraping for the Spanish blog FinoFilipino

Current functionality:

Capture main content fields:
- Title
- Content (currently only text and links to img/gif/video)
- Number of views
- Number of comments
- Tags and categories
- Publish date
User comments:
- Content
- Comment author
- Comment publish date
- Parent / Child hierarchy for answers to comments
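The parent/child hierarchy can be illustrated with a minimal sketch. It assumes each scraped comment carries an identifier and, for replies, the identifier of its parent. The `Comment` class, its field names, and `build_tree` are hypothetical names chosen for illustration, not taken from the project's `src` code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Comment:
    # Hypothetical record for one scraped comment.
    comment_id: str
    author: str
    published: str
    content: str
    parent_id: Optional[str] = None  # None for top-level comments
    replies: List["Comment"] = field(default_factory=list)


def build_tree(comments: List[Comment]) -> List[Comment]:
    """Attach each reply to its parent and return the top-level comments."""
    by_id = {c.comment_id: c for c in comments}
    roots = []
    for c in comments:
        parent = by_id.get(c.parent_id) if c.parent_id else None
        if parent is None:
            # Top-level comment, or a reply whose parent was not scraped.
            roots.append(c)
        else:
            parent.replies.append(c)
    return roots


if __name__ == "__main__":
    flat = [
        Comment("1", "ana", "2020-01-01", "First!"),
        Comment("2", "luis", "2020-01-01", "Reply to ana", parent_id="1"),
        Comment("3", "eva", "2020-01-02", "Another thread"),
    ]
    for root in build_tree(flat):
        print(root.author, [r.author for r in root.replies])
```

Keeping the flat list and linking by identifier means the comments can be scraped in page order and nested afterwards, in a single pass.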
Automatic interaction with the cookie pop-up
Logging
AWS compatibility (enabled with the -a or --aws command-line parameter)
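As a rough sketch of how the -a / --aws parameter and logging could be wired together, the example below uses only Python's standard `argparse` and `logging` modules. The function names, the logger name, and the log messages are assumptions for illustration; the project's real entry point may handle the flag differently.

```python
import argparse
import logging


def parse_args(argv=None):
    """Parse the command line; only the -a/--aws flag comes from the README."""
    parser = argparse.ArgumentParser(description="Scrape FinoFilipino posts")
    parser.add_argument(
        "-a", "--aws",
        action="store_true",
        help="run with AWS-compatible settings",
    )
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )
    log = logging.getLogger("fino-selenium")  # hypothetical logger name
    if args.aws:
        log.info("AWS mode enabled")
    else:
        log.info("Running in local mode")
    return args


if __name__ == "__main__":
    main()
```

Because the flag uses `action="store_true"`, `args.aws` is `False` unless -a or --aws is passed.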

The scraped data will be used to run an analysis that tries to capture the following dimensions:
