Etsy Web-Scraping

This project strips the content and tags of Etsy product webpages and returns that information as a CSV. The program is currently set up to scrape the first 10 pages of each category; this can be changed by adjusting a value in pageScrapper.py.
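As a rough illustration of what a per-category page limit looks like, the sketch below builds the list of page URLs for one category. It is not taken from pageScrapper.py: the constant name `MAX_PAGES`, the function `category_page_urls`, the `page` query parameter, and the example URL are all assumptions made for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical name for the page limit; the real value lives in pageScrapper.py.
MAX_PAGES = 10


def category_page_urls(category_url, max_pages=MAX_PAGES):
    """Return one URL per results page, from page 1 up to max_pages."""
    parts = urlsplit(category_url)
    # Keep any query parameters the category URL already carries.
    query = dict(parse_qsl(parts.query))
    urls = []
    for page in range(1, max_pages + 1):
        # Assumption: pagination is driven by a "page" query parameter.
        query["page"] = str(page)
        urls.append(urlunsplit(parts._replace(query=urlencode(query))))
    return urls
```

Under these assumptions, raising or lowering `MAX_PAGES` is the only change needed to scrape more or fewer pages per category.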

Workflow

To Run

To run this workflow:

1. Identify and, if necessary, change the file paths. The current program is hardcoded to use the original directory for the Amazon/Etsy scraping project.
2. Create site_url.csv. This file contains the list of URLs that you wish to scrape.
3. Run:

categoryScraper.sbatch
pageScraper.sbatch
htmlStripper.sbatch

This collects the pages in each category, downloads the HTML for each product page, and processes the HTML into a dataset.
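The three batch scripts can be submitted by hand, one after another. A minimal sketch of automating that is shown below, assuming the scripts are Slurm jobs and that each stage should start only after the previous one has finished successfully (the README does not state this explicitly). The helper names `build_command` and `submit_chain` are hypothetical; `--parsable` and `--dependency=afterok:` are standard `sbatch` options.

```python
import subprocess

# Script names come from the workflow above; the order is assumed to matter
# because each stage consumes the output of the one before it.
SCRIPTS = ["categoryScraper.sbatch", "pageScraper.sbatch", "htmlStripper.sbatch"]


def build_command(script, depends_on=None):
    """Return the sbatch argument list for one stage."""
    cmd = ["sbatch", "--parsable"]  # --parsable prints the job ID without extra text
    if depends_on is not None:
        # Start this job only after the previous job completed successfully.
        cmd.append(f"--dependency=afterok:{depends_on}")
    cmd.append(script)
    return cmd


def submit_chain(scripts=SCRIPTS, run=subprocess.run):
    """Submit each script so that it waits for the one submitted before it."""
    job_ids = []
    previous = None
    for script in scripts:
        result = run(build_command(script, previous),
                     capture_output=True, text=True, check=True)
        # With --parsable the output is "jobid" or "jobid;cluster".
        previous = result.stdout.strip().split(";")[0]
        job_ids.append(previous)
    return job_ids
```

Calling `submit_chain()` from the directory that holds the `.sbatch` files queues all three stages at once and returns their job IDs; the scheduler then runs them in order.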
