Etsy Web-Scraping

This project extracts the content and tags from Etsy product webpages and returns that information as a CSV file. The program is currently set up to scrape the first 10 pages of each category; this can be changed by adjusting a value in pageScrapper.py.
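The following is a minimal sketch of how a per-category page limit might be applied. It assumes that Etsy category listings paginate through a `page` query parameter. The names `PAGES_PER_CATEGORY` and `category_page_urls` are hypothetical and are not taken from pageScrapper.py, where the real setting may be named and used differently.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

# Hypothetical setting mirroring "the first 10 pages of each category";
# the actual variable in pageScrapper.py may have a different name.
PAGES_PER_CATEGORY = 10


def category_page_urls(category_url: str, pages: int = PAGES_PER_CATEGORY) -> list:
    """Return URLs for pages 1..pages of one category listing.

    Assumption: pagination is controlled by a `page` query parameter.
    Any existing query parameters on the URL are preserved.
    """
    parts = urlparse(category_url)
    query = dict(parse_qsl(parts.query))
    urls = []
    for n in range(1, pages + 1):
        query["page"] = str(n)  # overwrite or add the page number
        urls.append(urlunparse(parts._replace(query=urlencode(query))))
    return urls


if __name__ == "__main__":
    for url in category_page_urls("https://www.etsy.com/c/jewelry", pages=3):
        print(url)
```

Raising or lowering `PAGES_PER_CATEGORY` in a sketch like this changes how many listing pages are requested per category.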

Workflow

To Run

To run this workflow:

1. Identify and, if needed, change the hardcoded filepaths.
   The program is currently hardcoded to use the original directory of the Amazon/Etsy scraping project.
2. Create site_url.csv.
   This file contains the list of URLs you wish to scrape.
3. Run the following Slurm batch scripts, in order:

   categoryScraper.sbatch
   pageScraper.sbatch
   htmlStripper.sbatch
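The README does not specify the layout of site_url.csv. The sketch below assumes one URL per row in the first column, with an optional header row. It skips blank rows and any row whose first cell does not start with "http". The function names `parse_site_urls` and `read_site_urls` are hypothetical and are not part of this repository.

```python
import csv


def parse_site_urls(lines) -> list:
    """Extract URLs from CSV lines (any iterable of strings).

    Assumed layout: URL in the first column; a header row such as "url"
    is skipped because it does not start with "http".
    """
    urls = []
    for row in csv.reader(lines):
        if not row:
            continue  # blank line
        value = row[0].strip()
        if value.startswith("http"):
            urls.append(value)
    return urls


def read_site_urls(path: str = "site_url.csv") -> list:
    """Open the CSV file and parse it with parse_site_urls."""
    with open(path, newline="", encoding="utf-8") as f:
        return parse_site_urls(f)
```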

Together, these scripts collect the pages in each category, download the HTML for each product page, and process that HTML into a dataset.
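The final step (HTML into a dataset) can be illustrated with Python's standard-library `html.parser`. This sketch covers only the visible-text part. It drops `<script>` and `<style>` contents and writes one CSV row per page. Extracting Etsy product tags would depend on Etsy's page markup, so that part is not shown here. The class and function names and the `page_id`/`text` columns are hypothetical and are not taken from htmlStripper.

```python
import csv
import io
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text chunks, ignoring <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True, so "&amp;" arrives as "&"
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the page's visible text joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def pages_to_csv(pages: dict, out) -> None:
    """Write a header plus one (page_id, text) row per page to a text stream."""
    writer = csv.writer(out)
    writer.writerow(["page_id", "text"])
    for page_id, html in pages.items():
        writer.writerow([page_id, html_to_text(html)])


if __name__ == "__main__":
    buf = io.StringIO()
    pages_to_csv({"example": "<h1>Mug</h1><p>Hand made</p>"}, buf)
    print(buf.getvalue())
```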

Repository: SouthernMethodistUniversity/etsy_scraping