SouthernMethodistUniversity/etsy_scraping

Etsy Web-Scraping

This project strips the content and tags from Etsy product webpages and returns that information as a CSV file. The program is currently set up to scrape the first 10 pages of each category; this limit can be changed by adjusting a value in pageScrapper.py.
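As a rough illustration of the per-category page limit, the sketch below builds the list of page URLs for one category. It is not taken from the project's code: the name `MAX_PAGES`, the function `build_page_urls`, the example URL, and the use of a `page` query parameter for pagination are all assumptions made for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

# Hypothetical constant: the README only says the limit is "a value" in the script.
MAX_PAGES = 10


def build_page_urls(category_url, max_pages=MAX_PAGES):
    """Return one URL per results page, from page 1 up to max_pages.

    Assumes (not confirmed by the project) that pagination is driven by a
    `page` query parameter.
    """
    parts = urlparse(category_url)
    # Keep any existing query parameters except a stale `page` value.
    base_query = [(k, v) for k, v in parse_qsl(parts.query) if k != "page"]
    urls = []
    for page in range(1, max_pages + 1):
        query = urlencode(base_query + [("page", str(page))])
        urls.append(urlunparse(parts._replace(query=query)))
    return urls


if __name__ == "__main__":
    for url in build_page_urls("https://example.com/c/jewelry", max_pages=3):
        print(url)
```

Raising or lowering `max_pages` in a sketch like this changes how many result pages are visited per category, which is the same kind of adjustment the paragraph above describes.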

Workflow

(Flowchart image of the workflow.)

To Run

To run this workflow:

  • Identify and, if necessary, change the file paths.
      • The current program is hardcoded to use the original directory for the Amazon/Etsy scraping project.
  • Create site_url.csv.
      • This file contains the list of URLs that you wish to scrape.
  • Run, in order:
      1. categoryScraper.sbatch
      2. pageScraper.sbatch
      3. htmlStripper.sbatch

Together, these steps collect the pages in each category, download the HTML for each product page, and process that HTML into a dataset.
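The layout of site_url.csv is not documented here, so the following is only a sketch of one plausible format: a single column with one URL per row and no header. The helper names `write_site_urls` and `read_site_urls` and the example URLs are hypothetical; check the scraper scripts for the layout they actually expect before relying on this.

```python
import csv


def write_site_urls(path, urls):
    """Write one URL per row (assumed layout: single column, no header)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for url in urls:
            writer.writerow([url])


def read_site_urls(path):
    """Read the URLs back, skipping blank rows."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row[0].strip() for row in csv.reader(f) if row and row[0].strip()]


if __name__ == "__main__":
    # Placeholder URLs for illustration only.
    write_site_urls(
        "site_url.csv",
        ["https://example.com/c/jewelry", "https://example.com/c/art"],
    )
    print(read_site_urls("site_url.csv"))
```

Using the `csv` module rather than plain line splitting keeps the file readable even if a URL contains a comma, since such values are quoted automatically.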
