This class is adapted from Alex Richards' (@alexrichards) excellent IRE17 class.
He'll also be teaching a repeat web scraping session Sunday!
- How web scraping will make your life easier
- How to do so responsibly
- Using third-party Python packages
- Fetching web pages with Python
- Navigating the HTML in those pages to get data
- Structuring scraped data and writing it to a CSV
- And a couple of tips on shortcuts with HTML tables!
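As a rough preview of the fetch, navigate, and write steps listed above, here is a minimal sketch that uses only Python's standard library, so it runs before any packages are installed. The class itself may use third-party packages for these steps. The `fetch` helper, the `TableParser` class, the `table_to_csv` function, and the sample HTML are all made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch(url):
    """Download a page and return its HTML as text (assumes UTF-8)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html):
    """Parse table rows out of html and return them as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical sample page; in practice the HTML would come from fetch(url).
SAMPLE_HTML = """
<table>
  <tr><th>name</th><th>city</th></tr>
  <tr><td>Ada</td><td>London</td></tr>
  <tr><td>Grace</td><td>New York</td></tr>
</table>
"""

print(table_to_csv(SAMPLE_HTML))
```

To save the result instead of printing it, the returned text could be written to a file opened with `open("output.csv", "w", newline="")`, where the file name is again only an example.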
You should have Python on your machine. To check that you have the correct version for the class, type the following in Bash (on macOS, you can reach it through the Terminal application):
which python3
which should return something like
/Library/Frameworks/Python.framework/Versions/3.5/bin/python3
If not, and you're in the CAR18 class, flag down the instructor or a TA. If you're not in the class, download Python 3.
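The same check can also be made from inside Python. This is a minimal sketch: the function name `is_python3` is made up for illustration, and it only tests the major version rather than any specific release the class may expect.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the interpreter's major version is 3 or later."""
    return version_info[0] >= 3


if is_python3():
    print("Python", sys.version.split()[0], "looks fine for the class.")
else:
    print("This interpreter is older than Python 3; install Python 3 first.")
```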
If you already have Python 3, download this repository and run the command pip install -r requirements.txt
from the repository folder to install the packages listed in requirements.txt.
If you have questions, you can always:
- Send Alex a note by email
- DM Alex on Twitter
- Reach out to Melissa
- Open an issue here
Struggling with installation? Try this updated guide for Windows and OS X.
- PyCAR for in-depth Python learning
- Codecademy for Python syntax
- Think Python, a popular introductory book whose digital edition is available free online
- The Coursera class Using Python to Access Web Data, though you may want to take the classes that precede it first
- How the Internet Works, a PyCon 2013 talk by Jessica McKellar
- How Does The Internet, a zine as informative as it is cute, by Amy Wibowo