Code to download, process, and analyze Chicago's publicly available taxi and Transportation Network Provider (Uber/Lyft) data. Raw data comes from the City of Chicago.
Originally used in support of this post: https://toddwschneider.com/posts/chicago-taxi-data/. Note that TNP data was not yet available when that post was written.
This repo is something of a companion to the nyc-taxi-data repo. The repos share some similar code and structure, but do not explicitly depend on each other.
As of Q1 2020, the Chicago taxi dataset contains nearly 200 million rows, while the TNP dataset is around 130 million rows.
1. Install PostgreSQL and PostGIS
Both are available via Homebrew on Mac OS X
Note: the raw taxi data is a single uncompressed 70GB+ .csv file, so it will take a while to download!
If you prefer, you can download and process either the taxi or the TNP dataset without the other.
./initialize_database.sh
./download_raw_taxi_data.sh && ./download_raw_tnp_data.sh
./import_taxi_trip_data.sh && ./import_raw_tnp_data.sh
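To make the single-dataset option concrete, here is a minimal sketch of a helper that prints (but does not run) the commands for one dataset. The script names are the ones listed above; the function name `setup_commands` and its `taxi`/`tnp` arguments are hypothetical, and the sketch assumes `./initialize_database.sh` is needed regardless of which dataset you choose.

```shell
#!/bin/sh
# Hypothetical helper, not part of this repo: prints the setup commands
# for a single dataset. Assumes initialize_database.sh is required for
# either dataset (an assumption, not confirmed by the instructions above).
setup_commands() {
  case "$1" in
    taxi)
      echo "./initialize_database.sh"
      echo "./download_raw_taxi_data.sh"
      echo "./import_taxi_trip_data.sh"
      ;;
    tnp)
      echo "./initialize_database.sh"
      echo "./download_raw_tnp_data.sh"
      echo "./import_raw_tnp_data.sh"
      ;;
    *)
      echo "usage: setup_commands taxi|tnp" >&2
      return 1
      ;;
  esac
}

# Print the taxi-only sequence for review
setup_commands taxi
```

Once the printed sequence looks right, run the commands by hand, joined with `&&` so that a failed step stops the rest.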
New taxi data is available monthly; new TNP data is available quarterly. Once you've run the full setup, you can download and process only the latest data by running:
./update_taxi_trips_data.sh
./update_tnp_trips_data.sh
This avoids re-downloading the entire datasets every time you want the latest data.
The analysis/ subfolder contains prepare_analysis.sql and analysis.R, scripts for doing analysis in Postgres and R.
- Chicago includes anonymous taxi medallion IDs, NYC does not
- Chicago includes fare info for TNP trips, NYC's comparable FHV dataset does not
- Chicago does not include information about which TNP provided which trip, NYC does
- Chicago does not include precise location coordinates, only census tracts and community areas (and even then, only sometimes)
- Since July 2016, NYC also does not provide precise coordinates
- Chicago does not include precise timestamps, and instead rounds pickups and drop-offs to 15-minute intervals
- Chicago daily weather data from the NCDC
- Chicago community area and census tract shapefiles from the City of Chicago
- NYC yellow taxi monthly data from the NYC Taxi & Limousine Commission
- Cubs home schedules from Baseball Reference
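The 15-minute timestamp rounding mentioned in the Chicago/NYC comparison above can be illustrated with a short sketch. It works on Unix timestamps in seconds and assumes rounding to the nearest interval; the text above does not say whether the city rounds to the nearest interval or always rounds down, and the function name `round_to_quarter_hour` is hypothetical.

```shell
#!/bin/sh
# Hypothetical illustration, not part of this repo: round a Unix timestamp
# (in seconds) to the nearest 15-minute boundary.
# Assumption: "nearest" rounding; the source data may round differently.
round_to_quarter_hour() {
  interval=900  # 15 minutes in seconds
  # Add half an interval, then truncate to a multiple of the interval
  echo $(( ($1 + interval / 2) / interval * interval ))
}

# A trip starting 7 minutes (420 s) past a boundary rounds back to it,
# while one starting 8 minutes (480 s) past rounds forward to the next.
round_to_quarter_hour 420
round_to_quarter_hour 480
```

One practical consequence: trip durations computed from the rounded pickup and drop-off times are only accurate to within about 15 minutes, so short trips can appear to have zero duration.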
[email protected], or open a GitHub issue