This repository contains scripts that automatically download all available historical data for a given set of countries from openAQ. The collected data is then stored in a Postgres database.
openAQ data comes from sensors at specific locations spread throughout a country. To download the data directly from openAQ, the user would need to supply each location ID, which is time consuming when the goal is to collect all of the data for a country. The scripts automate this by collecting data from every sensor located in the countries of interest.
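As a rough illustration of what the scripts automate, the sketch below pages through the openAQ locations endpoint for a single country. It is not the repository's own code: the v3 `/locations` route, the `iso` country filter, and the `X-API-Key` header are assumptions about the current openAQ API, and `OPENAQ_API_KEY` is a hypothetical environment variable name.

```python
# Minimal sketch: list every openAQ location ID for one country.
# Assumes the v3 /locations endpoint, its `iso` filter, and the X-API-Key header.
import os
import requests

API_KEY = os.environ["OPENAQ_API_KEY"]  # hypothetical variable name
BASE_URL = "https://api.openaq.org/v3/locations"

def location_ids(iso_code: str) -> list[int]:
    """Collect all location IDs for a country, paging until no results remain."""
    ids, page = [], 1
    while True:
        resp = requests.get(
            BASE_URL,
            headers={"X-API-Key": API_KEY},
            params={"iso": iso_code, "limit": 1000, "page": page},
            timeout=30,
        )
        resp.raise_for_status()
        results = resp.json()["results"]
        if not results:
            break
        ids.extend(loc["id"] for loc in results)
        page += 1
    return ids

print(location_ids("PH"))
```

Doing this lookup once per country is what lets the downloader fetch data for every sensor without the user listing location IDs by hand.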
The data downloaded from the openAQ S3 bucket is stored in a narrow format. More details can be found here.
When downloading data for multiple countries, the dataset can reach millions of rows. To reduce its size, the narrow table is normalized into a snowflake schema:
- The dimension tables are populated using the openAQ API. This is implemented in the script `create_table.py`.
- The fact table is populated by downloading the data directly from the AWS S3 bucket. This is implemented in the script `dl_from_aws.py`.
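For illustration, here is a minimal pandas sketch of that normalization step. It is not the repository's `dl_from_aws.py`; the column names (`parameter`, `units`, `sensors_id`, `datetime`, `value`) and the local file name are assumptions about the narrow layout of the archive files.

```python
# Minimal sketch: split one narrow archive file into a fact frame
# and a small dimension frame. Column names are assumed, not confirmed.
import pandas as pd

narrow = pd.read_csv("measurements.csv.gz")  # hypothetical local copy of an S3 object

# Dimension: one row per (parameter, units) pair, with a surrogate key.
dim_parameter = (
    narrow[["parameter", "units"]]
    .drop_duplicates()
    .reset_index(drop=True)
    .rename_axis("parameter_key")
    .reset_index()
)

# Fact: measurements keyed by sensor, timestamp, and the surrogate parameter key.
fact = narrow.merge(dim_parameter, on=["parameter", "units"])[
    ["sensors_id", "datetime", "parameter_key", "value"]
]

print(fact.head())
```

Moving the repeated text fields (parameter names, units, location metadata) into dimension tables is what keeps the fact table compact once the row count reaches the millions.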
- Install Docker
- Create an openAQ account to get an API key
- Rename `dev.env` to `.env` and fill in the necessary variables
  - Run `mv dev.env .env`
  - The countries of interest are specified in this file
- Run `docker compose build`, then `docker compose up -d`
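Docker Compose automatically reads a `.env` file in the project directory, so the same values are available for compose-file substitution and, if passed through to the services, inside the containers. As a rough sketch of how the scripts might read them, the variable names `OPENAQ_API_KEY` and `COUNTRIES` below are hypothetical, not the actual names used in `dev.env`.

```python
# Minimal sketch: read hypothetical settings from the environment inside a container.
import os

api_key = os.environ["OPENAQ_API_KEY"]          # hypothetical variable name
countries = [c.strip() for c in os.environ.get("COUNTRIES", "PH").split(",")]
print(countries)
```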
Dashboard creation for learning purposes.