This tutorial assumes you have some experience with tools such as Python, Bash scripting, Docker, and Terraform. You can run the pipeline locally or jump right to the cloud.
- A Google Cloud account
- A Google Cloud service account with rights to BigQuery, Cloud Storage, and Compute Engine
- Terraform installed on your machine
- The Google Cloud CLI
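You can quickly confirm that Terraform and the gcloud CLI are on your PATH:
terraform -version
gcloud --version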
If you plan on running the pipeline locally, you will also need Docker and Docker Compose installed on your host machine.
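Likewise, you can check that Docker and the Compose plugin are available:
docker --version
docker compose version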
- Clone the repo and cd into the directory
git clone https://github.com/KenImade/real_estate_dashboard.git real_estate_dashboard
cd real_estate_dashboard
- If running on Linux, you will need to change permissions on the dags, data, scripts, and logs folders to allow Airflow access to them
chmod -R 777 dags data logs scripts
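You can confirm the permissions took effect with:
ls -ld dags data logs scripts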
- Create a .env file and set the following variables
echo ENVIRONMENT=test >> .env
echo AIRFLOW_WEBSERVER_SECRET_KEY=my_secret_key >> .env
echo AIRFLOW_CONN_MY_GCP_CONNECTION='google-cloud-platform://?extra__google_cloud_platform__key_path=/opt/airflow/secrets/key.json' >> .env
echo GOOGLE_APPLICATION_CREDENTIALS_LOCAL=./keys/my-creds.json >> .env
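After running these commands, cat .env should show something like the lines below (the shell strips the single quotes around the connection URI, and my_secret_key is only an example value, so substitute your own):
ENVIRONMENT=test
AIRFLOW_WEBSERVER_SECRET_KEY=my_secret_key
AIRFLOW_CONN_MY_GCP_CONNECTION=google-cloud-platform://?extra__google_cloud_platform__key_path=/opt/airflow/secrets/key.json
GOOGLE_APPLICATION_CREDENTIALS_LOCAL=./keys/my-creds.json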
- Copy the service account JSON key into the keys directory and name the file my-creds.json
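For example, assuming the key you downloaded is sitting at ~/Downloads/your-key.json (adjust the path to wherever you saved it):
mkdir -p keys
cp ~/Downloads/your-key.json keys/my-creds.json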
- Change directory to terraform
cd terraform
- Delete the files main.tf and variables.tf
- Rename main-for-local-setup.tf to main.tf and variables-for-local-setup.tf to variables.tf
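Both of the previous two steps can be done from the shell:
rm main.tf variables.tf
mv main-for-local-setup.tf main.tf
mv variables-for-local-setup.tf variables.tf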
- Replace the project ID in the variables.tf file with your Google Cloud project ID
- Also replace the storage bucket name with a globally unique name
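The exact variable names are defined in the repo's variables.tf, but you can list the default values that need editing with:
grep -n "default" variables.tf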
- Initialise and set up the cloud infrastructure
terraform init
terraform plan
terraform apply
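terraform apply shows the planned changes and asks for confirmation before creating anything. When you no longer need the infrastructure, the same configuration can remove it:
terraform destroy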
- Leave the terraform directory and change into the dags folder. Open real_estate_dag.py in your code editor and replace the variables at the top of the file with the names you used in the Terraform variables.tf file.
cd ..
cd dags
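If you just want to see which variables need changing, printing the first few lines of the file is enough:
head -n 20 real_estate_dag.py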
- Change directory into dbt_real_estate and open the profiles.yml file. You will need to replace the project value under test with your Google Cloud project ID.
- If you have dbt installed locally, you can check that the connection is valid by running dbt debug; you will first need to export the ENVIRONMENT variable.
export ENVIRONMENT=test
dbt debug
- Go up a level into the project directory
cd ..
- Run the command below to bring up the application
docker compose up --build -d
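The containers can take a minute or two to become healthy. You can check their status with:
docker compose ps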
- Go to localhost:8080 to access the Airflow UI
- Input admin for the username and password
- Run the DAG
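When you are done with the local setup, stop and remove the containers with:
docker compose down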
- To run the pipeline on Google Cloud instead, clone the repo and cd into the directory
git clone https://github.com/KenImade/real_estate_dashboard.git real_estate_dashboard
cd real_estate_dashboard
- Place your service account JSON file in the keys folder
- Change directory into the terraform folder
cd terraform
- Delete the files main-for-local-setup.tf and variables-for-local-setup.tf
- Replace the Google project ID and storage bucket names in the variables.tf file
- You will need to create another service account with permissions for Cloud Storage and BigQuery. Take note of its email address and replace the value of the vm_service_account_email variable in your variables.tf file.
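If you prefer to do this from the CLI, a sketch along these lines works; the account name vm-pipeline-sa and the Storage Admin / BigQuery Admin roles are assumptions, so adjust them to match your setup and substitute your own project ID:
gcloud iam service-accounts create vm-pipeline-sa --display-name="VM pipeline service account"
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID --member="serviceAccount:vm-pipeline-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" --role="roles/storage.admin"
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID --member="serviceAccount:vm-pipeline-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" --role="roles/bigquery.admin"
The resulting email, vm-pipeline-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com, is the value to use for vm_service_account_email.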
- Initialise and set up the cloud infrastructure
terraform init
terraform plan
terraform apply
- Leave the terraform directory and change into the dags folder. Open real_estate_dag.py in your code editor and replace the variables at the top of the file with the names you used in the Terraform variables.tf file.
cd ..
cd dags
- Log into your Google Cloud console and locate your VM instance under Compute Engine
- Log into the VM and verify that the Docker container is up and running. It might take a few minutes for the VM to spin up the container.
sudo docker ps
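If the container is listed but still starting, you can follow its logs; replace CONTAINER_ID with the ID shown by docker ps:
sudo docker logs -f CONTAINER_ID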
- You will need to ssh into the VM and forward the port to access the Airflow UI.
gcloud compute ssh [INSTANCE_NAME] --zone [ZONE] -- -L 8080:localhost:8080
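If you are unsure of the instance name or zone, list your instances first:
gcloud compute instances list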
- With the tunnel open, go to localhost:8080 and input admin for the username and password
- Run the DAG