I created this repo to simulate a modern data architecture using the tools below. I have also written about this topic on Medium, and I maintain a separate repo that follows the official dbt training programs.
To clone this repo together with its submodules, run the command below.
git clone --recurse-submodules https://github.com/mebaysan/Modern-Data-Architecture.git
- The airbyte folder is a submodule of the official Airbyte repo.
  - You can find more information at docs.airbyte.com.
  - For our repo, we only need to run
    docker-compose up -d
    and Airbyte will then be available at localhost:8000.
  - We use this tool to load the data located in the Source folder into the database that our dbt project uses.
- The Source folder provides the data for dbt.
- The Modern-Data-Architecture-DBT folder is the dbt project for the data provided in the Source folder. The data comes from the Brazilian E-Commerce Public Dataset by Olist on Kaggle.
- The Superset-Production-Environment folder is a submodule; you can also visit its original repo. I created that repo to simplify the Superset setup process.
  - You have to override the superset_config.py file to run Superset locally with the dbt project.
  - If you want to use other databases, add more
    RUN pip install <package>
    lines to the Dockerfile to install the database connectors you need.
  - Your Superset database credentials should be different from those of the dbt project's output database.
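As a rough illustration of the superset_config.py override, here is a minimal sketch. `SQLALCHEMY_DATABASE_URI` and `SECRET_KEY` are standard Superset settings, but everything else is an assumption: the `SUPERSET_*` environment variable names, the fallback values, the helper `build_postgres_uri`, and the choice of Postgres for Superset's metadata database are all hypothetical and may not match what the submodule actually ships.

```python
# superset_config.py (sketch): all values are placeholders, not the repo's real settings.
import os
from urllib.parse import quote_plus


def build_postgres_uri(user: str, password: str, host: str, port: int, database: str) -> str:
    """Return a SQLAlchemy-style PostgreSQL URI, escaping special characters in credentials."""
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Superset's own metadata database. Keep these credentials separate from
# the database that the dbt project writes its output to.
SQLALCHEMY_DATABASE_URI = build_postgres_uri(
    user=os.environ.get("SUPERSET_DB_USER", "superset"),          # hypothetical variable name
    password=os.environ.get("SUPERSET_DB_PASSWORD", "superset"),  # hypothetical variable name
    host=os.environ.get("SUPERSET_DB_HOST", "localhost"),         # hypothetical variable name
    port=int(os.environ.get("SUPERSET_DB_PORT", "5432")),         # hypothetical variable name
    database=os.environ.get("SUPERSET_DB_NAME", "superset"),      # hypothetical variable name
)

# Superset uses a secret key to sign sessions; replace the fallback before real use.
SECRET_KEY = os.environ.get("SUPERSET_SECRET_KEY", "change-me")
```

The dbt output database is not configured here; it would be added afterwards as a separate database connection inside the Superset UI.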
This is a Docker Compose file for easily running Postgres and pgAdmin.
There is also a file for deploying pgAdmin on real servers: docker-compose-nginx.yml
You can follow the steps below to try out the tools mentioned above.
- Download the data from the Brazilian E-Commerce Public Dataset by Olist to your local machine.
- Load the downloaded data into the database of your choice using Airbyte or script.py.
- Run the dbt transformations with
  dbt run
  in the Modern-Data-Architecture-DBT folder.
  - Optionally, you can add the Modern-Data-Architecture-DBT project's git URL to Airbyte to use your custom transformations there. For this to work, you have to put your dbt configuration file into the Airbyte volumes.
- Run Superset and connect it to the database that holds the data transformed by dbt.
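For the script-based loading step, the following is a minimal sketch of what loading the downloaded CSV files into a database could look like. It is not the repo's script.py, whose contents are not shown here. The functions `load_csv` and `load_folder` are hypothetical, and the sketch uses the standard library's `sqlite3` as a stand-in so it runs without a server; the real target would more likely be the Postgres instance described above.

```python
import csv
import sqlite3
from pathlib import Path


def load_csv(conn: sqlite3.Connection, csv_path: Path, table: str) -> int:
    """Create `table` from the CSV header, insert every row as text, return the row count."""
    with csv_path.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)  # first line holds the column names
        columns = ", ".join(f'"{name}" TEXT' for name in header)
        placeholders = ", ".join("?" for _ in header)
        rows = list(reader)
    # Recreate the table so the load can be repeated safely.
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


def load_folder(conn: sqlite3.Connection, folder: Path) -> dict:
    """Load each *.csv file in `folder` into a table named after the file stem."""
    return {
        path.stem: load_csv(conn, path, path.stem)
        for path in sorted(folder.glob("*.csv"))
    }
```

A call such as `load_folder(sqlite3.connect("olist.db"), Path("Source"))` would create one raw table per CSV file, which dbt can then transform. Every column is stored as text here on purpose, leaving type casting to the dbt models; a Postgres version would need a different driver and placeholder syntax.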