
Introduction to Modern Data Analytics Tools Docker, Airbyte, DBT, Apache Superset with Brazilian Ecommerce Data & Applying RFM in DBT


mebaysan/Modern-Data-Architecture


Introduction

I created this repo to simulate a modern data architecture using the tools below. I have also written about this topic on Medium, and there is a separate repo I created to follow the official dbt training programs.

If you want to clone this repo together with its submodules, run the command below.

git clone --recurse-submodules https://github.com/mebaysan/Modern-Data-Architecture.git

Image by Author

  • The airbyte folder is a submodule of the official Airbyte repo.
  • You can find more information at docs.airbyte.com.
  • For this repo, running docker-compose up -d is enough; Airbyte will then be available on localhost:8000.
  • We use Airbyte to load the data located in the Source folder into the database used by our dbt project.
  • I have added a submodule in the Superset-Production-Environment folder. You can also visit its original repo here. I created that repo to make setting up Superset easier.
  • You have to override the superset_config.py file to run Superset locally with the dbt project.
  • If you want to use other databases, add more RUN pip install <package> lines to the Dockerfile to install their database connectors.
  • Your Superset database credentials should be different from those of the dbt project's output database.
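As a rough illustration of the superset_config.py override mentioned above, the sketch below points Superset's own metadata database at a Postgres instance that is separate from the dbt output database. SQLALCHEMY_DATABASE_URI and SECRET_KEY are standard Superset settings; the environment variable names and default values are assumptions made for this example and are not taken from the repo.

```python
# superset_config.py (minimal sketch; all values are placeholders)
import os

# Hypothetical environment variable names; adapt them to your own setup.
DB_USER = os.environ.get("SUPERSET_DB_USER", "superset")
DB_PASSWORD = os.environ.get("SUPERSET_DB_PASSWORD", "superset")
DB_HOST = os.environ.get("SUPERSET_DB_HOST", "localhost")
DB_PORT = os.environ.get("SUPERSET_DB_PORT", "5432")
DB_NAME = os.environ.get("SUPERSET_DB_NAME", "superset")

# Superset's metadata database (dashboards, users, saved queries).
# Keep it separate from the database that dbt writes its models to.
SQLALCHEMY_DATABASE_URI = (
    f"postgresql+psycopg2://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_NAME}"
)

# Used by Superset to sign session cookies; replace the fallback before real use.
SECRET_KEY = os.environ.get("SUPERSET_SECRET_KEY", "change-me")
```

The dbt output database is not configured in this file; it is added later as a database connection from the Superset UI.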

This is a docker compose file to easily run Postgres & pgAdmin.

There is also a file for easily deploying pgAdmin on real servers: docker-compose-nginx.yml

How to utilize this repo?

You can follow the steps below to simulate the tools I mentioned above.

  1. Download the Brazilian E-Commerce Public Dataset by Olist to your local machine.
  2. Load the downloaded data into the database of your choice by using Airbyte or script.py.
    1. To load the data with Airbyte, run docker-compose up -d in the airbyte folder.
    2. To load the data with the script, modify script.py to fit your setup and then execute it.
  3. Run the dbt transformations with dbt run in the Modern-Data-Architecture-DBT folder.
  4. Optionally, you can add the Modern-Data-Architecture-DBT project's git URL to Airbyte to use your custom transformations there. To use this approach, you have to put your dbt configuration file into the Airbyte volumes.
  5. Run Superset and connect it to the database that holds the data you transformed with dbt.
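To make the script-based loading option in step 2 more concrete, here is a minimal sketch of a CSV loader. It is not the repo's script.py: it uses Python's built-in sqlite3 module as a stand-in for the target database so that it runs without extra dependencies, and the function name load_csv_folder is hypothetical. Every column is stored as TEXT; typing the columns is left to the dbt models.

```python
import csv
import sqlite3
from pathlib import Path


def load_csv_folder(source_dir, conn):
    """Load every *.csv file in source_dir into a table named after the file.

    Returns a dict mapping table name to the number of rows loaded.
    """
    loaded = {}
    for csv_path in sorted(Path(source_dir).glob("*.csv")):
        table = csv_path.stem
        # utf-8-sig also reads plain UTF-8 and strips a byte order mark if present.
        with csv_path.open(newline="", encoding="utf-8-sig") as f:
            reader = csv.reader(f)
            header = next(reader)  # first row holds the column names
            columns = ", ".join(f'"{name}" TEXT' for name in header)
            placeholders = ", ".join("?" for _ in header)
            # Recreate the table so the load can be repeated safely.
            conn.execute(f'DROP TABLE IF EXISTS "{table}"')
            conn.execute(f'CREATE TABLE "{table}" ({columns})')
            conn.executemany(
                f'INSERT INTO "{table}" VALUES ({placeholders})', reader
            )
        count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        loaded[table] = count
    conn.commit()
    return loaded


# Example usage (assumes the downloaded CSV files sit in a folder named Source):
# conn = sqlite3.connect("olist.db")
# print(load_csv_folder("Source", conn))
```

For a Postgres target, the same loop would apply with a Postgres driver in place of sqlite3; note that such drivers typically use a different placeholder syntax than "?".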
