This project was used to create an ELT pipeline in Airflow to maintain supply chain data. You can use the DAGs inside the dag folder to understand how to implement a full and an incremental data load, how to connect to Postgres, how to insert data from a CSV file by creating a pandas DataFrame, and how to retrieve data from the Postgres server.
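As a quick orientation, here is a minimal sketch (not the project's exact code) of retrieving data from Postgres inside an Airflow task. It assumes an Airflow connection named `postgres_default` and a hypothetical table called `supply_chain`:

```python
from airflow.providers.postgres.hooks.postgres import PostgresHook

def fetch_rows():
    # Connect to Postgres through the Airflow connection "postgres_default" (assumed name).
    hook = PostgresHook(postgres_conn_id="postgres_default")
    # Retrieve a few rows from the hypothetical "supply_chain" table.
    rows = hook.get_records("SELECT * FROM supply_chain LIMIT 10;")
    for row in rows:
        print(row)
```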
Please ensure that the following are installed:
- Docker
- Docker Compose
- Postgres
- A Postgres server, database, and table (a sample table definition is sketched after this list)
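Here is a minimal sketch of creating a database table with psycopg2, assuming Postgres is reachable on localhost; the credentials, database name, and table schema below are placeholders, not the project's actual ones:

```python
import psycopg2

# Connect to a placeholder database; replace host, credentials, and dbname with your own.
conn = psycopg2.connect(
    host="localhost",
    port=5432,
    user="postgres",
    password="postgres",
    dbname="supply_chain_db",
)

# Create a hypothetical target table for the loads; adjust the columns to your CSV.
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS supply_chain (
            id SERIAL PRIMARY KEY,
            product TEXT,
            quantity INTEGER,
            updated_at TIMESTAMP
        );
    """)

conn.close()
```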
You can pull the Airflow Docker image by following the instructions in the Apache Airflow documentation -> https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html
This project covers two loading patterns: a full load of data and an incremental load of data.
Full load: the entire dataset is loaded into the database. For example, when you have first created the pipeline, you might want to add all previous data into the server. The file Full_load.py performs a full load (see the sketch below).
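The following is a minimal sketch of a full load, assuming an Airflow connection named `postgres_default`, a placeholder CSV path, and a hypothetical target table `supply_chain`; the project's Full_load.py may differ in its details:

```python
import pandas as pd
from airflow.providers.postgres.hooks.postgres import PostgresHook

def full_load(csv_path="data/supply_chain.csv"):
    # Read the whole CSV into a pandas DataFrame.
    df = pd.read_csv(csv_path)

    hook = PostgresHook(postgres_conn_id="postgres_default")

    # Start from a clean table so the entire dataset is loaded from scratch.
    hook.run("TRUNCATE TABLE supply_chain;")

    # Insert every row of the DataFrame into the target table.
    hook.insert_rows(
        table="supply_chain",
        rows=df.itertuples(index=False, name=None),
        target_fields=list(df.columns),
    )
```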
Incremental load: only the new portion of the dataset is loaded into the database. For example, when data has been collected over time after an initial load, you upload only the newly added records. The file Incremental_load.py performs an incremental load (see the sketch below).
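Below is a minimal sketch of an incremental load, assuming the table has a monotonically increasing `id` column so that only rows newer than the current maximum are inserted; the connection name, path, and columns are placeholders rather than the project's exact code:

```python
import pandas as pd
from airflow.providers.postgres.hooks.postgres import PostgresHook

def incremental_load(csv_path="data/supply_chain.csv"):
    # Read the latest CSV snapshot into a DataFrame.
    df = pd.read_csv(csv_path)

    hook = PostgresHook(postgres_conn_id="postgres_default")

    # Find the highest id already present in the target table.
    max_id = hook.get_first("SELECT COALESCE(MAX(id), 0) FROM supply_chain;")[0]

    # Keep only the rows that were added after the last load.
    new_rows = df[df["id"] > max_id]

    # Insert just the new rows into the target table.
    hook.insert_rows(
        table="supply_chain",
        rows=new_rows.itertuples(index=False, name=None),
        target_fields=list(new_rows.columns),
    )
```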
The first screenshot shows the final execution of the incremental load: the final run is successful, and the event logs confirm that the row-by-row inserts from the DataFrame are complete. Consequently, the data has been loaded into the Postgres table, as shown in the next screenshot.