This project applies data modeling skills with Postgres to build an ETL pipeline in Python. Fact and dimension tables are defined in a star schema for a particular analytic focus, and an ETL pipeline is written that transfers data from files in two local directories into these tables in Postgres using Python and SQL.
Link: Data_Modeling_with_Postgres
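To give a flavor of the kind of pipeline step involved, below is a minimal sketch of inserting a row into a star-schema fact table with psycopg2. The connection string, table name, and columns are illustrative assumptions, not the project's actual code.

```python
import psycopg2

# Connection parameters are placeholders; adjust to the local Postgres setup.
conn = psycopg2.connect("host=127.0.0.1 dbname=etl_demo user=student password=student")
cur = conn.cursor()

# Hypothetical fact-table insert for a star schema; table and column names are assumptions.
songplay_insert = """
    INSERT INTO songplays (start_time, user_id, song_id, artist_id, level, location)
    VALUES (%s, %s, %s, %s, %s, %s)
"""
cur.execute(songplay_insert, ("2018-11-01 21:01:46", 8, None, None, "free", "Phoenix, AZ"))

conn.commit()
cur.close()
conn.close()
```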
This project applies NoSQL data modeling skills with Apache Cassandra to complete an ETL pipeline using Python. Data is modeled by creating tables in Apache Cassandra designed around the queries they need to answer. An ETL pipeline transfers data from a set of CSV files within a directory into a single streamlined CSV file, which is then used to model and insert data into the Apache Cassandra tables.
Link: Data_Modeling_with_Apache_Cassandra
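As a rough sketch of query-first modeling in Cassandra (the keyspace, table definition, CSV file name, and column positions are assumptions for illustration only), the pipeline shape looks roughly like this:

```python
import csv
from cassandra.cluster import Cluster

# Connect to a local Cassandra instance; address and keyspace name are assumptions.
cluster = Cluster(['127.0.0.1'])
session = cluster.connect()
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS events
    WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.set_keyspace('events')

# Table designed around one specific query, as Cassandra modeling requires.
session.execute("""
    CREATE TABLE IF NOT EXISTS song_plays_by_session (
        session_id int, item_in_session int, artist text, song text, length float,
        PRIMARY KEY (session_id, item_in_session)
    )
""")

# Load rows from the consolidated CSV; column indices are assumptions about its layout.
with open('event_datafile_new.csv', encoding='utf8') as f:
    reader = csv.reader(f)
    next(reader)  # skip header
    for row in reader:
        session.execute(
            "INSERT INTO song_plays_by_session (session_id, item_in_session, artist, song, length) "
            "VALUES (%s, %s, %s, %s, %s)",
            (int(row[8]), int(row[3]), row[0], row[9], float(row[5]))
        )
```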
This project constructs a data warehouse using data modeling skills to build an ETL pipeline that extracts data from S3, stages it in Redshift, and transforms it into a set of dimensional tables so an analytics team can continue finding insights into what songs their users are listening to.
Link: Data_Warehouse_with_Amazon_Redshift
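A simplified sketch of the stage-then-transform pattern is shown below; the cluster endpoint, IAM role, bucket paths, and table columns are placeholders rather than the project's real configuration.

```python
import psycopg2

# Connection details are placeholders for the Redshift cluster endpoint.
conn = psycopg2.connect("host=<cluster-endpoint> dbname=dev user=awsuser password=<password> port=5439")
cur = conn.cursor()

# Stage raw JSON event data from S3 into a staging table (bucket path and region are assumptions).
cur.execute("""
    COPY staging_events FROM 's3://<bucket>/log_data'
    IAM_ROLE '<redshift-iam-role-arn>'
    FORMAT AS JSON 'auto' REGION 'us-west-2';
""")

# Transform staged rows into a dimensional table with an INSERT ... SELECT.
cur.execute("""
    INSERT INTO users (user_id, first_name, last_name, gender, level)
    SELECT DISTINCT userId, firstName, lastName, gender, level
    FROM staging_events
    WHERE userId IS NOT NULL;
""")

conn.commit()
conn.close()
```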
This project uses big data skills with Spark and data lakes to build an ETL pipeline for a data lake hosted on S3. Data is loaded from S3, processed into analytics tables using Spark, and written back to S3. The Spark process is deployed on an EC2 cluster on AWS.
Link: Data_Lake_with_Spark
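A minimal PySpark sketch of the load, process, and write-back pattern follows; bucket names, paths, and column names are assumptions made for illustration.

```python
from pyspark.sql import SparkSession

# Bucket names and paths are placeholders; the real project layout may differ.
spark = SparkSession.builder.appName("data_lake_etl").getOrCreate()

# Load raw song JSON files from S3.
song_df = spark.read.json("s3a://<input-bucket>/song_data/*/*/*/*.json")

# Build a dimensional songs table and write it back to S3 as partitioned Parquet.
songs_table = song_df.select("song_id", "title", "artist_id", "year", "duration").dropDuplicates()
songs_table.write.mode("overwrite").partitionBy("year", "artist_id") \
    .parquet("s3a://<output-bucket>/songs/")

spark.stop()
```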