Data Engineering Projects

Projects and exercises for the Udacity Data Engineering Nanodegree.

Project 1: Data Modeling with Postgres

This project applies data modeling skills with Postgres to build an ETL pipeline using Python. Fact and dimension tables are defined for a star schema around a particular analytic focus, and an ETL pipeline is written that transfers data from files in two local directories into these tables in Postgres using Python and SQL.

Link: Data_Modeling_with_Postgres
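
A minimal sketch of the star-schema idea, assuming psycopg2 and a local database; the database name, table, and column names below are illustrative, not necessarily the project's exact ones.

```python
# Sketch: create one dimension table of a star schema and insert a parsed record.
# Assumes a local Postgres instance; names and credentials are placeholders.
import psycopg2

conn = psycopg2.connect("host=127.0.0.1 dbname=sparkifydb user=student password=student")
cur = conn.cursor()

# Dimension table for songs (one table of the star schema).
cur.execute("""
    CREATE TABLE IF NOT EXISTS songs (
        song_id   TEXT PRIMARY KEY,
        title     TEXT,
        artist_id TEXT,
        year      INT,
        duration  FLOAT
    );
""")

# Insert a row as it might be parsed from one of the source data files.
cur.execute(
    "INSERT INTO songs (song_id, title, artist_id, year, duration) "
    "VALUES (%s, %s, %s, %s, %s) ON CONFLICT (song_id) DO NOTHING;",
    ("SOEXAMPLE12345", "Example Title", "AREXAMPLE123", 2018, 215.5),
)

conn.commit()
conn.close()
```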

Project 2: Data Modeling with Apache Cassandra

This project uses NoSQL data modeling skills with Apache Cassandra to complete an ETL pipeline using Python. Data is modeled by creating tables in Apache Cassandra designed around the queries to be run. The ETL pipeline transfers data from a set of CSV files within a directory into a single streamlined CSV file, which is then used to insert data into the Apache Cassandra tables.

Link: Data_Modeling_with_Apache_Cassandra
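
A minimal sketch of query-first modeling in Cassandra, assuming a local cluster and the cassandra-driver package; the keyspace, table, and CSV column positions are assumptions for illustration, not the project's exact layout.

```python
# Sketch: create a table keyed for one specific query and load rows from the
# consolidated CSV. Keyspace/table names and CSV column indices are placeholders.
import csv
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS sparkify
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.set_keyspace("sparkify")

# Table modeled for the query "which song was played in session X at item Y".
session.execute("""
    CREATE TABLE IF NOT EXISTS song_in_session (
        session_id int,
        item_in_session int,
        artist text,
        song text,
        length float,
        PRIMARY KEY (session_id, item_in_session)
    )
""")

with open("event_datafile_new.csv", encoding="utf8") as f:
    reader = csv.reader(f)
    next(reader)  # skip header
    for row in reader:
        # Column positions here are assumed for illustration.
        session.execute(
            "INSERT INTO song_in_session (session_id, item_in_session, artist, song, length) "
            "VALUES (%s, %s, %s, %s, %s)",
            (int(row[0]), int(row[1]), row[2], row[3], float(row[4])),
        )

cluster.shutdown()
```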

Project 3: Data Warehouse with Amazon Redshift

This project constructs a data warehouse using data modeling skills to build an ETL pipeline that extracts data from S3, stages it in Redshift, and transforms it into a set of dimensional tables, enabling an analytics team to keep finding insights into what songs their users are listening to.

Link: Data_Warehouse_with_Amazon_Redshift
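
A minimal sketch of the stage-then-transform pattern on Redshift, assuming psycopg2 and an existing cluster; the host, bucket, IAM role, and table definitions are placeholders rather than the project's actual configuration.

```python
# Sketch: COPY raw JSON from S3 into a staging table, then INSERT ... SELECT
# into a dimensional table. Connection details and object names are placeholders.
import psycopg2

conn = psycopg2.connect(
    "host=example-cluster.abc123.us-west-2.redshift.amazonaws.com "
    "dbname=dev user=awsuser password=example port=5439"
)
cur = conn.cursor()

# 1. Stage raw event data from S3 with COPY.
cur.execute("""
    COPY staging_events
    FROM 's3://example-bucket/log_data'
    IAM_ROLE 'arn:aws:iam::123456789012:role/myRedshiftRole'
    FORMAT AS JSON 'auto';
""")

# 2. Transform staged rows into a dimensional table.
cur.execute("""
    INSERT INTO users (user_id, first_name, last_name, gender, level)
    SELECT DISTINCT userId, firstName, lastName, gender, level
    FROM staging_events
    WHERE userId IS NOT NULL;
""")

conn.commit()
conn.close()
```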

Project 4: Data Lake with Spark on AWS

This project uses big data skills with Spark and data lakes to build an ETL pipeline for a data lake hosted on S3. Data is loaded from S3, processed into analytics tables using Spark, and written back to S3. The Spark process is deployed on an EC2 cluster using AWS.

Link: Data_Lake_with_Spark
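
A minimal sketch of the S3 -> Spark -> S3 flow, assuming PySpark; the bucket paths and column names are illustrative, not the project's exact schema.

```python
# Sketch: read raw JSON from S3, build one analytics table, write it back to S3
# as partitioned parquet. Paths and columns are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sparkify-data-lake").getOrCreate()

# Load raw song JSON files from the input bucket.
songs = spark.read.json("s3a://example-input-bucket/song_data/*/*/*/*.json")

# Build the songs analytics table by selecting and de-duplicating columns.
songs_table = (
    songs.select("song_id", "title", "artist_id", "year", "duration")
         .dropDuplicates(["song_id"])
)

# Write the table back to S3 as parquet, partitioned for efficient reads.
songs_table.write.mode("overwrite") \
    .partitionBy("year", "artist_id") \
    .parquet("s3a://example-output-bucket/songs/")

spark.stop()
```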
