Data Engineering Projects

Projects and exercises for the Udacity Data Engineering Nanodegree.

Project 1: Data Modeling with Postgres

This project applies data modeling skills with Postgres to build an ETL pipeline using Python. Fact and dimension tables are defined for a star schema around a particular analytic focus, and an ETL pipeline is written that transfers data from files in two local directories into these tables in Postgres using Python and SQL.

Link: Data_Modeling_with_Postgres
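
A minimal sketch of the star-schema idea, assuming psycopg2 and a local database; the database name, table, and column names below are illustrative, not necessarily the project's exact ones.

```python
# Sketch: create one dimension table of a star schema and insert a parsed record.
# Assumes a local Postgres instance; names and credentials are placeholders.
import psycopg2

conn = psycopg2.connect("host=127.0.0.1 dbname=sparkifydb user=student password=student")
cur = conn.cursor()

# Dimension table for songs (one table of the star schema).
cur.execute("""
    CREATE TABLE IF NOT EXISTS songs (
        song_id   TEXT PRIMARY KEY,
        title     TEXT,
        artist_id TEXT,
        year      INT,
        duration  FLOAT
    );
""")

# Insert a row as it might be parsed from one of the source data files.
cur.execute(
    "INSERT INTO songs (song_id, title, artist_id, year, duration) "
    "VALUES (%s, %s, %s, %s, %s) ON CONFLICT (song_id) DO NOTHING;",
    ("SOEXAMPLE12345", "Example Title", "AREXAMPLE123", 2018, 215.5),
)

conn.commit()
conn.close()
```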

Project 2: Data Modeling with Apache Cassandra

This project uses NoSQL data modeling skills with Apache Cassandra to complete an ETL pipeline using Python. Data is modeled by creating tables in Apache Cassandra designed around the queries to be run. The ETL pipeline transfers data from a set of CSV files within a directory into a single streamlined CSV file, which is then used to insert data into the Apache Cassandra tables.

Link: Data_Modeling_with_Apache_Cassandra
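
A minimal sketch of query-first modeling in Cassandra, assuming a local cluster and the cassandra-driver package; the keyspace, table, and CSV column positions are assumptions for illustration, not the project's exact layout.

```python
# Sketch: create a table keyed for one specific query and load rows from the
# consolidated CSV. Keyspace/table names and CSV column indices are placeholders.
import csv
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS sparkify
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.set_keyspace("sparkify")

# Table modeled for the query "which song was played in session X at item Y".
session.execute("""
    CREATE TABLE IF NOT EXISTS song_in_session (
        session_id int,
        item_in_session int,
        artist text,
        song text,
        length float,
        PRIMARY KEY (session_id, item_in_session)
    )
""")

with open("event_datafile_new.csv", encoding="utf8") as f:
    reader = csv.reader(f)
    next(reader)  # skip header
    for row in reader:
        # Column positions here are assumed for illustration.
        session.execute(
            "INSERT INTO song_in_session (session_id, item_in_session, artist, song, length) "
            "VALUES (%s, %s, %s, %s, %s)",
            (int(row[0]), int(row[1]), row[2], row[3], float(row[4])),
        )

cluster.shutdown()
```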

Project 3: Data Warehouse with Amazon Redshift

This project constructs a data warehouse using data modeling skills to build an ETL pipeline that extracts data from S3, stages it in Redshift, and transforms it into a set of dimensional tables, enabling an analytics team to keep finding insights into what songs their users are listening to.

Link: Data_Warehouse_with_Amazon_Redshift
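
A minimal sketch of the stage-then-transform pattern on Redshift, assuming psycopg2 and an existing cluster; the host, bucket, IAM role, and table definitions are placeholders rather than the project's actual configuration.

```python
# Sketch: COPY raw JSON from S3 into a staging table, then INSERT ... SELECT
# into a dimensional table. Connection details and object names are placeholders.
import psycopg2

conn = psycopg2.connect(
    "host=example-cluster.abc123.us-west-2.redshift.amazonaws.com "
    "dbname=dev user=awsuser password=example port=5439"
)
cur = conn.cursor()

# 1. Stage raw event data from S3 with COPY.
cur.execute("""
    COPY staging_events
    FROM 's3://example-bucket/log_data'
    IAM_ROLE 'arn:aws:iam::123456789012:role/myRedshiftRole'
    FORMAT AS JSON 'auto';
""")

# 2. Transform staged rows into a dimensional table.
cur.execute("""
    INSERT INTO users (user_id, first_name, last_name, gender, level)
    SELECT DISTINCT userId, firstName, lastName, gender, level
    FROM staging_events
    WHERE userId IS NOT NULL;
""")

conn.commit()
conn.close()
```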

Project 4: Data Lake with Spark on AWS

This project uses big data skills with Spark and data lakes to build an ETL pipeline for a data lake hosted on S3. Data is loaded from S3, processed into analytics tables using Spark, and written back to S3. The Spark process is deployed on an EC2 cluster using AWS.

Link: Data_Lake_with_Spark
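
A minimal sketch of the S3 -> Spark -> S3 flow, assuming PySpark; the bucket paths and column names are illustrative, not the project's exact schema.

```python
# Sketch: read raw JSON from S3, build one analytics table, write it back to S3
# as partitioned parquet. Paths and columns are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sparkify-data-lake").getOrCreate()

# Load raw song JSON files from the input bucket.
songs = spark.read.json("s3a://example-input-bucket/song_data/*/*/*/*.json")

# Build the songs analytics table by selecting and de-duplicating columns.
songs_table = (
    songs.select("song_id", "title", "artist_id", "year", "duration")
         .dropDuplicates(["song_id"])
)

# Write the table back to S3 as parquet, partitioned for efficient reads.
songs_table.write.mode("overwrite") \
    .partitionBy("year", "artist_id") \
    .parquet("s3a://example-output-bucket/songs/")

spark.stop()
```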
