Here's the general outline of the project:
- Gather data from online sources
- Design the data model for the database and data warehouse
- Load the raw data files into cloud storage (AWS S3)
- Preprocess the data and load it into a structured relational database (AWS RDS)
- Automate the Extract, Transform, Load (ETL) process that moves the data from the database into the data warehouse
  - The transformations clean the data, improve its quality, and restructure the tables so the data warehouse is suited to analytics
- Write unit tests to ensure the data pipeline behaves properly and reliably
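
The ETL and testing steps above can be sketched roughly as follows. This is only a minimal illustration, not the project's actual pipeline: it uses an in-memory SQLite database as a stand-in for the data warehouse, and the sample CSV, the `city_temps` table, and the function names `extract`, `transform`, `load`, and `run_pipeline` are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw data: one row has stray whitespace, one is missing a
# value, and one is an exact duplicate.
RAW_CSV = """id,city,temp_c
1, Boston ,21.5
2,Denver,
3,Austin,30.0
3,Austin,30.0
"""


def extract(raw_text):
    """Parse raw CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Trim whitespace, drop incomplete and duplicate rows, cast types."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        values = {key: (value or "").strip() for key, value in row.items()}
        if not all(values.values()):
            continue  # skip rows with any missing field
        record = (int(values["id"]), values["city"], float(values["temp_c"]))
        if record[0] in seen_ids:
            continue  # skip rows whose id was already loaded
        seen_ids.add(record[0])
        cleaned.append(record)
    return cleaned


def load(conn, records):
    """Write cleaned records into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS city_temps ("
        "id INTEGER PRIMARY KEY, city TEXT NOT NULL, temp_c REAL NOT NULL)"
    )
    conn.executemany("INSERT OR REPLACE INTO city_temps VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(raw_text, conn):
    """Run extract, transform, and load in sequence."""
    load(conn, transform(extract(raw_text)))


def test_pipeline_drops_bad_rows():
    """Unit test: only the clean, unique rows should reach the table."""
    conn = sqlite3.connect(":memory:")
    run_pipeline(RAW_CSV, conn)
    rows = conn.execute(
        "SELECT id, city, temp_c FROM city_temps ORDER BY id"
    ).fetchall()
    assert rows == [(1, "Boston", 21.5), (3, "Austin", 30.0)]


if __name__ == "__main__":
    test_pipeline_drops_bad_rows()
```

Keeping each stage as a separate function makes it possible to test the transformations without any cloud resources; in a real deployment, the SQLite connection would presumably be replaced by connections to the actual database and warehouse.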