covid_data_pipeline

Here's the general outline of the project:

  • Gather data sources from the web
  • Design the data model for the database and data warehouse
  • Load the raw data files into cloud storage (AWS S3)
  • Preprocess the data and load it into a structured relational database (AWS RDS)
  • Automate the extract, transform, load (ETL) process that moves the data into a data warehouse (DW)
    • The transformations clean the data, improve its quality, and restructure the tables for analytics in the DW
  • Write unit tests to ensure the data pipeline behaves correctly and reliably
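
As a rough illustration of the transform and unit-test steps above, here is a minimal sketch in plain Python. The column names (`region`, `date`, `cases`), the `clean_rows` function, and the sample data are all hypothetical and chosen only for the example; the project's actual schema and cleaning rules may differ.

```python
import csv
import io
from datetime import date, datetime
from typing import Any, Dict, List


def clean_rows(raw_csv: str) -> List[Dict[str, Any]]:
    """Parse raw CSV text and return cleaned, de-duplicated records.

    Assumed (hypothetical) columns: region, date (YYYY-MM-DD), cases.
    """
    cleaned = []
    seen = set()
    for row in csv.DictReader(io.StringIO(raw_csv)):
        region = (row.get("region") or "").strip()
        raw_date = (row.get("date") or "").strip()
        raw_cases = (row.get("cases") or "").strip()

        # Drop rows that are missing a key field.
        if not region or not raw_date:
            continue

        # Drop rows whose date or case count cannot be parsed.
        try:
            day = datetime.strptime(raw_date, "%Y-%m-%d").date()
            cases = int(raw_cases) if raw_cases else None
        except ValueError:
            continue

        # Keep only the first record for each (region, date) pair.
        key = (region, day)
        if key in seen:
            continue
        seen.add(key)

        cleaned.append({"region": region, "date": day, "cases": cases})
    return cleaned


# Made-up sample input covering each cleaning rule.
SAMPLE = (
    "region,date,cases\n"
    " Ontario ,2020-03-01,15\n"   # whitespace around the region
    "Ontario,2020-03-01,15\n"     # duplicate of the row above
    ",2020-03-02,4\n"             # missing region
    "Quebec,not-a-date,7\n"       # unparseable date
    "Quebec,2020-03-02,\n"        # missing case count
)


def test_clean_rows() -> None:
    """A pytest-style unit test for the transform step."""
    assert clean_rows(SAMPLE) == [
        {"region": "Ontario", "date": date(2020, 3, 1), "cases": 15},
        {"region": "Quebec", "date": date(2020, 3, 2), "cases": None},
    ]
```

Keeping the transformation a pure function of its input, with no S3 or database access inside it, is one way to make this step testable without cloud resources; the load step can then be exercised separately.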