This repository contains a Docker Compose setup for Airflow, PostgreSQL, pgAdmin, JupyterLab, and MongoDB. It also includes Airflow DAGs and notebooks that generate COVID-19 reports for the UK and move data from Postgres to MongoDB.
- Install Python 3.6 or above
- Install Docker
- Install Docker Compose
Clone the repository and navigate to the `docker-airflow-assigment-two` directory, then bring the stack up with Docker Compose:

```bash
docker-compose up -d
```
- The DAGs are located in the ./notebooks/src directory
- The generated reports are located in the ./notebooks/output directory
- The Jupyter notebooks are located in the ./notebooks directory
- Airflow => localhost:8080
- JupyterLab => localhost:8888
- pgAdmin => localhost:8888
- Using pgAdmin, create a `Covid_DB` database (a scripted alternative is sketched after this list)
- In Airflow, open the `covid_data` DAG
- Trigger the DAG manually
- The following output will be generated: `uk_scoring_report.png`, `uk_scoring_report.csv`, and `uk_scoring_report_NotScaled.csv`
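If you prefer to script the database creation instead of using the pgAdmin UI, a minimal sketch with psycopg2 could look like this (the host, port, and airflow/airflow credentials are assumptions about the compose configuration):

```python
import psycopg2

# Connect to the default "postgres" database first; CREATE DATABASE cannot
# run inside a transaction block, so autocommit must be enabled.
conn = psycopg2.connect(host="localhost", port=5432, dbname="postgres",
                        user="airflow", password="airflow")  # assumed credentials
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute('CREATE DATABASE "Covid_DB"')
conn.close()
```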
- Using pgAdmin, create a `Faker_DB` database
- Create a table with any data
- Trigger the DAG manually
- The output will be a collection in MongoDB containing the table's data
The description of each process in both workflows is as follows.
The COVID-19 report workflow uses the following operations:
- Get_uk_data: loads the Johns Hopkins University COVID-19 data from its GitHub repo, applies a cleaning process, filters the data, and stores it in Postgres
- report_data: generates the report files from the processed data and writes them to the ./notebooks/output directory
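As a rough sketch, the `covid_data` DAG might wire these two operations together as below; only the DAG and task names come from the repo, while the JHU file URL, connection string, table name, and output path are assumptions:

```python
from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from sqlalchemy import create_engine

# Public JHU time-series file; the exact source file may differ in the repo.
JHU_URL = ("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
           "csse_covid_19_data/csse_covid_19_time_series/"
           "time_series_covid19_confirmed_global.csv")
PG_URI = "postgresql://airflow:airflow@postgres:5432/Covid_DB"  # assumed
OUTPUT_DIR = "./notebooks/output"  # assumed to resolve to the mounted volume

def get_uk_data():
    # Load the full JHU dataset, filter it down to the UK rows, and store
    # the cleaned result in Postgres.
    df = pd.read_csv(JHU_URL)
    uk = df[df["Country/Region"] == "United Kingdom"]
    uk.to_sql("uk_covid", create_engine(PG_URI), if_exists="replace", index=False)

def report_data():
    # Read the cleaned table back and write one of the report files; the
    # real task also produces the .png and NotScaled variants.
    uk = pd.read_sql_table("uk_covid", create_engine(PG_URI))
    uk.to_csv(f"{OUTPUT_DIR}/uk_scoring_report.csv", index=False)

with DAG("covid_data", start_date=datetime(2021, 1, 1),
         schedule_interval=None) as dag:
    load = PythonOperator(task_id="Get_uk_data", python_callable=get_uk_data)
    report = PythonOperator(task_id="report_data", python_callable=report_data)
    load >> report
```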
The Postgres-to-MongoDB workflow uses the following operation:
- extract_load: extracts the data from Postgres using pandas and dumps it into MongoDB
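A minimal sketch of what `extract_load` might do, assuming the compose service hostnames and a hypothetical source table name:

```python
import pandas as pd
from pymongo import MongoClient
from sqlalchemy import create_engine

def extract_load():
    # Pull the whole table out of Postgres with pandas; the table name
    # "faker_table" and both connection strings are assumptions.
    engine = create_engine("postgresql://airflow:airflow@postgres:5432/Faker_DB")
    df = pd.read_sql_table("faker_table", engine)

    # Dump each row into MongoDB as one document.
    client = MongoClient("mongodb://mongo:27017/")
    client["Faker_DB"]["faker_collection"].insert_many(
        df.to_dict(orient="records"))
```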