
Simultaneous Time Series Forecasting on the global COVID-19 Daily Vaccinations


drkostas/covid19-vaccinations-predict


Simultaneous Time Series Forecasting on the World's COVID-19 Daily Vaccinations



About

Dataset: COVID-19 World Vaccination Progress
This is my project for the Data Mining Course (COSC-526). The main code is in the project.ipynb Jupyter notebook.

Code Locations

  • The dataset is in the datasets/covid-world-vaccinations-progress directory
  • The metadata dataset is in the datasets/countries-of-the-world directory
  • The Jupyter notebook is project.ipynb
  • Some custom packages used in the notebook are located in the data_mining directory:
    • Project Utils:
      • NullsFixer: for inferring the nulls in the COVID-19 vaccination dataset
      • Preprocess: the preprocessing code of the dataset before training
      • BuildModel: contains all the functions related to the building of the TF model
      • Visualizer: the implementations of all the visualizations
    • Configuration: it handles the yml configuration
    • ColorizedLogger: code for formatted logging that saves output in log files
    • timeit: ContextManager+Decorator for timing functions and code blocks
  • The project was compiled using my Template Cookiecutter project: https://github.com/drkostas/starter
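
The timeit helper listed above is described as a combined context manager and decorator. The following is a minimal sketch of that pattern using the standard library's contextlib.ContextDecorator. It is not the code from the data_mining package; the class name Timer, its label argument, and its elapsed attribute are hypothetical and chosen only for illustration.

```python
import time
from contextlib import ContextDecorator


class Timer(ContextDecorator):
    """Hypothetical timing helper: works as a context manager and as a decorator."""

    def __init__(self, label="block"):
        self.label = label
        self.elapsed = None  # seconds, set when the timed block exits

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.elapsed = time.perf_counter() - self._start
        print(f"{self.label} took {self.elapsed:.4f} s")
        return False  # never swallow exceptions raised inside the block


# Used as a context manager around a code block
with Timer("sum of squares") as timer:
    total = sum(i * i for i in range(100_000))


# Used as a decorator around a function
@Timer("slow_add")
def slow_add(a, b):
    time.sleep(0.01)
    return a + b


result = slow_add(2, 3)
```

Inheriting from ContextDecorator means a single class covers both usages, so the timing logic is written only once.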

Document Locations

The extended abstract and the poster are both located in the Documents folder.

Information About The Dataset

The COVID-19 Vaccination Progress Dataset contains information about the daily and total vaccinations of 193 different countries over 135 different dates. The data are collected almost daily and, as of this writing (4/29), the dataset has 14230 rows and 15 different features.

The features of the dataset are the following:

  • Country: this is the country for which the vaccination information is provided
  • Country ISO Code: ISO code for the country
  • Date: date for the data entry; for some dates we have only the daily vaccinations, for others, only the (cumulative) total
  • Total number of vaccinations: this is the absolute number of total immunizations in the country
  • Total number of people vaccinated: a person, depending on the immunization scheme, will receive one or more (typically 2) doses; at any given moment, the number of vaccinations might be larger than the number of people vaccinated
  • Total number of people fully vaccinated: this is the number of people that received the entire set of immunizations according to the immunization scheme (typically 2); at any given moment, there might be a certain number of people that received one dose and another (smaller) number of people that received all doses in the scheme
  • Daily vaccinations (raw): for a certain data entry, the number of vaccinations for that date/country
  • Daily vaccinations: for a certain data entry, the number of vaccinations for that date/country
  • Total vaccinations per hundred: ratio (in percent) between vaccination number and total population up to the date in the country
  • Total number of people vaccinated per hundred: ratio (in percent) between population immunized and total population up to the date in the country
  • Total number of people fully vaccinated per hundred: ratio (in percent) between population fully immunized and total population up to the date in the country
  • Number of vaccinations per day: number of daily vaccinations for that day and country
  • Daily vaccinations per million: ratio (in ppm) between vaccination number and total population for the current date in the country
  • Vaccines used in the country: total number of vaccines used in the country (up to date)
  • Source name: source of the information (national authority, international organization, local organization etc.)
  • Source website: website of the source of information
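
As noted for the Date feature, some entries carry only the daily count and others only the cumulative total. The sketch below shows one simple way such gaps could be filled, assuming that the daily count is the exact day-over-day change in the total. This is an illustration only and is not the NullsFixer implementation; the function name fill_missing_values, the row keys, and the sample numbers are hypothetical.

```python
def fill_missing_values(rows):
    """Fill missing 'total' or 'daily' values for one country.

    rows: list of dicts sorted by date, with keys 'date', 'total', 'daily';
    a missing value is represented by None.
    Assumes total[t] = total[t - 1] + daily[t].
    """
    filled = []
    previous_total = None
    for row in rows:
        total, daily = row["total"], row["daily"]
        if previous_total is not None:
            if total is None and daily is not None:
                # Reconstruct the cumulative total from the daily count
                total = previous_total + daily
            elif daily is None and total is not None:
                # Reconstruct the daily count from two consecutive totals
                daily = total - previous_total
        filled.append({**row, "total": total, "daily": daily})
        if total is not None:
            previous_total = total
    return filled


# Made-up sample rows for a single country
sample = [
    {"date": "2021-01-01", "total": 100, "daily": None},
    {"date": "2021-01-02", "total": None, "daily": 50},
    {"date": "2021-01-03", "total": 180, "daily": None},
]
fixed = fill_missing_values(sample)
```

In this sample the second row gets a total of 150 and the third row gets a daily count of 30. The first row keeps its missing daily value because there is no earlier total to compare against.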

For recalculating the per hundred people values we used another dataset that contains some metadata about the countries of the world, including their population.
Metadata Dataset: DataBank - World Development Indicators
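
The per-hundred recalculation mentioned above amounts to dividing a count by the country's population and multiplying by 100. A minimal sketch, with a hypothetical function name and made-up numbers (not taken from either dataset):

```python
def per_hundred(count, population):
    """Return `count` expressed per 100 inhabitants, rounded to 2 decimals."""
    if population is None or population <= 0:
        raise ValueError("population must be a positive number")
    return round(100.0 * count / population, 2)


# 1.5 million people vaccinated in a country of 10 million inhabitants
people_rate = per_hundred(1_500_000, 10_000_000)

# Total vaccinations count doses, so the rate can exceed 100
doses_rate = per_hundred(250, 200)
```

Here people_rate is 15.0 and doses_rate is 125.0; the second case illustrates why the total vaccinations per hundred is not bounded by 100.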

Getting Started

These instructions will get you a copy of the project up and running on your machine.

Prerequisites

You need a machine with Python >= 3.6 and a Bash-compatible shell (e.g. zsh) installed.

$ python3.6 -V
Python 3.6.13

$ echo $SHELL
/usr/bin/zsh

Setting Up

All the installation steps are handled by the Makefile. The server=local flag specifies that you want to use conda instead of venv; this can be changed on lines 25-28 of the Makefile. Since local is also the default, you can omit the flag.

$ make install server=local

To update the COVID-19 vaccination dataset with the latest information, run:

$ make download_dataset server=local

Running the code

To run the code, modify the yml file if needed and start a Jupyter server.

Modifying the Configuration

There is an already configured yml file under confs/covid.yml with the following structure:

tag: project
covid-progress:
  - properties:
      data_path: datasets/covid-world-vaccination-progress/country_vaccinations.csv
      data_extra_path: datasets/world-bank/data.csv
      log_path: logs/covid_progress.log
    type: csv
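
A YAML parser would load this file into nested mappings and lists. The sketch below writes that structure out as a Python literal instead of parsing the file, so that it does not assume any particular YAML library. It then shows how the paths could be read from it. The helper get_properties is hypothetical; in the project itself, the Configuration class handles the yml file.

```python
# The structure of confs/covid.yml, written as the equivalent Python object
config = {
    "tag": "project",
    "covid-progress": [
        {
            "properties": {
                "data_path": "datasets/covid-world-vaccination-progress/country_vaccinations.csv",
                "data_extra_path": "datasets/world-bank/data.csv",
                "log_path": "logs/covid_progress.log",
            },
            "type": "csv",
        }
    ],
}


def get_properties(cfg, section="covid-progress", index=0):
    """Hypothetical helper: return (type, properties) for one entry of a section."""
    entry = cfg[section][index]
    return entry["type"], entry["properties"]


source_type, props = get_properties(config)
data_path = props["data_path"]
log_path = props["log_path"]
```

Note that covid-progress holds a list, so each section can contain more than one entry; the index argument selects which one to read.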

Running Jupyter

After loading the conda environment with the command conda activate data_mining, run jupyter notebook and open the project.ipynb file.

TODO

Read the TODO to see the current task list.

Built With

  • Jupyter - An interactive computing framework
  • Tensorflow - A deep learning framework

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments