COVID-19 is an infectious disease, caused by a coronavirus, that has spread all over the world. With over 30 million cases worldwide and almost a million deaths, it has evolved into a modern pandemic. It has sent shockwaves through industries and businesses, pushing economies into recession and raising unemployment rates. The consequences of COVID-19 are still being documented, and abundant data is available to analysts and data scientists. We have decided to delve deeper into this phenomenon and look for global and domestic trends. For our project, we will explore the impact of non-pharmaceutical measures such as lockdowns, as well as trends in the tourism industry and immigration, among others.
For the COVID dataset, we will obtain data from the European Union Open Data Portal. For the overseas travel dataset, we will focus on data obtained from the Australian Bureau of Statistics.
For data cleaning, we will focus on indexing the datasets by month. Any redundant, repeated, null or NaN values will be dropped. Furthermore, to keep the data consistent, we will join each of the other datasets with a main COVID dataset. For example, if we are focusing on travelling trends in Australia, we will join the overseas arrivals and departures dataset with the COVID dataset. This will help us obtain the relevant information from the data and minimize the time taken by the cleaning process.
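The cleaning steps above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the row layout `(date, value)`, the sample numbers, and the helper names `clean`, `monthly_totals` and `join_on_month` are all hypothetical, and the real datasets will have more columns.

```python
from collections import defaultdict
from datetime import date


def clean(rows):
    """Drop rows containing None/NaN and exact duplicate rows, keeping order."""
    seen = set()
    kept = []
    for row in rows:
        # v != v is true only for NaN
        if any(v is None or v != v for v in row):
            continue
        if row in seen:
            continue
        seen.add(row)
        kept.append(row)
    return kept


def monthly_totals(rows):
    """Index (date, value) rows by (year, month) and sum the values."""
    totals = defaultdict(int)
    for d, value in rows:
        totals[(d.year, d.month)] += value
    return dict(totals)


def join_on_month(covid, travel):
    """Inner join: keep only the months present in both monthly tables."""
    return {m: (covid[m], travel[m]) for m in sorted(covid.keys() & travel.keys())}


# Made-up sample rows, for illustration only.
covid_rows = [
    (date(2020, 3, 1), 12),
    (date(2020, 3, 1), 12),     # exact duplicate, dropped
    (date(2020, 3, 15), None),  # missing value, dropped
    (date(2020, 3, 20), 30),
    (date(2020, 4, 2), 50),
]
travel_rows = [
    (date(2020, 3, 31), 1000),
    (date(2020, 4, 30), 200),
    (date(2020, 5, 31), 150),   # no matching COVID month, dropped by the join
]

covid_monthly = monthly_totals(clean(covid_rows))
travel_monthly = monthly_totals(clean(travel_rows))
joined = join_on_month(covid_monthly, travel_monthly)
print(joined)
```

An inner join is assumed here so that every remaining month has values from both sources. Whether months missing from one source should be kept instead is a choice we have not yet made.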
Pareto chart for coronavirus case count, death count, and recovery count.
Regression (Simple Linear, Random Forest, SVM): predicting COVID-19's transmission factors.
Time Series Analysis: observing the progression of COVID-19 cases over time.
We also expect to use other techniques, such as Clustering and Classification, for more in-depth analysis of our datasets.
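As a rough sketch of what a Pareto chart needs as input, the snippet below ranks categories by count and attaches the cumulative percentage that the chart's line would show. The grouping by region, the region labels, the counts, and the function name `pareto_table` are assumptions made for illustration; the same table could be built for death or recovery counts.

```python
def pareto_table(counts):
    """Return (label, count, cumulative_percent) rows, largest count first."""
    total = sum(counts.values())
    if total == 0:
        return []
    rows = []
    running = 0
    for label, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((label, count, round(100 * running / total, 1)))
    return rows


# Made-up case counts per region, for illustration only.
cases_by_region = {"C": 150, "A": 500, "D": 50, "B": 300}

table = pareto_table(cases_by_region)
for label, count, cum_pct in table:
    print(f"{label:<3}{count:>6}{cum_pct:>7.1f}%")
```

The bars of the chart would use the counts and the overlaid line would use the cumulative percentages.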
We have planned three major milestones for our project.
Milestone 1 will be obtaining, cleaning and stitching together the relevant datasets. This will help provide a full picture of our target industries. We will have reached this milestone once we are sure that our data covers 2019-2020 as the COVID-19-impacted dataset and 2018-2019 as the control dataset. This is projected to be achieved by Week 8.
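One way to be sure that the data covers a period is to list the months that are missing from it. The sketch below assumes the data is already indexed by `(year, month)`. The exact start and end months of the control and impacted periods are not fixed yet, so the boundaries used here are placeholders, and the function names are hypothetical.

```python
def months_between(start, end):
    """All (year, month) pairs from start to end, inclusive."""
    year, month = start
    out = []
    while (year, month) <= end:
        out.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return out


def missing_months(available, start, end):
    """Months in [start, end] that do not appear in the available set."""
    return [ym for ym in months_between(start, end) if ym not in available]


# Made-up coverage: every month from Jan 2018 to Jun 2020 except Feb 2019.
available = set(months_between((2018, 1), (2020, 6))) - {(2019, 2)}

# Placeholder period boundaries (assumptions, not decided in the proposal).
control_gaps = missing_months(available, (2018, 1), (2019, 12))
impacted_gaps = missing_months(available, (2020, 1), (2020, 6))
print(control_gaps, impacted_gaps)
```

An empty list for both periods would indicate that the milestone's coverage condition is met.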
For Milestone 2, we will try to complete the Pareto charts, fit models such as linear regression, Random Forest and SVM to see which one performs best, and carry out a Time Series Analysis on the datasets.
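To decide which model performs best, one option is to train each model on the earlier observations and compare the errors on the most recent ones. The sketch below shows that comparison with a simple linear fit and a mean baseline only. Random Forest and SVM are not implemented here: they would need a library (scikit-learn is one possibility, which is an assumption on our part) and could be plugged in as further `fit` functions. The data and all names are made up for illustration.

```python
from math import sqrt


def fit_mean(xs, ys):
    """Baseline: always predict the mean of the training targets."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_linear(xs, ys):
    """Simple linear regression by ordinary least squares (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def rmse(predict, xs, ys):
    """Root mean squared error of a predictor on (xs, ys)."""
    return sqrt(sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def compare(models, xs, ys, n_test):
    """Train on all but the last n_test points, score each model on those points."""
    train_x, test_x = xs[:-n_test], xs[-n_test:]
    train_y, test_y = ys[:-n_test], ys[-n_test:]
    return {name: rmse(fit(train_x, train_y), test_x, test_y)
            for name, fit in models.items()}


# Made-up, exactly linear daily case counts, for illustration only.
days = list(range(10))
cases = [3 * d + 5 for d in days]

scores = compare({"mean baseline": fit_mean, "simple linear": fit_linear},
                 days, cases, n_test=3)
best = min(scores, key=scores.get)
print(scores, best)
```

The split is chronological rather than random because the case counts form a time series, so the held-out points should come after the training points.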
Milestone 3 will be reached once we complete the rest of our analysis and upload a finalized version of the report, with some data visualizations, by Week 12.
We changed our focus from tourism to travelling trends, as the data was easier to track.