
This project is a time series analysis and prediction of the spread of COVID-19 and the financial impact it has had.


NerdParker/COVID-19-Predictions-and-Financial-Crisis


COVID-19-Predictions-and-Financial-Crisis

This project is a time series analysis and prediction of the spread of COVID-19 and the financial impact it has had. A video presentation is available at https://youtu.be/C3qMzDXyIaU

COVID-19 is a respiratory illness caused by a novel coronavirus, believed to have originated in Wuhan, China, and brought to the World Health Organization's attention on December 31st, 2019. Person-to-person spread occurred at a rapid rate and has since been slowed somewhat by quarantine. This quarantine has had a dramatic impact on the financial market, as many people and industries are unable to function in a work-from-home world riddled with travel bans. This project analyzes the virus's symptoms, doctor sentiment, who is at risk, geographic spread, total cases and their outcomes, and the financial impact on large tech companies.

Summary:

The interactive plot below shows the spread of the virus over the past few months (last updated 4/25/2020).

COVID-19 Confirmed Cases Spread

Future deaths, recoveries and confirmed cases were modeled and forecast from current trends. Below is a graph of the forecast confirmed cases. The initial exponential growth has slowed to linear growth.

[Figure: forecast of confirmed cases]

Stock prices of large tech companies have taken a dive, as seen here, but are forecast to return to their previous levels. Some industries may never recover.

[Figure: forecast stock prices for large tech companies]

In conclusion, the main symptoms are fever, cough and sore throat. The initial exponential spread has slowed to linear growth, which is expected to continue in the short term. The U.S. and China have reported the most cases. Individual regions and provinces can be tracked to help determine when quarantine might be lifted and where additional medical support is most needed. Large tech companies have taken a big financial hit but are expected to make a decent recovery by the end of the year.

For a more in-depth look at the project and findings, see below:

Contents:

  1. Data Exploration and Cleaning
  2. Case Review Summary Sentiment Analysis
  3. Time Series Data Exploration and Forecasting
  4. Financial Data Analysis and Forecasting
  5. Conclusions
  6. Future Work

Data Exploration and Cleaning:

All the data files can be found in the "Data" folder. The initial data cleaning and exploration can be found in Covid-19 People & Symptom Analysis Practicum.ipynb in the Jupyter Notebooks folder, as well as in the corresponding Python files in the Python files folder.

Dataset: (https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset)

  1. This dataset contains time series data on the number of confirmed COVID-19 cases, deaths and recoveries. After some general cleaning, the "line list" data looks like this:

[Figure: cleaned "line list" data]

  2. To look for correlations among the numerical variables, I ran a pair plot:

[Figure: pair plot of the numerical variables]

We can see that the virus infects all age ranges, but almost all deaths are among older individuals:

[Figure: age distribution of cases and deaths]

  3. Further cleaning of the "line list" data, focusing on the patient summaries, gives:

[Figure: cleaned patient summaries]

  4. Next, I cleaned the summaries by removing punctuation and digits. I left the pronouns unchanged, lemmatized the remaining words, removed the stop words and then joined the tokens back together. A word cloud of the results is below:

[Figure: word cloud of the cleaned summaries]

The word cloud reveals that the top words are confirm, covid, patient, new, symptom, male, onset, female, wuhan, fever, pneumonia, etc. A bar chart made with matplotlib and seaborn shows the most frequent words after cleaning:

[Figure: bar chart of the most frequent words]
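The cleaning and word-count steps above can be sketched in plain Python. This is a minimal illustration, not the notebook's code: the stop-word list and the small lemma table are hypothetical stand-ins for a full stop-word list and a real lemmatizer (such as spaCy or NLTK), and pronouns are simply left out of the stop-word list so they survive cleaning. The sample summaries are made up.

```python
import re
from collections import Counter

# Hypothetical, abbreviated stop-word list. Pronouns are deliberately absent
# so they survive cleaning; a real pipeline would use a full list.
STOP_WORDS = {"a", "an", "the", "and", "of", "on", "in", "to", "from", "with", "was", "is"}

# Toy lemma table standing in for a real lemmatizer (illustrative assumption).
LEMMAS = {"confirmed": "confirm", "patients": "patient", "symptoms": "symptom", "returned": "return"}


def clean_summary(text):
    """Lowercase, strip punctuation and digits, lemmatize, drop stop words, rejoin."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())  # anything not a letter or whitespace
    lemmas = [LEMMAS.get(token, token) for token in text.split()]
    return " ".join(t for t in lemmas if t not in STOP_WORDS)


def top_words(summaries, n=5):
    """Count cleaned tokens across all summaries and return the n most frequent."""
    counts = Counter()
    for summary in summaries:
        counts.update(clean_summary(summary).split())
    return counts.most_common(n)


summaries = [
    "New confirmed COVID-19 patient: male, 45, onset of fever on 1/15.",
    "Confirmed patients returned from Wuhan with fever and cough.",
]
print(clean_summary(summaries[0]))  # new confirm covid patient male onset fever
print(top_words(summaries, n=3))
```

The resulting counts are what a word cloud or a top-words bar chart would be drawn from.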

I visualized the top symptoms:

[Figure: top symptoms]

Case Review Summary Sentiment Analysis:

The case review summary sentiment analysis work can be found in Covid-19 People & Symptom Analysis Practicum.ipynb in the Jupyter Notebooks folder, as well as in the corresponding Python files in the Python files folder.

  1. In addition to the previous text cleaning, bigram and trigram models were built and the text was lemmatized. TextBlob was used to determine the sentiment of the summaries, and the results are plotted below:

[Figure: sentiment of the case summaries]

The sentiment of the summaries is actually slightly positive overall.
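TextBlob's `TextBlob(text).sentiment.polarity` returns a float between -1.0 (negative) and 1.0 (positive). Assuming those scores have already been computed for each summary, a small sketch like the one below turns them into an overall picture of the kind described here. The neutral band of 0.05 on either side of zero and the example scores are illustrative assumptions, not values from the notebook.

```python
from statistics import mean


def summarize_polarity(scores, neutral_band=0.05):
    """Bucket polarity scores in [-1.0, 1.0] and return (mean score, bucket counts).

    `neutral_band` is a hypothetical cut-off: scores within +/- neutral_band
    of zero are counted as neutral.
    """
    buckets = {"negative": 0, "neutral": 0, "positive": 0}
    for score in scores:
        if score < -neutral_band:
            buckets["negative"] += 1
        elif score > neutral_band:
            buckets["positive"] += 1
        else:
            buckets["neutral"] += 1
    return mean(scores), buckets


# Made-up scores, e.g. from [TextBlob(s).sentiment.polarity for s in summaries]
scores = [0.0, 0.136, -0.1, 0.25, 0.0]
average, buckets = summarize_polarity(scores)
print(round(average, 4), buckets)
```

A small positive mean, with more positive than negative summaries, is what a "slightly positive overall" result would look like in these terms.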

Time Series Data Exploration and Forecasting

The time series data and forecasting work can be found in Covid-19 Time Series and Prediction Practicum.ipynb in the Jupyter Notebooks folder, as well as in the corresponding Python files in the Python files folder.

  1. This dataset contains time series data on the number of confirmed COVID-19 cases, deaths and recoveries. After some general cleaning of the "covid_19_data" file, the output looks like this:

[Figure: cleaned "covid_19_data" output]

A good deal of further cleaning also went into the "time_series_covid_19_confirmed", "..deaths" and "..recovered" data files, which hold the critical time series data because they track outbreak results over time.
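The time series files are assumed here to use a wide layout: `Province/State`, `Country/Region`, `Lat`, `Long`, then one column per date holding cumulative counts (check the actual headers before relying on this). Below is a standard-library sketch of reshaping that layout into long records and totalling by country, the kind of aggregation a by-country plot needs. The sample rows and counts are made up, and the helper names are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in the assumed wide layout (one column per date).
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
Hubei,China,30.9,112.3,100,150
Beijing,China,40.2,116.4,10,20
,France,46.2,2.2,0,5
"""


def wide_to_long(csv_text):
    """Turn one-row-per-location data into (country, province, date, count) records."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    dates = header[4:]  # assumes the first four columns are location metadata
    records = []
    for row in reader:
        province, country = row[0], row[1]
        for day, value in zip(dates, row[4:]):
            records.append((country, province, day, int(value or 0)))  # blank -> 0
    return records


def totals_by_country(records, day):
    """Sum the counts of every province/state in a country for one date."""
    totals = defaultdict(int)
    for country, _province, record_day, count in records:
        if record_day == day:
            totals[country] += count
    return dict(totals)


records = wide_to_long(SAMPLE)
print(totals_by_country(records, "1/23/20"))  # {'China': 170, 'France': 5}
```

The long form also makes it easy to run the same code on the confirmed, deaths and recovered files.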

  2. A plot of the reported cases by country is below:

[Figure: reported cases by country]

This plot shows that China, the United States, Australia, Canada and France have the most reported cases as of the last iteration.

  3. I created an interactive Plotly stacked bar chart that shows the total reported cases over time and the number of patients who have recovered or died.

[Figure: stacked bar chart of reported, recovered and fatal cases over time]

This plot shows that the number of cases is increasing rapidly and has passed 350k, but many patients recover.

  4. An interactive Plotly geo scatter plot shows the top reported cases overlaid on their countries, with markers sized by the number of total cases:

[Figure: geo scatter plot of reported cases by country]

This plot shows that China has the most reported cases at 81k as of the last iteration.

  5. A similar plot was made in Tableau:

[Figure: Tableau map of reported cases]

  6. Another interactive Plotly geo scatter plot shows the top reported cases overlaid on their countries and sized by the number of total cases, and also shows the outbreak spreading over time:

[Figure: geo scatter plot of the outbreak spread over time]

  7. Forecasts were made with FbProphet to model the outbreak's upcoming outlook. The confirmed cases forecast is below:

[Figure: confirmed cases forecast]

The deaths forecast:

[Figure: deaths forecast]

The recovered cases forecast:

[Figure: recovered cases forecast]
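Prophet expects a DataFrame with a `ds` (date) column and a `y` (value) column; `make_future_dataframe(periods=...)` appends future dates, and `predict` returns `yhat` together with `yhat_lower`/`yhat_upper` uncertainty bounds. The sketch below shows that flow for a single cumulative series. It is an illustration, not the notebook's code: the helper names are hypothetical, and `prophet_forecast` needs `pandas` and `prophet` installed (older releases were published as `fbprophet`).

```python
from datetime import date, timedelta


def to_prophet_rows(start, cumulative):
    """Pair each daily cumulative count with its date under Prophet's ds/y names."""
    return [{"ds": start + timedelta(days=i), "y": value} for i, value in enumerate(cumulative)]


def prophet_forecast(rows, periods=5):
    """Fit Prophet to ds/y rows and return the forecast for the next `periods` days."""
    import pandas as pd
    from prophet import Prophet  # older releases: from fbprophet import Prophet

    df = pd.DataFrame(rows)
    df["ds"] = pd.to_datetime(df["ds"])
    model = Prophet()
    model.fit(df)
    future = model.make_future_dataframe(periods=periods)  # history + `periods` new days
    forecast = model.predict(future)
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(periods)


rows = to_prophet_rows(date(2020, 3, 1), [1, 3, 7, 12, 20])  # made-up counts
# With pandas and prophet installed: print(prophet_forecast(rows))
```

The imports sit inside `prophet_forecast` only so that the data-preparation helper can be used without the heavier dependencies.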

  8. The final step was to forecast the next five days of new cases, deaths and recoveries for each country, region, state and province. Again using FbProphet, a couple of loops model each location separately and combine the results into one forecast. A portion of the results is shown below: the forecast for New South Wales, Australia.

[Figure: five-day forecast for New South Wales, Australia]

The Mean Absolute Error for this prediction was 70.9.
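The per-location loop and the Mean Absolute Error (the average absolute difference between actual and forecast values) can be sketched without Prophet. To keep the example self-contained, a least-squares straight line fitted to each location's most recent days stands in for Prophet; this is a deliberate simplification, not the method used here. The helper names, the seven-day window and the sample data are assumptions.

```python
def linear_trend_forecast(series, horizon=5, window=7):
    """Fit a least-squares line to the last `window` points and extend it `horizon` days.

    Simplified stand-in for Prophet so the sketch needs only the standard library.
    """
    recent = series[-window:]
    n = len(recent)
    x_mean = (n - 1) / 2
    y_mean = sum(recent) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(recent))
    slope = sxy / sxx if sxx else 0.0  # a single point gives a flat forecast
    intercept = y_mean - slope * x_mean
    return [intercept + slope * (n + h) for h in range(horizon)]


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over the paired values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def forecast_all(series_by_location, horizon=5):
    """Loop over every location and combine the results into one mapping."""
    return {loc: linear_trend_forecast(s, horizon) for loc, s in series_by_location.items()}


def holdout_mae(series, horizon=5):
    """Hold out the last `horizon` days, forecast them from the rest, and score with MAE."""
    train, actual = series[:-horizon], series[-horizon:]
    return mean_absolute_error(actual, linear_trend_forecast(train, horizon))


# Made-up cumulative counts for two hypothetical locations.
data = {"Location A": [10, 20, 30, 40, 50, 60, 70], "Location B": [5, 5, 6, 8, 9, 12, 15]}
print(forecast_all(data, horizon=2)["Location A"])  # [80.0, 90.0]
```

Swapping `linear_trend_forecast` for a Prophet-based function leaves the loop and the MAE scoring unchanged.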

Financial Data Analysis and Forecasting

The financial data analysis and forecasting work can be found in Yahoo Finance API Data Practicum.ipynb in the Jupyter Notebooks folder, as well as in the corresponding Python files in the Python files folder.

  1. This notebook accesses the Yahoo Finance API, which provides time series data on company stocks. The Google data for the last five years looks like this:

[Figure: five years of Google stock data]

A plot of the five-year data:

[Figure: Google stock price, past five years]

We can see an upward trend in Google stock over the past five years, followed by a significant dip over the last couple of months, likely due to COVID-19.

Google stock five-year returns:

[Figure: Google stock returns, past five years]

During the COVID-19 outbreak, Google's daily returns dropped as low as -10%, their largest negative spike in the past five years, and they appear quite volatile.
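Assuming the returns are simple daily percentage changes in the closing price (the same quantity pandas' `Series.pct_change()` computes), the return series and its worst day can be sketched as follows. The prices are made up and the helper names are hypothetical.

```python
def daily_returns(closes):
    """Simple daily returns: (close_t - close_{t-1}) / close_{t-1}."""
    return [(today - yesterday) / yesterday for yesterday, today in zip(closes, closes[1:])]


def worst_day(closes):
    """Return (index into `closes` of the worst day, that day's return)."""
    returns = daily_returns(closes)
    i = min(range(len(returns)), key=returns.__getitem__)
    return i + 1, returns[i]  # returns[i] describes the move into closes[i + 1]


closes = [100.0, 110.0, 99.0, 99.0]  # made-up closing prices
print(daily_returns(closes))
print(worst_day(closes))
```

A -10% day in this representation is a return of -0.10.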

  2. To examine whether this is a trend across large technology companies, stock data for other major companies is brought in from the Yahoo Finance API:

[Figure: stock data for other major tech companies]

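A correlation plot shows the similarity between the companies:

[Figure: correlation plot of the companies]

[Figure: additional correlation plot]

A scatter plot of five years of Google and Microsoft returns shows slightly above-average returns, with more high-return days than low-return days.

[Figure: scatter plot of Google vs. Microsoft returns, past five years]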
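Assuming the correlation plots show Pearson correlations of daily returns (the default of pandas' `DataFrame.corr()`), the underlying numbers can be computed as below. The tickers and return values are hypothetical. Values near 1 mean two stocks tend to move together; values near -1 mean they move in opposite directions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(returns_by_ticker):
    """Nested dict of pairwise correlations, like the table behind a heatmap."""
    tickers = list(returns_by_ticker)
    return {a: {b: pearson(returns_by_ticker[a], returns_by_ticker[b]) for b in tickers}
            for a in tickers}


# Hypothetical daily returns for three tickers.
returns = {
    "AAA": [0.01, -0.02, 0.03, -0.01],
    "BBB": [0.02, -0.03, 0.04, -0.02],
    "CCC": [-0.01, 0.02, -0.03, 0.01],
}
matrix = correlation_matrix(returns)
print(round(matrix["AAA"]["BBB"], 3), round(matrix["AAA"]["CCC"], 3))
```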

  3. The same analysis, focusing only on 2020. Google stock this year:

[Figure: Google stock price, 2020]

The stock has greatly declined since February.

Google returns this year:

[Figure: Google returns, 2020]

We can see that most days have a negative return since the end of February.

Comparing Google and Microsoft over only the past six months, we see many negative returns, some of them large, including -15%.

[Figure: scatter plot of Google vs. Microsoft returns, past six months]

Correlations between the companies are even higher now, suggesting an across-the-board decline:

[Figure: correlations between the companies, 2020]

  4. Forecasting the financial outlook of major tech companies:

Expected returns over the past five years:

[Figure: expected returns, past five years]

GE and IBM have negative expected returns, while the other major tech companies are positive, with Microsoft having the highest expected return (possibly due to being awarded the JEDI contract).

Expected returns using only 2020 data:

[Figure: expected returns, 2020]

Here, only Microsoft has a positive expected return and low risk, while each of the other major tech companies has expected losses.
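A common way to build a risk-versus-expected-return chart is to use each stock's mean daily return as its expected return and the standard deviation of those returns as its risk. Assuming the notebook follows that convention (not confirmed here), a sketch with hypothetical tickers and made-up returns:

```python
from statistics import mean, stdev


def risk_return(returns_by_ticker):
    """Map each ticker to (expected return, risk) = (mean, sample std dev of returns)."""
    return {t: (mean(r), stdev(r)) for t, r in returns_by_ticker.items()}


def positive_expected(stats):
    """Tickers with a positive expected return, highest first."""
    winners = [t for t, (expected, _risk) in stats.items() if expected > 0]
    return sorted(winners, key=lambda t: stats[t][0], reverse=True)


# Made-up daily returns for hypothetical tickers.
returns = {
    "AAA": [0.01, 0.03, -0.01, 0.02],
    "BBB": [-0.02, 0.00, -0.01, 0.01],
    "CCC": [0.00, 0.01, 0.00, 0.01],
}
stats = risk_return(returns)
print(positive_expected(stats))  # ['AAA', 'CCC']
```

Plotting risk on one axis against expected return on the other gives the usual scatter, where the upper-left region (high return, low risk) is most attractive.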

Finally, we have the tech companies forecasted stock prices:

[Figure: forecast stock prices for the tech companies]

The stocks are expected to recover. I did not run a separate forecast on just this year's data, as I don't believe there is enough of it to forecast on. I also suspect such a forecast would not be trustworthy: it would likely not suggest that a recovery is possible, yet logically, if COVID-19 eventually allows business to resume as usual, the market should begin trending upward again.

Conclusions

The main symptoms are fever, cough and sore throat. The initial exponential spread has slowed to linear growth, which is expected to continue in the short term. The U.S. and China have reported the most cases. Individual regions and provinces can be tracked to help determine when quarantine might be lifted and where additional medical support is most needed. Large tech companies have taken a big financial hit but are expected to make a decent recovery by the end of the year.

Future Work

Possible next steps include additional visualizations and dashboards, financial data for other industries, and other forecasting methods.
