
Mross2858/twitter_coronavirus


Coronavirus Twitter Analysis: 2020

This project was for my Data Structures and Algorithms class at Claremont McKenna College (CMC). Its objective was to get comfortable using the MapReduce technique and shell commands to process large amounts of data; this particular dataset consists of billions of tweets sent in 2020.
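As a rough illustration of the MapReduce idea (a sketch, not the project's actual code), the example below treats a few made-up tweets as separate partitions, maps each partition to per-language hashtag counts independently, and then reduces the partial results by summing them. The tweet fields `lang` and `text`, the hashtag list, and the sample data are all assumptions made for illustration.

```python
from collections import Counter
from functools import reduce

# Hypothetical sample tweets split into two partitions (stand-ins for input files).
partitions = [
    [{"lang": "en", "text": "Stay home #coronavirus"},
     {"lang": "ko", "text": "#코로나바이러스 조심하세요"}],
    [{"lang": "en", "text": "#coronavirus #covid19 update"},
     {"lang": "es", "text": "Sin hashtags aquí"}],
]

HASHTAGS = ["#coronavirus", "#코로나바이러스", "#covid19"]

def map_partition(tweets):
    """Map step: count (hashtag, language) pairs within one partition.

    Uses simple substring matching, so e.g. "#coronavirusoutbreak" would also
    count as "#coronavirus"; a real pipeline might tokenize more carefully.
    """
    counts = Counter()
    for tweet in tweets:
        text = tweet["text"].lower()
        for tag in HASHTAGS:
            if tag in text:
                counts[(tag, tweet["lang"])] += 1
    return counts

def reduce_counts(a, b):
    """Reduce step: combine two partial results by adding their counts."""
    return a + b

# Each partition is mapped independently (these calls could run in parallel),
# then the partial counts are folded into a single result.
partials = [map_partition(p) for p in partitions]
totals = reduce(reduce_counts, partials, Counter())
print(totals[("#coronavirus", "en")])  # 2
```

Because the map step only looks at one partition at a time, the expensive pass over the data can be spread across many processes, and only the small count tables need to be combined at the end.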

I used four Python programs to analyze this data: map.py, reduce.py, alternative_reduce.py, and visualize.py. For every file passed to it, map.py determines each tweet's country of origin and language. reduce.py takes multiple files (the output files of map.py) and combines them into a single file. visualize.py then generates bar graphs of the top ten keys in each input file and saves them as PNG files.
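The combining step that reduce.py performs, and the top-ten selection that visualize.py plots, could look roughly like the sketch below. It assumes (hypothetically) that each map output is a JSON file shaped `{hashtag: {language_or_country: count}}`; the function names `reduce_files` and `top_ten` are illustrative and are not taken from the repository.

```python
import json
import os
import tempfile
from collections import Counter, defaultdict

def reduce_files(paths):
    """Merge several map outputs into one {hashtag: Counter} mapping by summing counts."""
    total = defaultdict(Counter)
    for path in paths:
        with open(path, encoding="utf-8") as f:
            partial = json.load(f)
        for hashtag, counts in partial.items():
            # Counter.update adds counts rather than replacing them.
            total[hashtag].update(counts)
    return total

def top_ten(total, hashtag):
    """Return up to ten (key, count) pairs for one hashtag, largest count first."""
    return total[hashtag].most_common(10)

# Example: two small, made-up map outputs written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    outputs = [
        {"#coronavirus": {"en": 5, "es": 2}},
        {"#coronavirus": {"en": 3, "ko": 1}, "#코로나바이러스": {"ko": 4}},
    ]
    paths = []
    for i, data in enumerate(outputs):
        path = os.path.join(tmp, f"part{i}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False)
        paths.append(path)
    combined = reduce_files(paths)

print(top_ten(combined, "#coronavirus"))  # [('en', 8), ('es', 2), ('ko', 1)]
```

The resulting (key, count) pairs are exactly what a bar chart needs; a plotting library such as matplotlib could draw them with one bar per key and save the figure as a PNG.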

Coronavirus Hashtags by Country of Origin

Coronavirus Hashtags by Language

#코로나바이러스 (Korean for "coronavirus") Hashtags by Language

#코로나바이러스 (Korean for "coronavirus") Hashtags by Country

Alternative Reduce (Combination of Reduce and Visualize)

For the alternative_reduce.py portion of the project, I combined the structures of reduce.py and visualize.py into one file, which takes a set of hashtags as keys and compares how often each was used in tweets from 2020. For this particular graph I compared the use of #covid19, #coronavirus, #PPE, and #nurse.
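One way to compare several hashtags, sketched under assumptions rather than copied from alternative_reduce.py, is to build a per-day series of total counts for each hashtag. The sketch assumes the map outputs are grouped by day, with each day's output shaped `{hashtag: {language_or_country: count}}`; the function name `hashtag_series` and the sample data are hypothetical.

```python
def hashtag_series(daily_outputs, hashtags):
    """For each requested hashtag, build a list of (day, total count) pairs in day order.

    daily_outputs is assumed to map a sortable day label (e.g. "2020-03-01")
    to that day's map output, shaped {hashtag: {language_or_country: count}}.
    """
    series = {tag: [] for tag in hashtags}
    for day in sorted(daily_outputs):
        counts = daily_outputs[day]
        for tag in hashtags:
            # Sum over every language/country; a missing hashtag counts as zero.
            series[tag].append((day, sum(counts.get(tag, {}).values())))
    return series

# Made-up example data for two days (deliberately out of order).
daily = {
    "2020-03-02": {"#covid19": {"en": 10, "fr": 2}, "#nurse": {"en": 1}},
    "2020-03-01": {"#covid19": {"en": 4}, "#PPE": {"en": 3}},
}
result = hashtag_series(daily, ["#covid19", "#coronavirus", "#PPE", "#nurse"])
print(result["#covid19"])  # [('2020-03-01', 4), ('2020-03-02', 12)]
```

Each series could then be drawn as one line on a shared plot, which makes the relative popularity of the hashtags over time easy to compare.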


Languages

  • Python 97.9%
  • Shell 2.1%