Coronavirus Twitter Analysis: 2020

I built this project for my Data Structures and Algorithms class at Claremont McKenna College (CMC). The goal was to get comfortable using the MapReduce technique and shell commands to process large amounts of data, in this case a dataset of billions of tweets sent in 2020.

I used four Python programs to analyze this data: map.py, reduce.py, alternative_reduce.py, and visualize.py. map.py reads every file passed to it and determines the country of origin and language of each tweet. reduce.py takes multiple files, each one an output file of map.py, and combines them into a single file. visualize.py then generates a bar graph of the top ten keys in each input file and saves it as a PNG file.
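The three stages above can be sketched as follows. This is a minimal illustration, not the code in src: the function names, the tracked hashtag list, and the tweet field layout (`text`, `lang`, `place.country_code`, as in Twitter's v1.1 JSON) are assumptions made for the example. Plotting is left out; the last function only selects the ten keys a bar graph would show.

```python
import json
from collections import Counter, defaultdict

# Hypothetical list of tracked hashtags; the real list lives in map.py.
HASHTAGS = ["#coronavirus", "#코로나바이러스"]


def map_tweets(lines, hashtags=HASHTAGS):
    """Map step: count hashtag usage per language and per country."""
    by_lang = defaultdict(Counter)
    by_country = defaultdict(Counter)
    for line in lines:
        tweet = json.loads(line)
        text = tweet.get("text", "").lower()
        lang = tweet.get("lang") or "unknown"
        # Assumed layout: "place" may be null for tweets without a location.
        place = tweet.get("place") or {}
        country = place.get("country_code") or "unknown"
        for tag in hashtags:
            if tag.lower() in text:
                by_lang[tag][lang] += 1
                by_country[tag][country] += 1
    return by_lang, by_country


def reduce_counts(partials):
    """Reduce step: merge many {hashtag: Counter} results into one."""
    total = defaultdict(Counter)
    for partial in partials:
        for tag, counts in partial.items():
            total[tag].update(counts)
    return total


def top_keys(counts, n=10):
    """Select the n most frequent keys (ties broken alphabetically)."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    day1 = ['{"text": "Stay safe #Coronavirus", "lang": "en", "place": {"country_code": "US"}}']
    day2 = ['{"text": "#coronavirus update", "lang": "es", "place": null}']
    lang1, _ = map_tweets(day1)
    lang2, _ = map_tweets(day2)
    merged = reduce_counts([lang1, lang2])
    print(top_keys(merged["#coronavirus"]))
```

Because each map result is just a set of counters, the map step can run on every input file independently and the reduce step can merge the results in any order, which is what makes the approach practical on a dataset this large.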

Coronavirus Hashtags by Country of Origin

Coronavirus Hashtags by Language

코로나바이러스 Hashtags by Language

코로나바이러스 Hashtags by Country

alternative_reduce.py (Combination of Reduce and Visualize)

For the alternative_reduce.py portion of the project, I combined the structure of reduce.py and visualize.py into a single file that takes a set of hashtags as keys and compares their usage in tweets from 2020. For this particular graph I compared the use of #covid19, #coronavirus, #PPE, and #nurse.
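One way such a comparison could be computed is sketched below. It assumes, purely for illustration, that there is one map output per day and that each has the shape `{hashtag: {key: count}}`; the function name `hashtag_series` and the day labels are hypothetical and not taken from alternative_reduce.py.

```python
def hashtag_series(daily_counts, hashtags):
    """Return the sorted day labels and, per hashtag, its total uses per day.

    daily_counts maps a day label to {hashtag: {key: count}}, the assumed
    shape of one map output per day.
    """
    days = sorted(daily_counts)
    series = {tag: [] for tag in hashtags}
    for day in days:
        per_day = daily_counts[day]
        for tag in hashtags:
            # Sum over all languages or countries; a missing hashtag counts as 0.
            series[tag].append(sum(per_day.get(tag, {}).values()))
    return days, series


if __name__ == "__main__":
    daily = {
        "2020-03-02": {"#covid19": {"en": 3, "es": 1}, "#nurse": {"en": 2}},
        "2020-03-01": {"#covid19": {"en": 1}},
    }
    days, series = hashtag_series(daily, ["#covid19", "#coronavirus", "#PPE", "#nurse"])
    print(days)
    print(series)
```

Each list in `series` lines up with `days`, so the result can be handed directly to a plotting library to draw one line or bar group per hashtag.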

Mross2858/twitter_coronavirus
