- article_scraper.py - Assuming the URLs from each news source have been accumulated in separate CSV files (e.g. who.csv), article_scraper.py scrapes each article from its URL and stores the article's title and text in adjacent columns. The result is a dataframe with the column headers ['date', 'url', 'title', 'text'], which is written to a CSV file (e.g. who_scraped.csv). A rough sketch of this step is given after this list.
- article_scraper_translator.py - This script can be used to scrape and translate articles that are not in English (see the translation sketch below).
- twitter_scraper.py - This script can be used to scrape tweets, using Twitter handles instead of URLs (see the Tweepy sketch below).
- all_source_text.csv - Once the articles/tweets from all sources have been accumulated, they can be combined into a single document, as was done in all_source_text.csv (this step was performed manually for this project); the concatenation sketch below shows the equivalent with pandas.
- source_bing.R - Set the path to the all_source_text.csv file, and each of the source_bing.R scripts will generate sentiment scores with the bing lexicon for the articles from that news source for each given date.
- source_nrc.R - This script calculates sentiment scores for each article based on the nrc lexicon.
- The R scripts also produce a plot of the sentiment per source per date; sample images are included in the Ebola pics.zip folder. A Python analogue of the lexicon scoring and plotting is sketched at the end of this list.
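
The scraping step can be sketched as follows. This is only an illustration of the behaviour described above, not the contents of article_scraper.py itself; it assumes requests and BeautifulSoup are available and that who.csv has 'date' and 'url' columns.

```python
# Minimal sketch of the article-scraping step; the actual article_scraper.py may
# use a different parser (e.g. newspaper3k) or different extraction rules.
import pandas as pd
import requests
from bs4 import BeautifulSoup

def scrape_article(url):
    """Fetch one article and return its title and visible paragraph text."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else ""
    text = " ".join(p.get_text(strip=True) for p in soup.find_all("p"))
    return title, text

articles = pd.read_csv("who.csv")  # expected columns: date, url
articles["title"], articles["text"] = zip(*articles["url"].map(scrape_article))
articles[["date", "url", "title", "text"]].to_csv("who_scraped.csv", index=False)
```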
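
For the non-English sources, article_scraper_translator.py adds a translation step on top of the scraping above. The snippet below uses the googletrans package purely as an example; the translation service the actual script relies on is an assumption here.

```python
# Hypothetical translation step for non-English article text; the real script's
# translation backend is not documented in this README.
from googletrans import Translator

translator = Translator()

def translate_to_english(text):
    """Auto-detect the source language and translate the text to English."""
    if not text:
        return text
    return translator.translate(text, dest="en").text
```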
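
twitter_scraper.py collects tweets by handle rather than by URL. A minimal sketch using Tweepy (an assumption; the repository's script may use a different client or credential setup) could look like this:

```python
# Hypothetical tweet-collection step using Tweepy's v3-style API; handle and
# output file name are examples only.
import pandas as pd
import tweepy

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

def scrape_handle(handle, count=200):
    """Return a dataframe of recent tweets for one Twitter handle."""
    tweets = api.user_timeline(screen_name=handle, count=count, tweet_mode="extended")
    return pd.DataFrame(
        [{"date": t.created_at,
          "url": f"https://twitter.com/{handle}/status/{t.id}",
          "title": handle,
          "text": t.full_text}
         for t in tweets]
    )

scrape_handle("WHO").to_csv("who_tweets.csv", index=False)
```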
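
Combining the per-source files into all_source_text.csv was done by hand for this project, but it amounts to concatenating the scraped CSVs. A pandas sketch (the input file names and the added 'source' column are illustrative):

```python
# Concatenate the per-source scraped CSVs into one file; file names are examples.
import pandas as pd

sources = ["who_scraped.csv", "cdc_scraped.csv", "who_tweets.csv"]
frames = []
for path in sources:
    df = pd.read_csv(path)
    df["source"] = path.split("_")[0]  # tag each row with its news source
    frames.append(df)

pd.concat(frames, ignore_index=True).to_csv("all_source_text.csv", index=False)
```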
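
The sentiment scoring itself is done by the R scripts (source_bing.R with the bing lexicon, source_nrc.R with the nrc lexicon). For readers who prefer Python, the sketch below reproduces the bing-style scoring and the per-source, per-date plot using NLTK's copy of the Bing Liu opinion lexicon; it assumes all_source_text.csv has 'date', 'source', and 'text' columns, which is an assumption about the combined file's layout. source_nrc.R works analogously, with an emotion lexicon instead of a positive/negative one.

```python
# Python analogue of the bing-lexicon scoring and plotting done by the R scripts;
# this is not the repository's code, just an illustration of the same idea.
import re
import nltk
import pandas as pd
import matplotlib.pyplot as plt
from nltk.corpus import opinion_lexicon

nltk.download("opinion_lexicon")
positive = set(opinion_lexicon.positive())
negative = set(opinion_lexicon.negative())

def bing_score(text):
    """Net sentiment: count of positive words minus count of negative words."""
    tokens = re.findall(r"[a-z']+", str(text).lower())
    return sum(w in positive for w in tokens) - sum(w in negative for w in tokens)

df = pd.read_csv("all_source_text.csv")  # assumed columns: date, source, text
df["sentiment"] = df["text"].apply(bing_score)

# Mean sentiment per source per date, mirroring the plots in Ebola pics.zip.
ax = df.groupby(["date", "source"])["sentiment"].mean().unstack("source").plot()
ax.set_xlabel("date")
ax.set_ylabel("mean bing sentiment")
plt.show()
```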