Sudan Webscraper Project

Description

This project scrapes news websites for articles related to the Sudanese Civil War. It collects data for academic, research, and informational purposes. The scraper targets specific websites and extracts the relevant information so that the data is easier to access and process.

Notes

Contact me for the .env file.
When working locally, make sure before the first run that you are not pushing to the database. Functionality can be tested by writing to Excel instead:

import pandas as pd

articles = get_articles()  # get_articles() is provided by the crawler being tested

# Operate on dataframe
df = pd.DataFrame(articles)
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(by='date')  # sort_values returns a new frame, so assign the result
df['date'] = df['date'].dt.tz_localize(None)  # Remove timezone info for excel compatibility

# Write to a local Excel file instead of the database
df.to_excel('News_Articles/guardian_articles.xlsx')
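
One way to make the "do not push to db" rule harder to break is to gate the database write behind an explicit opt-in. The sketch below is only an illustration: the `PUSH_TO_DB` variable and both function names are hypothetical and are not part of this repository or its .env file.

```python
import os


def should_push_to_db(env=None):
    """Return True only when PUSH_TO_DB (hypothetical name) is explicitly truthy."""
    env = os.environ if env is None else env
    value = env.get("PUSH_TO_DB", "")
    return value.strip().lower() in {"1", "true", "yes"}


def choose_output(env=None):
    """Pick the output target; defaults to Excel so local runs stay off the database."""
    return "database" if should_push_to_db(env) else "excel"
```

With a guard like this, a local run would call `df.to_excel(...)` unless the variable is set on purpose, so a missing or empty setting never results in a database write.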

Installation

Requirements

Python 3.11
Packages listed in requirements.txt (or environment.yml for conda)

Setup

Clone the repository to your local machine and create the conda environment:

git clone https://github.com/stccenter/sudan_web_scraper
cd sudan_web_scraper
conda env create -f environment.yml

After activating the environment, you can run the Python scripts as you would any other.
