This project is a web scraper that extracts tables from Wikipedia list pages and converts them into CSV files for further analysis or use.
First, clone the repository to your local machine:
git clone git@github.com:UzitheI/python_webscraper.git
cd python_webscraper
Create a virtual environment to isolate the project dependencies:
python3 -m venv .venv
Activate the virtual environment:
- On Windows:
.venv\Scripts\activate
- On macOS and Linux:
source .venv/bin/activate
Install the required Python packages using pip:
pip install -r requirements.txt
You have two options to run the scraper:
If you prefer to work with Jupyter Notebook, open the scraping.ipynb file:
jupyter notebook scraping.ipynb
This will open the notebook in your browser, where you can execute the cells to run the scraper.
If you prefer to run the scraper directly as a Python script, use the following command:
python main.py
After running the script, you'll be prompted to enter the URL of the Wikipedia page you want to scrape. The scraper will then generate the CSV file for you.
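For a rough idea of what happens under the hood, here is a minimal sketch of the kind of logic the script runs; the actual main.py may differ, and the use of requests and BeautifulSoup here is an assumption about the project's dependencies:

```python
# Hypothetical sketch of the scraping flow, not the project's actual main.py.
# Assumes the requests and beautifulsoup4 packages are installed.
import csv
import requests
from bs4 import BeautifulSoup

url = input("Enter the Wikipedia page URL: ")
response = requests.get(url, headers={"User-Agent": "python_webscraper"})
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")
table = soup.find("table", class_="wikitable")  # first list table on the page

rows = []
for tr in table.find_all("tr"):
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    if cells:
        rows.append(cells)

with open("filename.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

print(f"Wrote {len(rows)} rows to filename.csv")
```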
The output of the scraper will be saved as a CSV file named filename.csv in the project directory.
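To sanity-check the result, you can load the file back in; this snippet assumes pandas is available in your environment (it is not necessarily listed in requirements.txt):

```python
# Quick check that the generated CSV contains the expected table.
import pandas as pd

df = pd.read_csv("filename.csv")
print(df.shape)   # (rows, columns) extracted from the Wikipedia table
print(df.head())  # preview the first few rows
```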
This project provides a simple yet powerful tool to scrape data from Wikipedia lists. You can customize it further to handle different types of tables or other sources. Feel free to contribute or modify the code to suit your needs!
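As one example of such a customization (not part of the current code), you could pull every table on a page at once and pick one by index; this sketch assumes pandas and lxml are installed, and the URL is only an illustration:

```python
# Hypothetical customization: grab all tables on a page with pandas.read_html
# and save a specific one, instead of scraping a single wikitable.
import pandas as pd

url = "https://en.wikipedia.org/wiki/List_of_countries_by_population_(United_Nations)"
tables = pd.read_html(url)          # parses every <table> element on the page
tables[0].to_csv("table_0.csv", index=False)
print(f"Found {len(tables)} tables; saved the first one to table_0.csv")
```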