# Web Scraper for Wikipedia List

This project is a web scraper that extracts data from a Wikipedia list and converts it into a CSV file. It is designed to handle tables on Wikipedia pages and save the results in CSV format for further analysis or use.
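At its core, scraping a Wikipedia list table comes down to three steps: fetch the page, locate the table, and write its rows to CSV. A minimal sketch of that idea using `requests` and `BeautifulSoup` (the URL is illustrative, and the repository's actual implementation may differ):

```python
# Minimal sketch of the general approach; the project's own code may differ.
import csv
import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/List_of_sovereign_states"  # example page
soup = BeautifulSoup(requests.get(url).text, "html.parser")

# Wikipedia list tables typically carry the "wikitable" CSS class.
table = soup.find("table", class_="wikitable")

with open("output.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    for row in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        writer.writerow(cells)
```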

## How to Use the Project

### 1. Clone the Repository

First, clone the repository to your local machine:

```bash
git clone git@github.com:UzitheI/python_webscraper.git
cd python_webscraper
```

### 2. Create a Virtual Environment

Create a virtual environment to isolate the project dependencies:

```bash
python3 -m venv .venv
```

Activate the virtual environment:

- On Windows:

  ```bash
  .venv\Scripts\activate
  ```

- On macOS and Linux:

  ```bash
  source .venv/bin/activate
  ```

### 3. Install the Requirements

Install the required Python packages using pip:

```bash
pip install -r requirements.txt
```

### 4. Run the Scraper

You have two options to run the scraper:

#### Option 1: Run with Jupyter Notebook

If you prefer to work with Jupyter Notebook, open the `scraping.ipynb` file:

```bash
jupyter notebook scraping.ipynb
```

This will open the notebook in your browser, where you can execute the cells to run the scraper.

#### Option 2: Run as a Python Script

If you prefer to run the scraper directly as a Python script, use the following command:

```bash
python main.py
```

After running the script, you'll be prompted to enter the URL of the Wikipedia page. The scraper will then generate the CSV file for you.
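As an illustration, the prompt-then-save flow could look roughly like this (a sketch, not necessarily what `main.py` actually does; it assumes `pandas` is among the requirements):

```python
# Illustrative sketch of a prompt-driven scraper; main.py may differ.
import pandas as pd

url = input("Insert the URL of the Wikipedia page: ")
tables = pd.read_html(url)                     # parse every <table> on the page
tables[0].to_csv("filename.csv", index=False)  # save the first table as CSV
print(f"Saved {len(tables[0])} rows to filename.csv")
```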

### 5. Output

The output of the scraper is saved as a CSV file named `filename.csv` in the project directory.
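From there, the file can be loaded for further analysis, for example with `pandas`:

```python
import pandas as pd

# Load the CSV produced by the scraper and preview the first rows.
df = pd.read_csv("filename.csv")
print(df.head())
```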

## Conclusion

This project provides a simple yet powerful tool to scrape data from Wikipedia lists. You can customize it further to handle different types of tables or other sources. Feel free to contribute or modify the code to suit your needs!
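For instance, one possible customization is to select a specific table rather than the first one on the page. A sketch using the `match` filter of `pandas.read_html` (the URL and match string are purely illustrative):

```python
import pandas as pd

# Keep only tables whose text matches the given string (illustrative values).
url = "https://en.wikipedia.org/wiki/List_of_sovereign_states"
tables = pd.read_html(url, match="Membership")
tables[0].to_csv("filtered.csv", index=False)
```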