This Python application uses Selenium, running in a Dockerized environment, to scrape school websites for contact information. It iterates over a predefined list of places, accesses a school listing website for each, collects URLs for individual schools, and scrapes each school's website for contact details like email addresses.
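At a high level, the scraping loop resembles the sketch below. The listing URL, CSS selector, and the `PLACES` list here are illustrative assumptions, not the actual contents of `scrape.py`:

```python
# Illustrative sketch of the scraping flow; URLs and selectors are assumptions.
import re

from selenium.webdriver.common.by import By

PLACES = ["Berlin", "Hamburg", "Munich"]  # hypothetical list of places

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def scrape(driver):
    rows = []
    for place in PLACES:
        # Load the school-listing page for this place (URL pattern is assumed).
        driver.get(f"https://example-school-directory.test/search?q={place}")
        school_urls = [a.get_attribute("href")
                       for a in driver.find_elements(By.CSS_SELECTOR, "a.school-link")]
        for url in school_urls:
            driver.get(url)
            # Collect any email addresses found in the page source.
            for email in set(EMAIL_RE.findall(driver.page_source)):
                rows.append((place, url, email))
    return rows
```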
Before you begin, ensure you have met the following requirements:
- Docker and Docker Compose are installed on your machine.
- Basic knowledge of Docker and containerization.
To set up the School Contact Scraper, follow these steps:
- Clone the repository to your local machine.
- Navigate to the cloned directory.
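For example (the repository URL and directory name are placeholders):

```bash
git clone <repository-url>
cd <cloned-directory>
```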
To run the application, execute the following command in the terminal:
```bash
docker-compose up --build
```
This command builds the Docker images and starts the containers defined in `docker-compose.yml`. Specifically, it brings up a Selenium Hub, a Chrome node, and the application in separate containers.
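Inside the application container, the script reaches the Chrome node through the Hub's remote endpoint. A minimal sketch, assuming the Hub's Compose service is named `selenium-hub` and listens on Selenium's default port `4444`:

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")  # no display inside the container

# "selenium-hub" is the assumed Compose service name; 4444 is the
# Selenium Hub's default port.
driver = webdriver.Remote(
    command_executor="http://selenium-hub:4444/wd/hub",
    options=options,
)
```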
The project structure includes the following files and directories:
- `docker-compose.yml`: Defines the multi-container Docker application.
- `Dockerfile`: Instructions for building the application's Docker image.
- `requirements.txt`: Lists the Python dependencies for the application.
- `scrape.py`: The main Python script for scraping websites.
- `school_emails.csv`: File where scraped email addresses are stored (see the sketch after this list).
- `wait-for-it.sh`: A script for controlling the order of service startup in Docker Compose.
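A minimal sketch of how results might be written to `school_emails.csv`, assuming rows of `(place, url, email)` tuples like those produced by the scraping loop above (the function name is hypothetical):

```python
import csv

def save_emails(rows, path="school_emails.csv"):
    # Append scraped rows so repeated runs accumulate results.
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        writer.writerows(rows)
```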
To stop the application and remove the containers and networks created by `docker-compose up` (add the `-v` flag to also remove volumes), run:

```bash
docker-compose down
```
- Ensure that the ports defined in `docker-compose.yml` are available on your machine.
- The application's behavior can be modified by changing the list of places in `scrape.py` (see the sketch below).
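For instance, if the script defines its targets in a module-level list (the name `PLACES` is an assumption), adding or removing entries changes which listing pages are scraped:

```python
# Hypothetical list near the top of scrape.py; edit it to change coverage.
PLACES = [
    "Berlin",
    "Hamburg",
    "Munich",
]
```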
For any contributions or suggestions, please open an issue or submit a pull request.