Skip to content

randyklein/emailscrape

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Email Scraper

This application crawls an entire website to find all email addresses listed on any page. The application will crawl all pages on the given domain, including subdomains. The crawler will not follow links to other domains.

The application will output the email address along with the URL the email address was found to a CSV file.

The app can be ran as a python application or a docker container. It is recommended to use a docker container so that the container can be paused or resumed if necessary.

Running

Python

  • git clone https://github.com/randyklein/emailscrape.git or git pull from project folder to update
  • cd emailscrape
  • pip install -r requirements.txt
  • python3 main.py

Docker

  • git clone https://github.com/randyklein/emailscrape.git or git pull from project folder to update
  • cd emailscrape
  • docker build -t emailscrape .
  • docker run --name emailscrape -d -v /path/to/folder:/app/export --log-opt max-size=10m --log-opt max-file=3 emailscrape

Docker Usage

  • Pausing: docker pause emailscrape
  • unpausing: docker unpause emailscrape
  • viewing logs: docker logs -f --tail 0 emailscrape

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages