Apache Nutch with MongoDB as backend

Based on the official Apache Nutch release (the current Nutch version is 2.3.1).

Supported tags and respective Dockerfile links

Used technologies

  • Nutch 2.3.1
  • OpenJDK 8
  • Gora 0.6.1
  • Gora MongoDB 0.6.1

Start Nutch in development mode

Use the docker-compose.yml file to start MongoDB and Apache Nutch, then follow the Nutch logs:

docker-compose up -d
docker-compose logs -f nutch

Start Nutch in production mode

  • Create your own Dockerfile:
FROM pure/nutch-mongo:alpine

ADD urls/ /urls/
ADD conf/ /nutch/conf/
  • Build the image:
docker build -t my-nutch .
  • Run your own Nutch with the desired number of iterations:
docker run \
    -d \
    -e ITERATIONS=5 \
    --name my-crawler \
    my-nutch
  • Check logs
docker logs -f my-crawler
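
The Dockerfile in the production steps copies `urls/` and `conf/` from the build context, so both directories have to exist next to it before `docker build` runs. A minimal sketch of preparing them, assuming the usual Nutch convention of a plain-text seed file with one URL per line. The file name `seed.txt` and the example URL are placeholders chosen for illustration, not something this image prescribes:

```shell
# Create the two directories that the Dockerfile ADDs into the image.
mkdir -p urls conf

# Seed list: one URL per line (file name and URL are illustrative assumptions).
printf '%s\n' 'https://nutch.apache.org/' > urls/seed.txt

# Sanity check before building: confirm the seed list exists and is not empty.
test -s urls/seed.txt && echo "seed list ready"
```

Any Nutch configuration overrides (for example a custom `nutch-site.xml`) would go into `conf/` before building; whether the image needs any is not stated here, so an empty `conf/` is only a starting point.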