This repository has been archived by the owner on Nov 27, 2017. It is now read-only.

open source, distributed, restful crawler engine in golang

wcong/ants-go

ants-go

open source, restful, distributed crawler engine

gitter

Join the chat at https://gitter.im/wcong/ants-go

coming up

  • Persistence
  • Dynamic Master

design of ants-go

ants

I wrote a crawler engine named ants in Python, based on Scrapy. But a dynamic language can sometimes get chaotic, so I started rewriting it in a compiled language.

scrapy

I designed the crawler framework by imitating Scrapy: the downloader, the scraper, and the way users write custom spiders, but in a compiled language.

elasticsearch

I designed the distributed architecture by imitating Elasticsearch. It inspired me to build an engine for distributed crawling.

requirement

go get github.com/PuerkitoBio/goquery
go get github.com/go-sql-driver/mysql

install

go get github.com/wcong/ants-go
go install github.com/wcong/ants-go

run

cd bin
./ants-go

check cluster status

curl 'http://localhost:8200/cluster'

get all spiders

curl 'http://localhost:8200/spiders'

start a spider

curl 'http://localhost:8200/crawl?spider=spiderName'

cluster in one computer

To test a cluster on one computer, you can run nodes on different ports in different terminals.

One node uses the default ports: tcp 8300 and http 8200.

cd bin
./ants-go

The other node sets its tcp port and http port explicitly.

cd bin
./ants-go -tcp 9300 -http 9200
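Instead of two terminals, both nodes could be started from one small Go launcher. This is a sketch under assumptions: it expects the `ants-go` binary in the current directory, `nodeArgs` is a hypothetical helper, and the first node is given the default ports explicitly, which should be equivalent to running it without flags.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
)

// nodeArgs returns the command-line arguments for one node,
// using the -tcp and -http flags shown above.
// (Hypothetical helper, not part of ants-go.)
func nodeArgs(tcpPort, httpPort int) []string {
	return []string{"-tcp", strconv.Itoa(tcpPort), "-http", strconv.Itoa(httpPort)}
}

func main() {
	// {tcp port, http port} for each node; the first pair is the default.
	ports := [][2]int{{8300, 8200}, {9300, 9200}}

	var cmds []*exec.Cmd
	for _, p := range ports {
		cmd := exec.Command("./ants-go", nodeArgs(p[0], p[1])...)
		cmd.Stdout = os.Stdout
		cmd.Stderr = os.Stderr
		if err := cmd.Start(); err != nil {
			fmt.Println("could not start node:", err)
			continue
		}
		cmds = append(cmds, cmd)
	}

	// Block until every started node exits.
	for _, cmd := range cmds {
		_ = cmd.Wait()
	}
}
```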

flags

There are some flags you can set; check out the help message.

./ants-go -h
./ants-go -help
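For readers new to Go command-line tools, flags like `-tcp` and `-http` are commonly declared with the standard `flag` package, which also answers `-h` and `-help` with a usage message. The sketch below illustrates that pattern only; it is not the ants-go source, and `nodeConfig`, `parseFlags`, and the flag descriptions are assumptions. Only the flag names and default ports come from this README.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// nodeConfig holds the two ports mentioned in this README.
// (Hypothetical type, for illustration.)
type nodeConfig struct {
	TCPPort  int
	HTTPPort int
}

// parseFlags parses -tcp and -http from args, falling back to the
// documented defaults 8300 and 8200.
func parseFlags(args []string) (nodeConfig, error) {
	var cfg nodeConfig
	fs := flag.NewFlagSet("ants-go", flag.ContinueOnError)
	// The descriptions below are assumptions about what each port is for.
	fs.IntVar(&cfg.TCPPort, "tcp", 8300, "tcp port")
	fs.IntVar(&cfg.HTTPPort, "http", 8200, "http port")
	err := fs.Parse(args)
	return cfg, err
}

func main() {
	cfg, err := parseFlags(os.Args[1:])
	if err == flag.ErrHelp {
		return // -h or -help: the flag package already printed the usage
	}
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("tcp=%d http=%d\n", cfg.TCPPort, cfg.HTTPPort)
}
```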

Customize spider

  1. go to spiders
  2. write your spider, following the example deap_loop_spider.go, or go to the spider page
  3. add your spider to spiderMap, following the example in LoadAllSpiders in load_all_spider.go
  4. install again
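The registration step above follows a common pattern: build each spider, then store it in a map keyed by name so it can be looked up when a crawl is requested. The sketch below shows that pattern only. The `Spider` type, its fields, `makeExampleSpider`, and `loadAllSpiders` are simplified, hypothetical stand-ins and do not reflect the real ants-go types; check deap_loop_spider.go and load_all_spider.go for the actual definitions.

```go
package main

import "fmt"

// Spider is a hypothetical, simplified stand-in for a spider definition.
// The real ants-go type will differ.
type Spider struct {
	Name      string
	StartURLs []string
}

// makeExampleSpider builds one spider (hypothetical constructor).
func makeExampleSpider() *Spider {
	return &Spider{
		Name:      "example_spider",
		StartURLs: []string{"http://example.com/"},
	}
}

// loadAllSpiders registers every spider in a map keyed by its name,
// mirroring the idea of adding a spider to spiderMap.
func loadAllSpiders() map[string]*Spider {
	spiderMap := make(map[string]*Spider)
	for _, s := range []*Spider{
		makeExampleSpider(),
		// add further spiders here
	} {
		spiderMap[s.Name] = s
	}
	return spiderMap
}

func main() {
	spiders := loadAllSpiders()
	fmt.Println("registered spiders:", len(spiders))
}
```

Under this pattern, the name used as the map key would be the value passed as `spider=` when starting a crawl.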
