Another spider, powered by gevent, requests, and pyquery.
- Concurrency is built on gevent
- The crawl strategy is highly configurable:
  - maximum crawl depth
  - the number of URLs to fetch
  - the maximum number of concurrent HTTP requests, to avoid DoS-ing the target site
  - custom HTTP request headers and cookies
  - restrict crawling to URLs on the same host
  - restrict crawling to URLs on the same domain
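
To make the depth, URL-count, and same-host/same-domain options more concrete, here is a minimal sketch of how such limits could be applied in a breadth-first crawl. It is not this project's implementation: `same_host`, `same_domain`, `crawl`, and the `get_links` callback are hypothetical names introduced only for illustration, and the registered domain (for example `sina.com.cn`) is assumed to be supplied by the caller rather than derived from the URL. Fetching and concurrency are left out; `get_links` stands in for whatever downloads a page and extracts its links.

```python
from collections import deque

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7


def same_host(url, root_url):
    """Return True if url has exactly the same hostname as root_url."""
    return urlparse(url).hostname == urlparse(root_url).hostname


def same_domain(url, domain):
    """Return True if url's hostname is `domain` or one of its subdomains.

    `domain` is passed in explicitly (an assumption of this sketch), which
    avoids guessing where the registered domain starts in names such as
    www.sina.com.cn.
    """
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)


def crawl(root_url, get_links, max_depth=2, max_urls=100, in_scope=None):
    """Breadth-first crawl that returns the visited URLs in visit order.

    get_links(url) is a hypothetical callback returning the absolute URLs
    found on that page. Links deeper than max_depth are never queued, and
    the crawl stops once max_urls pages have been visited.
    """
    visited = []
    seen = {root_url}
    queue = deque([(root_url, 0)])
    while queue and len(visited) < max_urls:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # do not follow links from pages at the depth limit
        for link in get_links(url):
            if link in seen:
                continue
            if in_scope is not None and not in_scope(link):
                continue  # outside the configured host/domain scope
            seen.add(link)
            queue.append((link, depth + 1))
    return visited
```

Passing `in_scope=lambda u: same_host(u, root_url)` would keep the crawl on a single host, while `in_scope=lambda u: same_domain(u, "sina.com.cn")` would also allow subdomains such as `news.sina.com.cn`.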

Dependencies:

- Python 2.7
- gevent
- requests
- pyquery
spider = Spider()
spider.setRootUrl("http://www.sina.com.cn")
spider.run()