This is a simple Python web spider that collects every movie a specified douban.com user has marked as watched and stores the results in a CSV file. The input is the URL of the first page of that user's Douban movie list.
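The CSV storage step could look roughly like the sketch below. It is written for Python 3 and is not the spider's actual code: the column names in `FIELDS` and the function `write_movies` are hypothetical, since the real script's output columns are not documented here.

```python
import csv
import io

# Hypothetical column names; the real script's columns are not documented.
FIELDS = ["title", "date_watched", "rating"]


def write_movies(movies, fileobj):
    """Write a header row plus one row per movie dict to an open text file."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    for movie in movies:
        writer.writerow(movie)


if __name__ == "__main__":
    # Made-up sample record, used only to show the output format.
    sample = [{"title": "Example Movie", "date_watched": "2015-03-01", "rating": "5"}]
    buf = io.StringIO()
    write_movies(sample, buf)
    print(buf.getvalue())
```

When writing to a real file in Python 3, open it with `newline=""` and an explicit encoding such as `utf-8` so that Chinese titles and line endings are written correctly.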
Test input: $ make test
The test takes about 40 minutes to finish because the spider runs slowly on purpose to avoid an IP ban.
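One common way to keep a crawl slow enough to avoid a ban is to pause between page requests. The sketch below (Python 3) illustrates that idea only. The function `crawl_pages`, its parameters, and the 5-second default delay are assumptions, not values taken from the spider, and `fetch` stands in for whatever function downloads a page.

```python
import time


def crawl_pages(urls, fetch, delay_seconds=5.0, sleep=time.sleep):
    """Fetch each URL in order, pausing before every request after the first."""
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # throttle: wait between consecutive requests
        pages.append(fetch(url))
    return pages


if __name__ == "__main__":
    # Stand-in fetch function so the example runs without network access.
    def fake_fetch(url):
        return "<html>" + url + "</html>"

    urls = ["https://example.com/page1", "https://example.com/page2"]
    print(crawl_pages(urls, fake_fetch, delay_seconds=0.1))
```

Passing `fetch` and `sleep` in as arguments keeps the throttling logic separate from the download code, which makes it easy to try out without touching the network.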
Written entirely in Python (Python 2, since urllib2 is a Python 2 module). Libraries used: urllib2, bs4, time, re, csv, sys.
The code was written in March 2015 in Dublin, Ireland.