Behance-spider

Crawl images from Behance.net, using the field of textile design as an example
Retrieve project URLs and save them as an xls file

Prerequisites

Install BeautifulSoup4 and selenium
pip install BeautifulSoup4
pip install selenium
Install the support package for Excel output (regular expressions and socket connections use the re and socket modules, which ship with Python and do not need to be installed)
pip install xlwt
Install a browser webdriver
Download and install the driver that matches your browser from the browser's support page

Run retrieveProject.py
This script collects project URLs from Behance.net and saves them in the file projectURL.xls
A pre-generated projectURL.xls is provided.
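The URL collection step can be sketched roughly as follows. This is only an illustration, not the code in the script itself: the function names are hypothetical, the listing page URL is left to the caller, and the filter assumes that project links contain "/gallery/" in their path.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def fetch_rendered_html(page_url):
    """Load a page in a real browser so that script-generated links exist."""
    # Imported here so the parsing helper below works without a browser.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumption: the Chrome driver is installed
    try:
        driver.get(page_url)
        return driver.page_source
    finally:
        driver.quit()


def extract_project_urls(html, base="https://www.behance.net"):
    """Return unique absolute project URLs found in the HTML, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for link in soup.find_all("a", href=True):
        href = urljoin(base, link["href"])
        # Assumption: project pages are the links containing "/gallery/".
        if "/gallery/" in href and href not in urls:
            urls.append(href)
    return urls


def save_urls(urls, path):
    """Write one URL per row into the first column of an xls file."""
    import xlwt

    book = xlwt.Workbook()
    sheet = book.add_sheet("projects")
    for row, url in enumerate(urls):
        sheet.write(row, 0, url)
    book.save(path)
```

A caller would pass the result of fetch_rendered_html to extract_project_urls and then hand the list to save_urls together with the output file name.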
Run retrieveImages.py
This script downloads the images of each project listed in projectURL.xls and saves them in the folder 'pic1' under the repository root
Download progress and information are printed.
If an image fails to download from its URL, 0 is written to the corresponding row in projectURL.xls. Otherwise, 1 is written.
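The 0/1 status convention above can be illustrated with a small helper. This is a minimal sketch using only the standard library; the function name, the timeout value, and the destination path are assumptions, and writing the returned flag back into the spreadsheet is left out.

```python
import urllib.request
from pathlib import Path


def download_image(url, dest, timeout=10.0):
    """Save the resource at url to dest; return 1 on success, 0 on failure."""
    try:
        # The timeout keeps a stalled connection from blocking the whole run.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            data = response.read()
        dest = Path(dest)
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(data)
        return 1
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError, and timeouts;
        # ValueError covers malformed URLs.
        return 0
```

For example, status = download_image(image_url, "pic1/0.jpg") yields the flag that would be recorded for that row.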
Run transformImages.py
This script converts the downloaded images, whatever their original format, to JPEG files in the RGB color space.
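A conversion of this kind might look like the sketch below. It assumes the Pillow library (pip install Pillow), which is not listed in the requirements above, so the script itself may work differently; the function name is hypothetical.

```python
from pathlib import Path

from PIL import Image


def to_rgb_jpeg(src, dest_dir):
    """Convert one image file to an RGB JPEG in dest_dir; return the new path."""
    src = Path(src)
    dest = Path(dest_dir) / (src.stem + ".jpg")
    with Image.open(src) as img:
        # convert("RGB") drops alpha channels and maps palette, grayscale,
        # or CMYK images into RGB, which the JPEG writer accepts.
        img.convert("RGB").save(dest, "JPEG")
    return dest
```

Applying to_rgb_jpeg to every file in the download folder gives a uniform set of JPEG images.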
