- Python 3
- Beautiful Soup 4 (install with pip)
- an archive directory containing archieve_general.py and the template
- define your archive classes (e.g. "photo", "dirty words") by editing atype_list in archieve_general.py
- for each class, create new directories "class_name/" and "class_name/imgs"
- paste the target link into the function crawling at the end of archieve_general.py
- select the correct index of the desired class
- set the img flag to 1 if you want to collect pictures
- set the link flag to 1 if you want to collect the links to other websites in the target article
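
The directory step above can also be scripted instead of done by hand. The following is only a minimal sketch using the standard library: the class names shown are placeholders (the real ones are whatever you put in atype_list), and make_class_dirs is a hypothetical helper that is not part of archieve_general.py.

```python
from pathlib import Path

# Placeholder class names; the real values are the entries of atype_list
# in archieve_general.py.
atype_list = ["photo", "dirty_words"]


def make_class_dirs(root, class_names):
    """Create root/<class_name>/ and root/<class_name>/imgs for each class.

    Hypothetical helper for illustration; returns the list of imgs paths.
    """
    created = []
    for name in class_names:
        img_dir = Path(root) / name / "imgs"
        # parents=True also creates <class_name>/ itself;
        # exist_ok=True makes the call safe to repeat.
        img_dir.mkdir(parents=True, exist_ok=True)
        created.append(img_dir)
    return created
```

Calling make_class_dirs(".", atype_list) from inside the archive directory would create the folders for every class in one go, and running it again after adding a class only creates the missing ones.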