Crawl Weibo data via Scrapy and map out a network of users via AntV
- Crawl Weibo user profiles
- Crawl users' recent posts
- Crawl users' social connections (followees/fans)
- Set user_id = int('user_id') in spiders/weibo.py (replacing 'user_id' with the target user's numeric ID) to choose which user sits at the center of the network; see the configuration sketch after this list.
- Set relate_deep = 2 and deepth_fans = 2 to control the crawl depth (a depth of 2 pulls in the followees/fans of your followees/fans, so the number of visited users grows exponentially).
- Rewrite proxy_handle and get_cookies so that the middlewares receive valid cookies and IP proxies; a middleware sketch follows this list.
- Run run.py (a minimal sketch of it also follows this list).
- Open Draw/index.html. The important parameters are linkDistance: 50 (edge length), endArrow: true (whether edges carry arrows), and lineWidth: 0.65 (edge thickness); adjust them to your needs.
- Because my cookie pool is small, time.sleep calls were added at lines 73, 107, 132, 166, and 258 of spiders/weibo.py; the crawl is slow for that reason alone.
- The crawl filters users: verified "Big V" accounts and anyone with more than 10,000 followers are skipped.
- Because of the limited training corpus, the NLP accuracy is 89%; the training corpus is therefore bundled with this project (its original source is unknown; it was downloaded from CSDN).
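
For orientation, here is a minimal sketch of how these knobs might sit together in spiders/weibo.py. Only user_id, relate_deep, and deepth_fans are named in this README; the class layout, the verified/followers_count fields, and the keep_user helper are illustrative assumptions.

```python
import time

import scrapy


class WeiboSpider(scrapy.Spider):
    name = 'weibo'

    # Numeric ID of the user at the center of the network.
    user_id = int('1234567890')  # hypothetical ID; substitute your target user

    # Crawl depth: 2 pulls in the followees/fans of your followees/fans,
    # so the number of visited users grows exponentially.
    relate_deep = 2
    deepth_fans = 2

    def keep_user(self, user):
        # Hypothetical filter matching the note above: drop verified
        # "Big V" accounts and anyone with more than 10,000 followers.
        return not user.get('verified') and user.get('followers_count', 0) <= 10000

    def parse(self, response):
        # The project throttles with explicit time.sleep calls because
        # the cookie pool is small; the 1-second delay here is arbitrary.
        time.sleep(1)
        # ... parse the profile and schedule follow-up requests here ...
```

Note that Scrapy's built-in DOWNLOAD_DELAY and AutoThrottle settings are the idiomatic way to slow a crawl; the explicit time.sleep calls above only mirror what the project actually does.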
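The README asks you to rewrite proxy_handle and get_cookies; those two names come from the project, but everything else below is a hedged sketch of how a Scrapy downloader middleware could use them.

```python
import random


def proxy_handle():
    # Placeholder: pick one proxy URL from your own proxy pool.
    proxies = ['http://127.0.0.1:8888']
    return random.choice(proxies)


def get_cookies():
    # Placeholder: pick one logged-in Weibo cookie string from your pool.
    cookies = ['SUB=...; SUBP=...']
    return random.choice(cookies)


class ProxyCookieMiddleware:
    # Downloader middleware: attach a fresh proxy and cookie to each request.
    def process_request(self, request, spider):
        request.meta['proxy'] = proxy_handle()
        request.headers['Cookie'] = get_cookies()
        return None  # returning None lets Scrapy keep processing the request
```

Such a middleware would then be enabled under DOWNLOADER_MIDDLEWARES in settings.py.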
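run.py in projects like this is usually just a thin wrapper around Scrapy's command line; a minimal sketch, assuming the spider is registered under the name weibo:

```python
from scrapy.cmdline import execute

# Equivalent to running `scrapy crawl weibo` from the project root.
execute(['scrapy', 'crawl', 'weibo'])
```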
The main features are complete. We are now improving the readability of the rendered graphs and other widgets, and correlating friendliness (sentiment) with social connections.
Thanks for your help and guidance.
Note: because Weibo has strengthened its anti-crawling measures, the simulated-login feature in this project no longer works, and followee/fan lists can only be crawled up to the first 20 pages. This part of the project will not be updated further, since the focus is on visualizing the relationships between users.