Data Mining of tweets with MongoDB and the Twitter API
We want to extract tweets matching one or several specific hashtags from Twitter and then analyse them.
To achieve this goal, we will use this Jupyter notebook as a template and go through these steps:
- Install MongoDB
- Get your personal credentials from the Twitter API
- Select some hashtags related to a current event
- Use the Stream API to collect tweets to create our dataset
- Analyse the collected tweets
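
The collection step in the list above can be sketched as follows. This is a minimal sketch under stated assumptions, not the notebook's actual code: the database and collection names (twitter_db, tweets), the local MongoDB URI, and all helper names are hypothetical. It assumes each tweet arrives as a dict carrying Twitter's id_str field and reuses that value as MongoDB's _id, so a tweet received twice is stored only once.

```python
from typing import Any, Dict, Iterable


def get_collection(uri: str = "mongodb://localhost:27017",
                   db_name: str = "twitter_db",
                   coll_name: str = "tweets"):
    """Return a pymongo collection. URI and names are illustrative assumptions."""
    # Imported lazily so the helpers below work without pymongo installed.
    from pymongo import MongoClient
    return MongoClient(uri)[db_name][coll_name]


def tweet_to_document(tweet: Dict[str, Any]) -> Dict[str, Any]:
    """Copy a tweet dict and use its id_str as the MongoDB _id."""
    doc = dict(tweet)
    doc["_id"] = tweet["id_str"]
    return doc


def store_tweets(tweets: Iterable[Dict[str, Any]], collection) -> int:
    """Upsert each tweet into the collection; return how many were processed."""
    processed = 0
    for tweet in tweets:
        doc = tweet_to_document(tweet)
        # upsert=True inserts the tweet if it is new, or overwrites the old copy.
        collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)
        processed += 1
    return processed
```

With MongoDB running locally, store_tweets(batch, get_collection()) would persist a batch of tweet dicts. Upserting on _id keeps the dataset free of duplicates when the stream is restarted.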
- Jupyter Notebook
- pymongo (pip install pymongo)
- tweepy (pip install tweepy)
- Using the Twitter API, create a Twitter application and get your access tokens / credentials (tutorial: Authenticate a Python Application with Twitter using Tweepy)
- The file .keys.json should look like this, but with your tokens:
  [{ "consumer_key": "xxxxxxxxx", "consumer_secret": "xxxxxxxxx", "access_token": "xxxxxxxxx", "access_token_secret": "xxxxxxxxx" }]
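
Reading that file and authenticating could look like the sketch below. It is an illustration under assumptions rather than the notebook's own code: the helper names load_keys and make_api are hypothetical, and make_api assumes a tweepy version that exposes OAuthHandler (newer releases also offer it under the name OAuth1UserHandler).

```python
import json
from pathlib import Path

REQUIRED_KEYS = (
    "consumer_key",
    "consumer_secret",
    "access_token",
    "access_token_secret",
)


def load_keys(path=".keys.json"):
    """Load the credentials dict from the JSON key file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # The file format shown above wraps the credentials in a one-element list.
    creds = data[0] if isinstance(data, list) else data
    missing = [key for key in REQUIRED_KEYS if key not in creds]
    if missing:
        raise KeyError(f"missing credentials: {missing}")
    return creds


def make_api(creds):
    """Build an authenticated tweepy API object (assumes tweepy.OAuthHandler exists)."""
    # Imported lazily so load_keys can be used without tweepy installed.
    import tweepy

    auth = tweepy.OAuthHandler(creds["consumer_key"], creds["consumer_secret"])
    auth.set_access_token(creds["access_token"], creds["access_token_secret"])
    return tweepy.API(auth)
```

Keeping the tokens in a separate file such as .keys.json, excluded from version control, avoids publishing them together with the notebook.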
- Mining the Social Web, 2nd Edition by Matthew A. Russell (really cool and detailed book for mining Twitter, Facebook, LinkedIn, Google+, web pages, etc!)
- Twitter documentation: the API limits tell you how many requests you can make while staying within the boundaries of the API
- Composition of a tweet object
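
For the analysis step, the part of a tweet object that matters most here is its hashtag list. The sketch below assumes tweets in the standard (v1.1) JSON format, where hashtags sit under the entities field as a list of objects with a text field; the function names are hypothetical. Hashtags are lowercased so that #Python and #python are counted together.

```python
from collections import Counter


def extract_hashtags(tweet):
    """Return the lowercased hashtags of one tweet dict (empty list if none)."""
    entities = tweet.get("entities") or {}
    return [tag["text"].lower() for tag in entities.get("hashtags", [])]


def count_hashtags(tweets):
    """Count hashtag occurrences across an iterable of tweet dicts."""
    counts = Counter()
    for tweet in tweets:
        counts.update(extract_hashtags(tweet))
    return counts


# Hand-written sample tweets, reduced to the fields used here.
sample = [
    {"entities": {"hashtags": [{"text": "Python"}, {"text": "MongoDB"}]}},
    {"entities": {"hashtags": [{"text": "python"}]}},
    {"text": "a tweet without entities"},
]
print(count_hashtags(sample).most_common(1))  # prints [('python', 2)]
```

The same functions can be fed directly from a MongoDB cursor, since documents read back from the collection are plain dicts.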