The project comprises a real-time tweet data pipeline, a tweet sentiment analyzer module, and a Slack bot that posts the tweets' sentiments. The project uses SentimentIntensityAnalyzer from the vaderSentiment library. The analyzer gives positive, negative, neutral, and compound scores for short texts (such as tweets in this case).
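As an illustration of how the compound score might be turned into a label, here is a minimal sketch. It assumes a score dictionary with the keys `neg`, `neu`, `pos`, and `compound`, as returned by `SentimentIntensityAnalyzer.polarity_scores`, and uses the ±0.05 thresholds commonly suggested in the VADER documentation. The function name `label_sentiment` and the sample scores are hypothetical and not taken from this project.

```python
def label_sentiment(scores, threshold=0.05):
    """Map a VADER-style score dictionary to a coarse label.

    `scores` is assumed to look like the output of
    SentimentIntensityAnalyzer().polarity_scores(text).
    """
    compound = scores["compound"]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical scores, written by hand for illustration only.
sample = {"neg": 0.0, "neu": 0.4, "pos": 0.6, "compound": 0.7}
print(label_sentiment(sample))
```

With the library installed, the dictionary would come from `SentimentIntensityAnalyzer().polarity_scores(tweet_text)` instead of being written by hand.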
The real-time data pipeline flow is as follows:
- Tweets are collected and stored in a database.
- The sentiment of the tweets is analyzed.
- The tweet sentiment is posted on a Slack channel using a Slack bot.
A Docker container collects tweets in real time based on a particular query, using the streaming functionality of the Tweepy API, and stores them in MongoDB. As soon as new tweets are inserted into the database, they are sent to a second Docker container, which performs sentiment analysis on them and posts the results to a Slack channel using the Slack bot. Tweet collection and storage are therefore independent of sentiment analysis and Slack posting.
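One way the hand-off between the two containers could work is a MongoDB change stream, which requires a replica set. The sketch below is an assumption about the mechanism, not this project's actual code: `forward_inserts` and `handle` are hypothetical names, and `collection` is expected to be a PyMongo `Collection` (or any object exposing a compatible `watch` method).

```python
def forward_inserts(collection, handle, limit=None):
    """Call handle(document) for each document inserted into collection.

    Stops after `limit` documents when a limit is given; otherwise it
    keeps listening until the stream is closed. Returns the number of
    documents that were handled.
    """
    # Only react to inserts; updates and deletes are filtered out.
    pipeline = [{"$match": {"operationType": "insert"}}]
    seen = 0
    with collection.watch(pipeline) as stream:
        for change in stream:
            # For insert events, the new document is under "fullDocument".
            handle(change["fullDocument"])
            seen += 1
            if limit is not None and seen >= limit:
                break
    return seen
```

In the analysis container, `handle` would be the function that scores a tweet and posts it to Slack, so the collector never needs to know about either step.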
- Streams tweets based on search queries using the Tweepy API.
- Stores tweet dictionaries in a MongoDB database.
- Detects tweet insertions in the storage database using MongoDB replica sets, which enables real-time sentiment analysis and Slack posting of tweets.
- Two Docker containers keep tweet collection and storage independent of the sentiment analysis and Slack posting job.
- Tweets' sentiments are analyzed using VADER sentiment analysis.
- Tweet sentiments are posted in a Slack channel using a Slack bot.
- Multiple Docker containers are defined and run using Docker Compose.
I have tested the project on Ubuntu 20.04 machines (one with 2 GB and another with 8 GB of RAM) with the default settings by following the instructions below. On other Ubuntu versions or Debian-based systems with Docker installed, the same instructions should work with minor adjustments.
- Clone the repository:
    git clone https://github.com/abhisrn1986/TwitterDataPipeline.git
- Install Docker from the instructions here.
- To avoid prefacing the docker command with sudo, follow the instructions here.
- Install docker-compose by executing the following command in a shell:
    sudo apt install docker-compose
- Sign up for a Twitter developer account with elevated access (maximum limit of 2 million tweet pulls per month) if you don't have one. This is mandatory for running the project, as the Tweepy API Stream functionality is accessible only with an elevated access account. Here is a link on how to get started with the Twitter API.
- Create a Slack bot to post tweets in a Slack channel (see the section below for instructions).
- Create a .env file in the root directory (TwitterDataPipeline) containing the Twitter credentials and the Slack channel webhook in the following format:
    CUSTOMER_KEY=key1
    CUSTOMER_SECRET_KEY=key2
    ACCESS_TOKEN=token1
    ACCESS_TOKEN_SECRET=token2
    BEARER_TOKEN=token3
    SLACK_WEBHOOK=url
  Note: Replace the text after = in each line with the appropriate values (such as the customer keys, access tokens, and Slack webhook URL), and remember that there shouldn't be any space after =.
- Build all the Docker containers by running the following command in a terminal within the TwitterDataPipeline directory:
    docker-compose build
- To run the pipeline, execute the bash script run_pipeline.sh as
    run_pipeline.sh -q "query1;query2"
  The -q option is mandatory and provides the queries. Users can provide as many queries as needed, separated by semicolons. For instance, to stream all tweets related to China and Germany, run the command
    run_pipeline.sh -q "China;Germany"
  If the configuration is correct, this should post tweets with their sentiment scores in the Slack channel used.
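Before building the containers, it can help to sanity-check the configuration described in the steps above. The following is a minimal sketch under stated assumptions: the helper names (`parse_env`, `missing_keys`, `split_queries`) are hypothetical and not part of this project, and the way whitespace and empty queries are handled is a guess at reasonable behavior rather than what run_pipeline.sh actually does.

```python
# Keys listed in the .env format described in the installation steps.
REQUIRED_KEYS = (
    "CUSTOMER_KEY", "CUSTOMER_SECRET_KEY", "ACCESS_TOKEN",
    "ACCESS_TOKEN_SECRET", "BEARER_TOKEN", "SLACK_WEBHOOK",
)


def parse_env(text):
    """Parse KEY=value lines and reject whitespace next to '='."""
    values = {}
    for number, line in enumerate(text.splitlines(), start=1):
        # Skip blank lines and comment lines.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {number}: expected KEY=value")
        if key != key.strip() or value != value.lstrip():
            raise ValueError(f"line {number}: unexpected whitespace next to key or '='")
        values[key] = value.rstrip()
    return values


def missing_keys(values):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


def split_queries(arg):
    """Split a -q argument such as "China;Germany" into separate queries."""
    return [query.strip() for query in arg.split(";") if query.strip()]


example = "CUSTOMER_KEY=key1\nSLACK_WEBHOOK=url"
print(missing_keys(parse_env(example)))
print(split_queries("China;Germany"))
```

In practice the text passed to `parse_env` would be read from the .env file in the TwitterDataPipeline directory.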
- Log in and go to Your Apps.
- Choose Create New App.
- Choose the option From scratch.
- Fill in a name and choose your Slack workspace as the Development Slack Workspace.
- Press Create App.
- Under "Add features and functionality" click on "Incoming Webhooks".
- Activate incoming webhooks by clicking on the switch.
- Click on Add new webhook to the workspace at the bottom of the page.
- Select a channel where you want to post messages and click on Allow.
- Scroll down and copy the Webhook URL; use it as the SLACK_WEBHOOK value in the .env file.
Note: More info can be found here
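Once the webhook URL exists, posting a message amounts to an HTTP POST with a JSON body containing a `text` field. The following is a minimal sketch using only the Python standard library. The function names and the message layout are hypothetical, and the project itself may use a different client.

```python
import json
import urllib.request


def build_slack_request(webhook_url, tweet_text, compound):
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    message = f"{tweet_text}\nCompound sentiment score: {compound:+.2f}"
    body = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_to_slack(request, timeout=10):
    """Send the prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Separating request construction from sending keeps the message format easy to check without touching the network; `post_to_slack(build_slack_request(url, text, score))` performs the actual post, with `url` taken from SLACK_WEBHOOK.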