This repository contains a tutorial to build a Twitter Bot that classifies hateful and offensive tweets using NeuralSpace's Language Understanding (NLU) API.
As the number of posts on social media grows, so does the amount of toxic content on the web. Detecting such toxicity is of paramount importance for keeping the web a safe and healthy environment for all. Detecting multilingual hateful and offensive content in the posts and comments typically found on the web is the first step towards building a system that can flag potentially harmful items and take the steps necessary to handle such behavior.
Can NeuralSpace Apps be used as a stepping stone towards solving this alarming problem by flagging multilingual hate speech?
Let us get started!
1. Train your model using NeuralSpace Language Understanding.
Let us first train a hate speech model using the NeuralSpace Language Understanding (NLU) App. To do this, follow our Colab notebook to easily build and train your model.
Wohoo! Once you have your model trained, let us build the Twitter Bot.
We will need to install some packages, so let us create a conda environment. You can use Python 3.6 or later.
```bash
conda create --name neuralspace-nlu-bot python=3.6
conda activate neuralspace-nlu-bot
pip install tweepy==3.10.0
pip install pyyaml
```
As you can see, we will use `tweepy`, a package that provides a very convenient way to use the Twitter API (see the Tweepy documentation).
The Twitter API requires that all requests use OAuth to authenticate. So you need to create the required authentication credentials to be able to use the API. These credentials are four text strings:
- CONSUMER_KEY
- CONSUMER_SECRET
- ACCESS_TOKEN
- ACCESS_TOKEN_SECRET
If you already have a Twitter user account, then follow these steps to create the key, token, and secrets. Otherwise, you have to sign up as a Twitter user before proceeding.
Go to the Twitter developer site to apply for a developer account. Here, you have to select the Twitter user responsible for this account. It should probably be you or your organization.
In this case, I chose to use my own account, @bhatia_mehar.
Twitter then requests some information about how you plan to use the developer account. You have to specify the developer account name and whether you plan to use it for personal purposes or for your organization.
Twitter grants authentication credentials to apps, not accounts. An app can be any tool or bot that uses the Twitter API, so you need to register an app to be able to make API calls.
To register your app, go to your Twitter apps page and select the Create an app option.
- App name: a name to identify your application (such as `neuralspace-nlu-bot`)
- Application description: the purpose of your application (such as `An example bot for a NeuralSpace NLU tutorial`)
- Website URL (yours or your application's): required, but it can be your personal site's URL, since bots don't need a URL to work
- Use of the app: how users will use your app (such as `This app is a bot that will automatically classify tweets that are hateful or offensive in English, Hindi and Marathi`)
To create the authentication credentials, go to your Twitter apps page. There you'll find the Dashboard button of your app; clicking it takes you to the page where you can generate the credentials. On the Keys and tokens tab, generate and copy the key, token, and secrets so that you can use them in your code.
After generating the credentials, save them in the `config.yaml` file in the `src` folder so that you can use them later in your code.
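Once loaded from `config.yaml` (for example with PyYAML's `yaml.safe_load`), the four credentials are all Tweepy needs to authenticate. The sketch below is written against the `tweepy==3.10.0` pinned above and assumes the credentials are stored under the same upper-case names listed earlier; the helper names `require_credentials` and `build_twitter_api` are hypothetical and not necessarily what `neuralspace_nlu_bot.py` does. Later Tweepy releases changed parts of this API.

```python
from typing import Mapping, Tuple

# The four OAuth 1.0a credentials listed above, in the order tweepy needs them
TWITTER_KEYS = ("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")


def require_credentials(creds: Mapping[str, str]) -> Tuple[str, ...]:
    """Return the credentials in TWITTER_KEYS order, failing early if any is missing or empty."""
    missing = [key for key in TWITTER_KEYS if not creds.get(key)]
    if missing:
        raise ValueError("missing Twitter credentials: " + ", ".join(missing))
    return tuple(creds[key] for key in TWITTER_KEYS)


def build_twitter_api(creds: Mapping[str, str]):
    """Authenticate in user context and return a tweepy.API client (written against tweepy 3.10)."""
    import tweepy  # imported here so require_credentials() works without tweepy installed

    consumer_key, consumer_secret, access_token, access_token_secret = require_credentials(creds)
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    # wait_on_rate_limit makes tweepy sleep until the rate-limit window resets instead of erroring
    api = tweepy.API(auth, wait_on_rate_limit=True)
    # In tweepy 3.x, verify_credentials() returns False when Twitter answers 401 Unauthorized
    if not api.verify_credentials():
        raise RuntimeError("Twitter rejected the supplied credentials")
    return api
```

With real credentials, calling `api.user_timeline(screen_name=handle, count=n, tweet_mode="extended")` on the returned client fetches a user's most recent tweets, which is the kind of data the user-handle feature works with.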
Let us move on to the next steps.
To run the Twitter bot, you must also save some other credentials. For NeuralSpace Language Understanding and Language Detection authentication, you need the following:
- MODEL_ID
- ACCESS_TOKEN
The `MODEL_ID` comes from the hate speech model you trained in our Colab notebook. The `ACCESS_TOKEN` can be obtained in two ways.
Using the CLI: after you log in to NeuralSpace from the CLI with your email ID and password, you will find, at the bottom of the output, the path where your credentials are saved (for example, `/root/.neuralspace/auth.json`). Open that file and copy its contents into the `config.yaml` file in the `src` folder under `neuralspace-lang-detection-auth` and `neuralspace-nlu-auth`.
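If you prefer not to copy and paste by hand, this step can be scripted. The following is a minimal sketch, assuming (as the step above suggests) that the whole JSON object in `auth.json` goes under each of the two NeuralSpace sections; the function names are hypothetical. You would still write the result back to `config.yaml` yourself, for example with PyYAML's `yaml.safe_dump` (note that PyYAML does not preserve comments in the file).

```python
import json
from pathlib import Path
from typing import Any, Dict

# Section names taken from this tutorial's description of config.yaml
NEURALSPACE_SECTIONS = ("neuralspace-lang-detection-auth", "neuralspace-nlu-auth")


def load_cli_auth(auth_path: str) -> Dict[str, Any]:
    """Read the credentials file written by the NeuralSpace CLI login.

    Pass the path the CLI prints after login, e.g. /root/.neuralspace/auth.json.
    """
    auth = json.loads(Path(auth_path).read_text(encoding="utf-8"))
    if not isinstance(auth, dict):
        raise ValueError(f"{auth_path} does not contain a JSON object")
    return auth


def merge_neuralspace_auth(config: Dict[str, Any], auth: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with the auth values copied under both NeuralSpace sections."""
    merged = dict(config)  # shallow copy: the caller's top-level dict is not modified
    for section in NEURALSPACE_SECTIONS:
        # Keep keys already present in the section (such as a MODEL_ID) and overlay the auth values
        merged[section] = {**(config.get(section) or {}), **auth}
    return merged
```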
Using the Platform: after you log in to the Platform, you will find the ACCESS_TOKEN at the top right of the screen, beside `Shortcuts` and `API_KEY`. Copy the ACCESS_TOKEN and paste it into the `config.yaml` file in the `src` folder under `neuralspace-lang-detection-auth` and `neuralspace-nlu-auth`.
Congratulations, you have now successfully authenticated your Twitter bot and linked the NeuralSpace Language Understanding App to it!
Currently, this tutorial supports the following features:

- Classify the most recent n tweets of a specific Twitter user handle as hateful/offensive or not

  To use this feature, save the Twitter `USER_HANDLE` and the number of tweets `RECENT_NUM_TWEETS` in `config.yaml` under `twitter-query`, and set `pass-userhandle` to `True`.

- Pass the URL of a tweet and classify its most recent n comments as hateful/offensive or not

  To use this feature, save the `TWITTER_URL` and the number of comments `RECENT_NUM_COMMENTS` in `config.yaml` under `twitter-query`, and set `pass-tweet-url` to `True`.

- Set precision in the hate speech bot

  As mentioned in the Colab notebook above, this tutorial uses the dataset from the HASOC 2021 competition, which contains samples in three languages: English, Hindi and Marathi. Since this training data is skewed towards tweets about COVID-19 and related political agendas, the trained model may not classify tweets from other domains correctly. For this reason, it is important to have precision control in the demonstrated Twitter bot to get reliable results.

  To do this, we extract a threshold for the confidence score from the ROC curve. The area under the ROC curve (AUC-ROC) is one of the most important metrics for evaluating a classification model's performance.

  If you would like to set precision in your hate speech bot and calculate the threshold, set `set-precision` under `control-precision` to `True` in the config file and run `src/roc/ind_optimal_threshold.py`. Then, copy the resulting threshold value under `threshold-roc`. To learn more about the AUC-ROC curve in machine learning, see the README under `src/roc/`.

- Save a report with the results

  If you would like to save a report with the results from the bot, set `download-report` to `True` in `config.yaml` and also pass the name you would like to give the file.
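The threshold idea can be illustrated with a small, dependency-free sketch. It is not the code of `src/roc/ind_optimal_threshold.py`; it shows one common way to pick an operating point on an ROC curve, Youden's J statistic (the threshold that maximises TPR - FPR), assuming you have, for a held-out set, the model's confidence that each example is hateful/offensive plus its true 0/1 label. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float, float]]:
    """Return (threshold, FPR, TPR) for each distinct score, predicting positive when score >= threshold.

    labels are 1 for hateful/offensive and 0 otherwise. This is O(n^2), which is fine for a small
    validation set; sort once and sweep if you need it faster.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    points = []
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, labels) if p and y != 1)
        points.append((threshold, fp / negatives, tp / positives))
    return points


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Return the threshold that maximises Youden's J = TPR - FPR.

    max() keeps the first best point; since points run from high to low thresholds, ties resolve
    to the highest threshold, which favours precision.
    """
    best_threshold, _, _ = max(roc_points(scores, labels), key=lambda point: point[2] - point[1])
    return best_threshold


def is_flagged(confidence: float, threshold: float) -> bool:
    """Flag a tweet as hateful/offensive only if the model's confidence reaches the threshold."""
    return confidence >= threshold
```

For example, with scores `[0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]` and labels `[1, 1, 0, 1, 0, 0, 0, 0]`, `youden_threshold` returns `0.6`, so a tweet scored at 0.65 would be flagged and one scored at 0.5 would not. Youden's J weighs false positives and false negatives equally; if precision matters more to you, a stricter alternative is to pick the lowest threshold whose precision on the validation set reaches a target you choose.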
Wohoooo! You are now set. To run the bot end-to-end, run the following command:
```bash
cd src/
python neuralspace_nlu_bot.py
```
Wasn't that quick and easy? Would you like to share this tutorial with your network on social media?
- Share on LinkedIn: Click here
- Share on Twitter: Click here
👉 Check out our Documentation for all the Apps and features of the NeuralSpace Platform.
👉 Join the NeuralSpace Slack Community to receive updates and discuss NLP for low-resource languages with fellow developers.
👉 Read more about the NeuralSpace Platform on neuralspace.ai.