-
- Every day, thousands of people suffering from mental illnesses such as anxiety and depression post on social media. For example, Reddit has a depression subreddit (https://www.reddit.com/r/depression/) where people can vent and express their feelings, and searching for #depression on Twitter returns many tweets by people who are depressed. This project aims to build a Twitter bot (or a bot for any text-based social media platform) that performs two tasks:
- Identify whether a tweet shows signals of mental illness.
- Reply to the tweet with an inspiring, relevant quote (from a corpus of quotes) that can help the person gain a wider and better perspective, and thus help them get out of their mental rut.
-
-
- For task 1, a binary machine learning classifier identifies whether a tweet's text shows signals of mental illness. So far, the following models have been tried:
- Naive Bayes with TF-IDF text embedding
- SVM with TF-IDF text embedding
- Future Work
- Explore other classification models
- Explore deep-learning-based text embeddings such as word2vec, BERT, etc.
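
To make the first of the models above concrete, here is a minimal, dependency-free sketch of multinomial Naive Bayes trained on TF-IDF weights. It is not the project's actual code: the class name `TfidfNaiveBayes`, the whitespace tokenizer, the smoothing choices, and the four training sentences are all assumptions made for illustration. In practice a library such as scikit-learn (`TfidfVectorizer`, `MultinomialNB`, `LinearSVC`) covers both this model and the SVM variant, which is not shown here.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class TfidfNaiveBayes:
    """Multinomial Naive Bayes fed with TF-IDF weights instead of raw counts."""

    def fit(self, texts, labels):
        docs = [tokenize(t) for t in texts]
        self.n_docs = len(docs)
        # Document frequency: number of training texts containing each term.
        doc_freq = Counter(term for doc in docs for term in set(doc))
        # Smoothed inverse document frequency.
        self.idf = {
            term: math.log((1 + self.n_docs) / (1 + df)) + 1
            for term, df in doc_freq.items()
        }
        # Accumulate the TF-IDF weight of every term per class.
        self.class_count = Counter(labels)
        self.term_weight = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            for term, tf in Counter(doc).items():
                self.term_weight[label][term] += tf * self.idf[term]
        self.total_weight = {
            label: sum(weights.values())
            for label, weights in self.term_weight.items()
        }
        return self

    def _log_score(self, tokens, label):
        # log P(class) + sum of tfidf(term) * log P(term | class),
        # with add-one smoothing over the vocabulary.
        score = math.log(self.class_count[label] / self.n_docs)
        denom = self.total_weight[label] + len(self.idf)
        for term, tf in Counter(tokens).items():
            if term in self.idf:  # skip words never seen in training
                prob = (self.term_weight[label][term] + 1) / denom
                score += tf * self.idf[term] * math.log(prob)
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return max(self.class_count, key=lambda c: self._log_score(tokens, c))


# Toy training data (hypothetical); 1 = signals of mental illness, 0 = none.
train_texts = [
    "i feel hopeless and alone",
    "so tired of feeling empty and sad",
    "great match tonight with friends",
    "loving the sunny weather today",
]
train_labels = [1, 1, 0, 0]

model = TfidfNaiveBayes().fit(train_texts, train_labels)
print(model.predict("feeling sad and alone"))
print(model.predict("sunny weather with friends"))
```

A real classifier would need far more labelled data and a held-out evaluation set; the toy corpus only shows how TF-IDF weights enter the Naive Bayes score.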
-
- For now, there is a corpus of generic uplifting quotes from which a random quote is chosen.
- Future Work
- Define relevance and build an algorithm to pick a relevant quote from the corpus.
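
The current random selection can be sketched in a few lines. The corpus below is a placeholder (the project's actual quotes are not shown here), and `pick_quote` is a hypothetical helper name.

```python
import random

# Placeholder corpus, assumed for illustration only.
QUOTES = [
    "This too shall pass.",
    "One small step at a time is still progress.",
    "You have survived every hard day so far.",
]


def pick_quote(quotes, rng=random):
    """Return one quote chosen uniformly at random from the corpus."""
    if not quotes:
        raise ValueError("quote corpus is empty")
    return rng.choice(quotes)


print(pick_quote(QUOTES))
```

Passing the random source as a parameter keeps the function easy to test, and a future relevance-based selector could replace `pick_quote` without changing its callers.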
-
-
-
- A tweepy StreamListener is used to process a stream of incoming tweets.
- The raw tweets are ingested into a Kafka topic.
- The "preprocess" consumer consumes the raw tweets, preprocesses them and puts them on another Kafka topic.
- The preprocessed tweets are consumed by the "predict" consumer, which classifies each tweet as showing signs of mental illness (1) or not (0). Positive tweets are put on another Kafka topic; negative tweets are discarded.
- The positive tweets are finally loaded into MongoDB.
- Periodically (orchestrated by Airflow), the "send replies" process wakes up, reads "n" positive tweets from MongoDB, finds a relevant quote and then replies to the original tweet using the tweepy Twitter API.
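
The flow of the streaming stages above can be sketched without any infrastructure. In this illustration, in-memory `queue.Queue` objects stand in for the Kafka topics and a plain list stands in for the MongoDB collection; the cleanup rules in `preprocess` and the keyword-based `classify` are placeholders, not the project's real preprocessing or model.

```python
import queue
import re

# In-memory queues stand in for the three Kafka topics (an assumption).
raw_topic = queue.Queue()
preprocessed_topic = queue.Queue()
positive_topic = queue.Queue()
positive_store = []  # stand-in for the MongoDB collection


def preprocess(tweet):
    """Hypothetical cleanup: lowercase, drop URLs and @mentions, squeeze spaces."""
    text = re.sub(r"https?://\S+|@\w+", " ", tweet["text"].lower())
    return {"id": tweet["id"], "text": " ".join(text.split())}


def classify(text):
    """Placeholder classifier: 1 if a keyword is present, else 0."""
    return int(any(word in text for word in ("hopeless", "depressed", "worthless")))


def drain(source, handler):
    """Consume every message currently waiting on `source`."""
    while True:
        try:
            message = source.get_nowait()
        except queue.Empty:
            return
        handler(message)


def preprocess_consumer(tweet):
    preprocessed_topic.put(preprocess(tweet))


def predict_consumer(tweet):
    if classify(tweet["text"]) == 1:
        positive_topic.put(tweet)  # negative tweets are simply dropped


# Ingest two raw tweets, then run each stage in order.
raw_topic.put({"id": 1, "text": "Feeling HOPELESS today @friend https://example.com"})
raw_topic.put({"id": 2, "text": "Great game last night!"})
drain(raw_topic, preprocess_consumer)
drain(preprocessed_topic, predict_consumer)
drain(positive_topic, positive_store.append)
print(positive_store)
```

Each stage only reads from one queue and writes to the next, which is the property that lets the real consumers run, fail, and scale independently.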
-
- Why Kafka?
- The rate of incoming tweets will be higher than the rate at which they are processed (preprocess, classify, etc.), so a data pipeline is used to buffer them.
- Low coupling between the different processes ("preprocess", "predict", etc.). In the future, if multiple models need to be tested, it would be as easy as launching a new "predict" consumer.
- Scalability and Reliability
- Because of the low coupling, the consumers (processors) and the producers (from social media) can be scaled individually.
- If one Kafka process goes down, the others don't, and parts of the system will still run during the repair time.
- Why store to DB?
- Ideally, the data pipeline's last process would be responsible for sending replies to the tweets, but the bot would get blocked by Twitter for posting at (nearly) the same rate as it consumes.
- Why NoSQL (MongoDB)?
- The data model does not involve relations between different entities.
- Consistency and isolation between the classifier and the Twitter bot are not a major concern.
- NoSQL databases are easier to scale.
- Why Airflow?
- To schedule and orchestrate the workflow's tasks:
- Stream new tweets every 't' hours and store them in the DB.
- Extract 'n' tweets and send replies every 't' minutes.
- Perform analytics, such as a word count, to build a word cloud.
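
The reason for storing positive tweets and replying in scheduled batches can be illustrated with a small sketch. Here a list of dictionaries stands in for the MongoDB collection, and `send_replies` and the `replied` flag are hypothetical names; a scheduler (Airflow in this project) would call something like it every 't' minutes, so the posting rate is capped at 'n' replies per run regardless of how fast tweets arrive.

```python
def send_replies(store, n, reply_fn):
    """Reply to at most `n` stored tweets that have not been answered yet."""
    pending = [tweet for tweet in store if not tweet.get("replied")][:n]
    for tweet in pending:
        reply_fn(tweet)
        tweet["replied"] = True  # so the next scheduled run skips it
    return len(pending)


# Five stored positive tweets, processed two per scheduled run.
store = [{"id": i, "text": f"tweet {i}"} for i in range(5)]
sent = []
first = send_replies(store, 2, sent.append)
second = send_replies(store, 2, sent.append)
third = send_replies(store, 2, sent.append)
print(first, second, third)
```

In the real system `reply_fn` would wrap the Twitter API call, and the pending query would be a database filter rather than a list comprehension.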
-
-
makalaaneesh/hapPy
A social media bot that can detect depression and respond with an uplifting quote