Prepared and presented by: Leah Pope
Presentation: here
Blog: On Medium
The goal of this project is to build a Natural Language Processing model to analyze sentiment about Apple and Google products. I'll classify a Tweet as negative, positive or neutral based on its content.
The Stakeholders for my project are marketing professionals in either company who are interested in learning consumer sentiment. It appears these tweets were gathered during the South by Southwest (SXSW) film, culture, music, and technology conference, so this consumer sentiment would also be of interest to the conference's marketing professionals and vendor organizers.
The dataset for the project comes from CrowdFlower via data.world. Human raters rated the sentiment in over 9,000 Tweets as positive, negative, or neither.
- Best Binary Classifier: RandomForest with RandomOverSampling, Weighted F1 Score of 0.87
None of the binary classifiers did a good job classifying Negative tweets, even with RandomOverSampling. RandomForest and RandomForest with RandomOverSampling had the highest weighted F1 scores of all the models I trained. I'm calling RandomForest with RandomOverSampling the winner, as it has a slightly better True Positive rate for correctly identifying Negative Tweets (0.37 vs. 0.30). This is still crummy.
Here's the breakdown of all Binary Classifier Models and scores:
- RandomForest - 0.87
- RandomForest ROS - 0.87
- Multinomial NB - 0.82
- Multinomial NB ROS - 0.85
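As a sketch of how the winning setup fits together, the snippet below TF-IDF-vectorizes a toy corpus, random-oversamples the minority (Negative) class with scikit-learn's `resample`, and fits a RandomForest. The tweets, labels, and the manual oversampling step are illustrative stand-ins, not the project's actual pipeline (which may use imblearn's `RandomOverSampler` instead).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.utils import resample

# Toy stand-in corpus; labels: 1 = Negative (minority), 0 = not Negative.
tweets = [
    "love the new ipad", "great apple store", "google party was fun",
    "awesome iphone launch", "best conference app ever", "enjoying the ipad demo",
    "iphone battery died again", "ipad design headaches",
]
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])

# Vectorize the text (dense here only because the toy corpus is tiny).
X = TfidfVectorizer().fit_transform(tweets).toarray()

# Random oversampling: resample the minority class with replacement
# until it matches the majority class size.
X_maj, X_min = X[y == 0], X[y == 1]
X_min_up, y_min_up = resample(X_min, np.ones(len(X_min)),
                              replace=True, n_samples=len(X_maj),
                              random_state=42)
X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.concatenate([np.zeros(len(X_maj)), y_min_up])

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_bal, y_bal)
print(f1_score(y_bal, clf.predict(X_bal), average="weighted"))
```

In the real pipeline, oversampling should be applied only to the training split so duplicated minority rows never leak into the evaluation set.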
- Best Multiclass Classifier: RandomForest with RandomOverSampling, Weighted F1 Score of 0.68
RandomForest (and RandomForest with RandomOverSampling) had the highest weighted F1 scores of all the models I trained. I'm calling RandomForest with RandomOverSampling the winner, as it has a slightly better True Positive rate for correctly identifying Negative Tweets (0.29 vs. 0.25), and I imagine Negative tweets are the most interesting to stakeholders. It also has a better True Positive rate for correctly identifying Positive Tweets (0.56 vs. 0.48). These numbers are still pretty poor.
Here's the breakdown of all Multiclass Classifier Models and scores:
- RandomForest - 0.68
- RandomForest ROS - 0.68
- Multinomial NB - 0.60
- Multinomial NB ROS - 0.62
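For reference, the "Weighted F1 Score" reported above averages per-class F1 scores weighted by each class's support. A toy three-class illustration with scikit-learn (made-up labels, not the project's predictions):

```python
from sklearn.metrics import f1_score

# Ten made-up tweets across three classes (5 neutral, 3 positive, 2 negative).
y_true = ["neg", "neg", "pos", "pos", "pos", "neu", "neu", "neu", "neu", "neu"]
y_pred = ["neg", "neu", "pos", "pos", "neu", "neu", "neu", "neu", "neu", "pos"]

# Per-class F1 is computed first, then averaged with each class's
# support (count in y_true) as the weight.
print(f1_score(y_true, y_pred, average="weighted"))
```

When one class dominates the corpus, it also dominates the weighted average, which is how a model can post a decent weighted F1 while still misclassifying most Negative tweets.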
After analyzing the most common terms/bi-grams in Positive and Negative tweets, I can make the following Stakeholder Recommendations:
From the Negative Tweets, we see "iPad design", "design headaches", and "iPhone battery" as common terms.
- Recommend checking iPhone battery performance
- Re-evaluate the iPad design in light of this Negative sentiment
From the Positive Tweets, we see "Apple store", "opening temporary", and "Google party" as common terms.
- Recommend repeating those two well-received events
Further analysis into the following areas could yield additional insights.
- Check if punctuation count could be a good feature for Tweet classification.
- Use spaCy in place of NLTK
- Try using SMOTE to address class imbalance and see whether it causes the same increase in overfitting as RandomOverSampling
- Hyper-parameter tuning for the Random Forest classifiers.
- Better understanding of using LIME to explain model behaviour.
- Try Transfer Learning using the GloVe pre-trained word embeddings for Twitter
- Get more Tweets for the corpus!
- Research sources of pre-labeled Tweets
- Perform Sentiment Analysis using the VADER and TextBlob tools and compare their results to the human-annotated Tweets in this corpus. If the combined tool sentiment is similar enough to the human-labeled sentiment, I could gather more Google/Apple product Tweets and label them with the tools to create a larger corpus. :)
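That last idea can be sketched as an agreement check between tool labels and human labels. The keyword scorer below is a toy stand-in for VADER's `polarity_scores` or TextBlob's `sentiment.polarity`; the lexicon and tweets are made up for illustration.

```python
# Toy lexicon-based scorer standing in for a real sentiment tool.
def toy_score(text):
    pos = {"love", "great", "awesome"}
    neg = {"hate", "terrible", "broken"}
    s = sum((w in pos) - (w in neg) for w in text.lower().split())
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"

# Made-up human-annotated tweets mapped to their labels.
human_labels = {
    "love this ipad": "positive",
    "terrible iphone battery": "negative",
    "heading to sxsw": "neutral",
}

# Fraction of tweets where the tool agrees with the human annotator.
agreement = sum(toy_score(t) == lbl
                for t, lbl in human_labels.items()) / len(human_labels)
print(agreement)
```

If the agreement rate on the human-annotated corpus came out high enough, the same scoring function could label freshly collected Tweets to grow the corpus.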
--notebooks
----data_cleaning_and_eda.ipynb
----modeling.ipynb (binary classifiers are here)
----modeling2.ipynb (multiclass classifiers are here)
--data
----cleaned_tweets_all.csv
----cleaned_tweets_positive.csv
----cleaned_tweets_negative.csv
----cleaned_tweets_neutral.csv
----crowdflower-brands-and-product-emotions/ (dir for data downloaded from challenge website)
--images (dir for images)