Perform sentiment analysis on Twitter data using the Naive Bayes classifier and NLTK.
This project uses machine learning techniques to classify tweets as positive or negative. It leverages the NLTK library for natural language processing and the Naive Bayes classifier for sentiment analysis.
- Data Collection: Utilizes the `twitter_samples` dataset from NLTK, which includes positive and negative tweets.
- Data Preprocessing: Tokenizes tweets and removes usernames.
- Feature Extraction: Incorporates bigram collocations to enhance feature representation.
- Model Training: Trains a Naive Bayes classifier using the processed tweets.
- Model Evaluation: Tests the classifier and calculates its accuracy.
- Python: The programming language used.
- NLTK: Natural Language Toolkit for text processing.
- Naive Bayes Classifier: A probabilistic classifier for sentiment analysis.
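The preprocessing, feature extraction, and training steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: it assumes `TweetTokenizer` with `strip_handles=True` for username removal and PMI as the bigram scoring measure, and it trains on a few made-up tweets so that it runs without downloading the corpus. The function name `extract_features` and the sample tweets are hypothetical.

```python
from nltk.classify import NaiveBayesClassifier
from nltk.collocations import BigramCollocationFinder
from nltk.metrics import BigramAssocMeasures
from nltk.tokenize import TweetTokenizer

# Assumption: usernames are removed via TweetTokenizer's strip_handles option.
tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True)


def extract_features(tweet, n_bigrams=10):
    """Map a tweet to a dict of unigram and top-scoring bigram features."""
    tokens = tokenizer.tokenize(tweet)
    finder = BigramCollocationFinder.from_words(tokens)
    # Assumption: PMI is the scoring measure; the original script may use another.
    bigrams = finder.nbest(BigramAssocMeasures.pmi, n_bigrams)
    return {feature: True for feature in tokens + bigrams}


# Made-up tweets standing in for the twitter_samples corpus.
positive = [
    "@anna I love this sunny day :)",
    "what a great match, so happy",
    "love the new album, great work",
]
negative = [
    "@bob I hate this rainy day :(",
    "what a terrible match, so sad",
    "hate the new update, terrible work",
]

train_set = [(extract_features(t), "pos") for t in positive]
train_set += [(extract_features(t), "neg") for t in negative]

classifier = NaiveBayesClassifier.train(train_set)
print(classifier.classify(extract_features("@carol I love it, so happy")))
```

With the real data, the made-up lists could be replaced by `twitter_samples.strings('positive_tweets.json')` and `twitter_samples.strings('negative_tweets.json')` once the corpus has been downloaded.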
Install Dependencies: `pip install nltk`
Download NLTK Data: Open a Python interpreter and run: `import nltk; nltk.download('twitter_samples'); nltk.download('punkt')`
Run the Script: Save the provided script in a file named `twitter_sentiment_analysis.py` and execute it: `python twitter_sentiment_analysis.py`
Accuracy: The classifier's accuracy is printed at the end of the script execution. Reported accuracy: 99.15
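As a rough illustration of what the accuracy figure measures, the sketch below trains on a handful of hand-written feature dicts and compares `nltk.classify.util.accuracy` with a manual count of correct predictions on a held-out set. The feature dicts and the train/test split are invented for the example and are not taken from the project's script.

```python
from nltk.classify import NaiveBayesClassifier
from nltk.classify.util import accuracy

# Hypothetical, hand-written feature sets (not from twitter_samples).
train_set = [
    ({"love": True, "great": True}, "pos"),
    ({"happy": True, "love": True}, "pos"),
    ({"hate": True, "terrible": True}, "neg"),
    ({"sad": True, "hate": True}, "neg"),
]
test_set = [
    ({"love": True, "happy": True}, "pos"),
    ({"hate": True, "sad": True}, "neg"),
]

classifier = NaiveBayesClassifier.train(train_set)

# accuracy() returns the fraction of test items whose predicted label
# matches the gold label.
acc = accuracy(classifier, test_set)

# The same quantity computed by hand.
correct = sum(classifier.classify(feats) == label for feats, label in test_set)
manual = correct / len(test_set)

print(f"Accuracy: {acc * 100:.2f}")
```

The value returned by `accuracy` is a fraction between 0 and 1; multiplying by 100, as in the `print` call, gives a percentage.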
License: © 2024 Rajeev Sharma ([email protected])