This project focuses on performing sentiment analysis on hotel reviews. The goal is to collect, clean, and analyze review data to develop a machine learning model that predicts the sentiment (positive, negative, or neutral) of each review. The project involves four key stages: data collection, data cleaning, building a sentiment analysis machine learning model, and visualizing the dataset on a map.
- Project Overview
- Requirements
- Data Collection
- Data Cleaning
- Exploratory Data Analysis (EDA)
- Model Training
- Evaluation
- How to Run
- File Structure
- Contributors
To run this project, you will need the following libraries installed:
- Python 3.x
- Pandas
- NumPy
- Pickle (model loading)
- Scikit-learn
- NLTK (Natural Language Toolkit)
- BeautifulSoup (web scraping)
- WordCloud
- Keras
- SQLAlchemy
- PyMongo
- Matplotlib (data visualization)
You can install all dependencies by running:
```bash
pip install -r requirements.txt
```
Hotel reviews can be collected from various sources, such as:
- Web scraping hotel booking platforms (e.g., TripAdvisor, Yelp.com); see 00 Data Collection for details, and the sketch after this list
- Public datasets from sources like Kaggle (Booking hotel reviews dataset).
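A minimal scraping sketch is shown below. The URL and CSS selectors are hypothetical placeholders, and requests is assumed alongside BeautifulSoup; the actual selectors used in 00 Data Collection depend on the target site's markup (and its terms of service).

```python
# Minimal scraping sketch (hypothetical URL and CSS selectors).
import requests
from bs4 import BeautifulSoup

url = "https://example.com/hotel-reviews"  # placeholder, not a real endpoint
response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")
# Assume each review sits in a <div class="review"> with a nested <p class="text">
reviews = [div.get_text(strip=True) for div in soup.select("div.review p.text")]
print(f"Scraped {len(reviews)} reviews")
```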
Raw data may contain noise, missing values, or irrelevant information. The data_cleaning.py script handles data preprocessing (a minimal sketch follows the list), which includes:
- Removing duplicates
- Filling or removing missing values
- Normalizing text (removing punctuation, converting to lowercase)
- Tokenization, stopword removal, and lemmatization using NLTK or spaCy
- Encoding sentiment labels (e.g., Positive = 1, Negative = 0)
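The following is a minimal cleaning sketch along these lines, assuming a pandas DataFrame with `review` and `sentiment` columns (the actual column names and steps in data_cleaning.py may differ):

```python
# Sketch of the cleaning steps above using pandas + NLTK
# (column names such as "review" and "sentiment" are assumptions).
import re
import pandas as pd
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords")
nltk.download("wordnet")

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def clean_text(text: str) -> str:
    text = text.lower()                      # convert to lowercase
    text = re.sub(r"[^a-z\s]", " ", text)    # strip punctuation and digits
    tokens = text.split()                    # simple whitespace tokenization
    tokens = [lemmatizer.lemmatize(t) for t in tokens if t not in stop_words]
    return " ".join(tokens)

df = pd.read_csv("reviews.csv")              # assumed input file
df = df.drop_duplicates().dropna(subset=["review"])
df["clean_review"] = df["review"].apply(clean_text)
df["label"] = df["sentiment"].map({"Positive": 1, "Negative": 0})
```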
For sentiment analysis, several machine learning models can be used, including:
- Logistic Regression
- Support Vector Machine (SVM)
- K-Nearest Neighbors (KNN)
The model_training.py script implements the training workflow. It includes:
- Vectorization: Use TF-IDF or Word2Vec for text vectorization.
- Model Training: Train the machine learning model using labeled data.
- Hyperparameter Tuning: Apply GridSearchCV or RandomizedSearchCV to optimize hyperparameters. Example:
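Below is a minimal training sketch combining TF-IDF vectorization, Logistic Regression, and GridSearchCV. It assumes the cleaned data from the previous step with `clean_review` and `label` columns; the pipeline, parameter grid, and file names are illustrative, not the exact settings in model_training.py.

```python
# Minimal training sketch: TF-IDF + Logistic Regression with GridSearchCV
# (column names, file names, and parameter grid are illustrative assumptions).
import pickle
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

df = pd.read_csv("clean_reviews.csv")        # assumed output of the cleaning step
X_train, X_test, y_train, y_test = train_test_split(
    df["clean_review"], df["label"], test_size=0.2, random_state=42
)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=10_000)),
    ("clf", LogisticRegression(max_iter=1000)),
])

param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1", n_jobs=-1)
search.fit(X_train, y_train)
print("Best params:", search.best_params_)

# Persist the fitted pipeline so it can later be loaded with pickle
with open("sentiment_model.pkl", "wb") as f:
    pickle.dump(search.best_estimator_, f)
```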
After training, the model is evaluated using the following metrics (a short evaluation sketch follows the list):
- Accuracy
- Precision
- Recall
- F1-Score
- Confusion Matrix
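A short evaluation sketch, reusing the fitted `search` object and the test split from the training sketch above:

```python
# Evaluate on the held-out test set with the metrics listed above
# (assumes `search`, `X_test`, and `y_test` from the training sketch).
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
)

y_pred = search.predict(X_test)
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1-Score :", f1_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
```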
Contributors: Hugo Villanueva ([email protected]). Feel free to open an issue or contribute to the project!