alfredodimassimo/movie_predictions
Alfredo Di Massimo
BrainStation Data Science Diploma Program

Overview: This is my capstone submission for the January 2022 BrainStation Data Science cohort. The project uses machine learning and Natural Language Processing to predict movie review sentiment. I was particularly interested in understanding which factors influence a reviewer's opinion after watching a movie, and in applying the findings to give producers actionable insights and to improve marketing tactics.

The code for this project is provided in 5 Jupyter notebooks found in the "Notebooks" folder:

1. Data Loading and Merging
2. Cleaning and EDA
3. Modeling
4. Findings and Interpretation
5. Appendix (NOTE: It is not required to load this notebook to run the others, as it contains the code related to importing the raw data (minimum 4 hour runtime))

A requirements.txt file is also included, listing the modules required to run the notebooks.

The data used for this project was derived from the following sources:

- IMDb Movies Dataset: https://www.imdb.com/interfaces/
- IMDb Movie Reviews: https://paperswithcode.com/dataset/imdb-movie-reviews

The data is stored in the following Google Drive folder: https://drive.google.com/drive/folders/1ZDrvSZHhjUW53fQ2dlKhZoanF8ULMzVM?usp=sharing

It contains a "Reviews" folder with the original IMDb Movie Reviews dataset, as well as the .csv and .gz files required to run the notebooks.

NOTE: The "Reviews" folder contains ~85MB of review data, along with the README file from the original study. These reviews are already saved in the .csv files in the main directory, so the "Reviews" folder is not required to run the notebooks.

Lastly, the "models_functions" folder contains the fitted models and the list of custom stop words required to run the notebooks. It is stored in the same Google Drive folder as the data.

For questions, do not hesitate to contact me at [email protected]
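To give a rough idea of the kind of sentiment prediction described in the overview, here is a minimal, dependency-free sketch. It is not the code from the notebooks: the choice of a Naive Bayes classifier, the toy reviews, the stop word set, and the names tokenize, train_naive_bayes and predict are all assumptions made for illustration. The actual fitted models and custom stop word list are the ones stored in "models_functions".

```python
import math
import re
from collections import Counter

# Hypothetical stop words for illustration only; the project's real
# custom list is stored in the "models_functions" folder.
CUSTOM_STOP_WORDS = {"the", "a", "an", "and", "this", "was", "is", "it", "movie", "film"}


def tokenize(text, stop_words=CUSTOM_STOP_WORDS):
    """Lowercase the text, split it into words, and drop stop words."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in stop_words]


def train_naive_bayes(reviews, labels):
    """Count words per sentiment label (multinomial Naive Bayes)."""
    word_counts = {label: Counter() for label in set(labels)}
    doc_counts = Counter(labels)
    for text, label in zip(reviews, labels):
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for token in tokenize(text):
            if token in vocab:  # ignore words never seen in training
                # Laplace (add-one) smoothing avoids zero probabilities.
                score += math.log((counts[token] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training data (made up for this sketch, not taken from the IMDb dataset).
reviews = [
    "A wonderful, moving film with great acting",
    "Great story and wonderful cast",
    "Boring plot and terrible acting",
    "This was a terrible, boring movie",
]
labels = ["positive", "positive", "negative", "negative"]

model = train_naive_bayes(reviews, labels)
print(predict(model, "wonderful cast and great story"))
print(predict(model, "terrible boring plot"))
```

Removing domain-specific words such as "movie" and "film" through the stop word set reflects the general idea behind a custom stop word list: words that appear in nearly every review carry little sentiment signal.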
About
My BrainStation Data Science 2022 Capstone Project