IMDB Movie Data Analysis

Overview

This repository contains an analysis of an IMDB movie dataset that explores several aspects of the movie industry, including ratings, revenue, and production details. The analysis is performed in Python using data analysis libraries such as Pandas, Matplotlib, and Seaborn.

Dataset

The dataset includes key features about movies such as:

Title: The name of the movie
Release Date: When the movie was released
Revenue: Total revenue generated by the movie
Runtime: Duration of the movie in minutes
Genres: The genres the movie falls into
Production Companies: The companies involved in producing the movie
Languages Spoken: The languages featured in the movie
Popularity: Popularity score
Vote Count and Average: Number of votes and average rating of the movie
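These features mix dates, numbers, and multi-valued text, so they need type conversion before any analysis. The following is a minimal sketch of such a cleaning step, not the notebook's actual code. The snake_case column names (`release_date`, `revenue`, `genres`, ...), the comma-separated format of `genres`, and the helper `clean_movies` are all assumptions made for illustration.

```python
import pandas as pd

# Assumed column names; the real dataset may label them differently.
NUMERIC_COLS = ["revenue", "runtime", "popularity", "vote_count", "vote_average"]


def clean_movies(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step: fix dtypes, split genres, drop rows without revenue."""
    out = df.copy()
    # Parse release dates; unparseable values become NaT instead of raising.
    out["release_date"] = pd.to_datetime(out["release_date"], errors="coerce")
    # Coerce numeric columns; non-numeric entries become NaN.
    for col in NUMERIC_COLS:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Split comma-separated genre strings into clean lists (missing -> empty list).
    out["genres"] = out["genres"].fillna("").apply(
        lambda s: [g.strip() for g in s.split(",") if g.strip()]
    )
    # Rows without revenue cannot be used for revenue analysis or modelling.
    return out.dropna(subset=["revenue"])


# Toy input with made-up values, including a bad date, a missing revenue,
# and a non-numeric runtime.
raw = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "release_date": ["2010-07-16", "not a date", "2015-05-01"],
    "revenue": ["825532764", None, "1000"],
    "runtime": [148, 120, "n/a"],
    "genres": ["Action, Science Fiction", None, "Drama"],
    "popularity": [30.1, 5.0, 2.2],
    "vote_count": [14075, 10, 3],
    "vote_average": [8.3, 6.0, 5.5],
})
movies = clean_movies(raw)
print(movies[["title", "release_date", "runtime", "genres"]])
```

Using `errors="coerce"` turns malformed entries into missing values rather than failing, so they can then be handled by the same missing-value logic as genuinely empty cells.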

Analysis

Key analyses performed in this notebook include:

Data Cleaning: Handling missing values, data type conversions, and general preprocessing.
Exploratory Data Analysis (EDA):
- Distribution of movie release dates.
- Revenue and runtime trends over time.
- Popular genres and their correlation with revenue and ratings.
Visualizations:
- Box plots to show the distribution of revenue and runtime.
- Scatter plots comparing attributes such as revenue vs. runtime and vote average vs. revenue.
- Histograms and bar charts for categorical features like genres and production countries.
Correlations:
- Identifying correlations between different features such as revenue, vote count, and vote average.
- Analysis of how runtime affects revenue and vote averages.
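The correlation and genre analyses listed above can be sketched with plain Pandas. This is a hypothetical illustration rather than the notebook's code: it assumes a cleaned frame with the snake_case columns used below and a `genres` column holding a list of genre names per movie, and the toy data is made up.

```python
import pandas as pd


def correlation_summary(df: pd.DataFrame) -> pd.DataFrame:
    # Pairwise Pearson correlations between numeric features (pandas' default method).
    cols = ["revenue", "runtime", "vote_count", "vote_average", "popularity"]
    return df[cols].corr()


def revenue_by_genre(df: pd.DataFrame) -> pd.DataFrame:
    # explode() gives one row per (movie, genre) pair, so a multi-genre
    # movie contributes to every genre it belongs to.
    per_genre = df.explode("genres")
    return (
        per_genre.groupby("genres")
        .agg(
            movies=("title", "count"),
            mean_revenue=("revenue", "mean"),
            mean_vote=("vote_average", "mean"),
        )
        .sort_values("mean_revenue", ascending=False)
    )


# Toy data with made-up values.
movies = pd.DataFrame({
    "title": ["A", "B", "C"],
    "genres": [["Action", "Drama"], ["Drama"], ["Action"]],
    "revenue": [300.0, 100.0, 500.0],
    "runtime": [150.0, 100.0, 120.0],
    "vote_count": [1000, 50, 800],
    "vote_average": [7.5, 6.0, 7.0],
    "popularity": [20.0, 3.0, 15.0],
})
corr = correlation_summary(movies)
genre = revenue_by_genre(movies)
print(corr.round(2))
print(genre)
```

The correlation matrix can be passed to `seaborn.heatmap` for a visual version. Note that Pearson correlation only captures linear relationships, so a scatter plot is still worth checking for features such as runtime.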

Prediction Model

Objective

As part of the analysis, a predictive model was built to forecast movie success, measured here as revenue. The model predicts revenue from features such as runtime, vote average, popularity, and production budget.

Model Building

The model was built using a supervised learning approach, with the following steps:

Feature Selection: Selected relevant features like runtime, vote_average, popularity, budget, and genres.
Data Preprocessing:
- Handled missing values.
- Encoded categorical variables (such as genres and production_companies) using techniques like one-hot encoding.
- Scaled numerical features where necessary.
Model Training:
- Trained a linear regression model to predict revenue.
- Evaluated the model's performance using metrics like Mean Squared Error (MSE) and R-squared (R²).
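The steps above map naturally onto a scikit-learn pipeline. The sketch below is an assumption-laden illustration, not the repository's implementation: it assumes scikit-learn (which is not listed under Libraries Used), median imputation, standard scaling, and multi-hot encoding of list-valued `genres` with `MultiLabelBinarizer`. The helper names `build_features` and `fit_revenue_model` and the synthetic data are hypothetical, and production companies are left out for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer, StandardScaler

NUMERIC = ["runtime", "vote_average", "popularity", "budget"]  # assumed column names


def build_features(df: pd.DataFrame) -> pd.DataFrame:
    # Multi-hot encode list-valued genres: one 0/1 column per genre.
    mlb = MultiLabelBinarizer()
    genre_dummies = pd.DataFrame(
        mlb.fit_transform(df["genres"]),
        columns=[f"genre_{g}" for g in mlb.classes_],
        index=df.index,
    )
    return pd.concat([df[NUMERIC], genre_dummies], axis=1)


def fit_revenue_model(df: pd.DataFrame, seed: int = 0):
    X = build_features(df)
    y = df["revenue"]
    genre_cols = [c for c in X.columns if c.startswith("genre_")]
    preprocess = ColumnTransformer([
        # Fill missing numeric values with the training median, then standardise.
        ("num", make_pipeline(SimpleImputer(strategy="median"), StandardScaler()), NUMERIC),
        # Genre indicators are already 0/1, so pass them through unchanged.
        ("genres", "passthrough", genre_cols),
    ])
    model = Pipeline([("prep", preprocess), ("reg", LinearRegression())])
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return model, mean_squared_error(y_test, pred), r2_score(y_test, pred)


# Synthetic, nearly linear data purely to exercise the pipeline.
rng = np.random.default_rng(42)
n = 200
demo = pd.DataFrame({
    "runtime": rng.normal(110, 20, n),
    "vote_average": rng.uniform(4, 9, n),
    "popularity": rng.exponential(10, n),
    "budget": rng.uniform(1e6, 1e8, n),
    "genres": [[["Action"], ["Drama"], ["Action", "Comedy"]][i % 3] for i in range(n)],
})
demo["revenue"] = 2.5 * demo["budget"] + 1e6 * demo["popularity"] + rng.normal(0, 1e6, n)
demo.loc[::10, "runtime"] = np.nan  # simulate missing runtimes

model, mse, r2 = fit_revenue_model(demo)
print(f"MSE: {mse:.3g}  R^2: {r2:.3f}")
```

Because the imputer and scaler sit inside the pipeline, they are fitted on the training split only, which avoids leaking test-set statistics into preprocessing. Scores on this synthetic data say nothing about performance on the real dataset.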

Libraries Used

Pandas: For data manipulation and analysis.
Matplotlib/Seaborn: For visualizing the data and generating plots.
NumPy: For numerical operations.

Files

movies_analysis.ipynb: The Jupyter notebook containing the analysis and the prediction model.
analysis_presentation_canva.pdf: A PDF presentation of the analysis.
README.md: This file.

giormala/imdb-movies