giormala/imdb-movies

IMDB Movie Data Analysis

Overview

This repository contains an analysis of a movie dataset. It explores several aspects of the movie industry, including ratings, revenue, and production details. The analysis is written in Python with Pandas, NumPy, Matplotlib, and Seaborn.

Dataset

The dataset includes key features about movies such as:

  • Title: The name of the movie
  • Release Date: When the movie was released
  • Revenue: Total revenue generated by the movie
  • Runtime: Duration of the movie in minutes
  • Genres: The genres the movie falls into
  • Production Companies: The companies involved in producing the movie
  • Languages Spoken: The languages featured in the movie
  • Popularity: Popularity score
  • Vote Count and Average: Number of votes and average rating of the movie

Analysis

Key analyses performed in this notebook include:

  1. Data Cleaning: Handling missing values, data type conversions, and general preprocessing.
  2. Exploratory Data Analysis (EDA):
    • Distribution of movie release dates.
    • Revenue and runtime trends over time.
    • Popular genres and their correlation with revenue and ratings.
  3. Visualizations:
    • Box plots to show the distribution of revenue and runtime.
    • Scatter plots comparing various attributes such as revenue vs. runtime, and vote average vs. revenue.
    • Histograms and bar charts for categorical features like genres and production countries.
  4. Correlations:
    • Identifying correlations between different features such as revenue, vote count, and vote average.
    • How runtime relates to revenue and vote average.
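
The cleaning and correlation steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`release_date`, `revenue`, `runtime`, `vote_count`, `vote_average`), the sample rows, the helper `clean_movies`, and the choice to treat zero revenue as missing are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; column names and values are assumptions, not the real dataset.
raw = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "release_date": ["2001-05-01", "2010-07-16", "not a date", "2015-12-18", "2019-04-26"],
    "revenue": ["1000000", "250000000", "0", "900000000", None],
    "runtime": [95, 148, np.nan, 136, 181],
    "vote_count": [120, 9000, 15, 12000, 15000],
    "vote_average": [6.1, 8.3, 5.0, 7.4, 8.2],
})


def clean_movies(df: pd.DataFrame) -> pd.DataFrame:
    """Convert types, handle missing values, and derive a release year."""
    out = df.copy()
    # Type conversions: unparseable values become NaT / NaN instead of raising.
    out["release_date"] = pd.to_datetime(out["release_date"], errors="coerce")
    out["revenue"] = pd.to_numeric(out["revenue"], errors="coerce")
    # Assumption: a revenue of 0 means "unknown", so it is treated as missing.
    out["revenue"] = out["revenue"].where(out["revenue"] > 0)
    # Fill missing runtimes with the median runtime.
    out["runtime"] = out["runtime"].fillna(out["runtime"].median())
    # Drop rows that still lack a release date or a revenue figure.
    out = out.dropna(subset=["release_date", "revenue"])
    out["release_year"] = out["release_date"].dt.year
    return out.reset_index(drop=True)


movies = clean_movies(raw)

# Pairwise Pearson correlations between the numeric features.
corr = movies[["revenue", "runtime", "vote_count", "vote_average"]].corr()
print(corr.round(2))
```

The resulting `corr` table is the kind of matrix that a Seaborn heatmap would typically visualize. A correlation between runtime and revenue describes an association, not a causal effect.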

Prediction Model

Objective

As part of the analysis, a predictive model was built to forecast movie success, measured as revenue. The model predicts revenue from features such as runtime, vote average, popularity, and production budget.

Model Building

The model was built using a supervised learning approach, with the following steps:

  1. Feature Selection: Selected relevant features like runtime, vote_average, popularity, budget, and genres.
  2. Data Preprocessing:
    • Handled missing values.
    • Encoded categorical variables (such as genres and production_companies) using techniques like one-hot encoding.
    • Scaled numerical features where necessary.
  3. Model Training:
    • Used a Linear Regression model to predict the revenue.
    • Evaluated the model's performance using metrics like Mean Squared Error (MSE) and R-squared (R²).
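
The steps above can be sketched as follows. The README does not say which library fit the model, so this example solves ordinary least squares with NumPy directly (scikit-learn's `LinearRegression` would be a common alternative). The data is synthetic, and the column names, the single `genre` column, the 80/20 split, and the helper `build_features` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only; it does not come from the real dataset.
rng = np.random.default_rng(0)
n = 200
genres = ["Action", "Comedy", "Drama"]
movies = pd.DataFrame({
    "runtime": rng.normal(110, 15, n),
    "vote_average": rng.uniform(3, 9, n),
    "popularity": rng.gamma(2.0, 10.0, n),
    "budget": rng.uniform(1e6, 2e8, n),
    # A fixed category list keeps the one-hot columns identical across splits.
    "genre": pd.Categorical(rng.choice(genres, n), categories=genres),
})
# Made-up revenue: a linear rule plus noise, so a linear model can recover it.
movies["revenue"] = (
    2.5 * movies["budget"] + 1e6 * movies["popularity"] + rng.normal(0, 2e7, n)
)

numeric = ["runtime", "vote_average", "popularity", "budget"]
train, test = movies.iloc[:160], movies.iloc[160:]

# Scaling statistics come from the training split only, to avoid leakage.
mean, std = train[numeric].mean(), train[numeric].std()


def build_features(df: pd.DataFrame) -> np.ndarray:
    """Scale numeric columns, one-hot encode genre, and add an intercept."""
    scaled = (df[numeric] - mean) / std
    dummies = pd.get_dummies(df["genre"], prefix="genre", drop_first=True, dtype=float)
    X = pd.concat([scaled, dummies], axis=1)
    X.insert(0, "intercept", 1.0)
    return X.to_numpy(dtype=float)


X_train, X_test = build_features(train), build_features(test)
y_train, y_test = train["revenue"].to_numpy(), test["revenue"].to_numpy()

# Ordinary least squares fit of the linear regression coefficients.
beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
pred = X_test @ beta

# Evaluation metrics on the held-out split.
mse = float(np.mean((y_test - pred) ** 2))
r2 = float(1 - np.sum((y_test - pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2))
print(f"MSE: {mse:.3e}  R^2: {r2:.3f}")
```

Because revenue is measured in currency units, the MSE is a very large number on its own scale. R² is easier to read here, since it reports the share of revenue variance the model explains on held-out data.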

Libraries Used

  • Pandas: For data manipulation and analysis.
  • Matplotlib/Seaborn: For visualizing the data and generating plots.
  • NumPy: For numerical operations.
