Data-Mining-Second-Project

Overview

This project applies data mining techniques to the Netflix dataset. It comprises six phases: data preprocessing, director comparison, frequent pattern extraction, description vectorization, clustering, and classification. The goal is to extract insights about the titles, their directors and casts, and their genres.

Project Phases

1. Data Preprocessing

- Load the Netflix dataset.
- Clean the data using common data preprocessing techniques.
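
As a rough illustration of this phase, the sketch below loads a tiny inline sample and applies a few common cleaning steps with pandas. The column names (`director`, `cast`, `listed_in`, `description`) and the specific steps (dropping duplicates, dropping rows without a description, filling missing names with "Unknown") are assumptions for illustration and may differ from what the notebook does.

```python
import io

import pandas as pd

# Tiny inline sample standing in for netflix.csv; the column names are assumed.
SAMPLE_CSV = """show_id,type,title,director,cast,listed_in,description
s1,Movie,Alpha,Jane Doe,"Ann Lee, Bob Ray","Dramas, Comedies",A quiet drama.
s2,TV Show,Beta,,Ann Lee,Kids TV,  A show for kids.
s2,TV Show,Beta,,Ann Lee,Kids TV,  A show for kids.
s3,Movie,Gamma,John Roe,,Dramas,
"""


def clean(df):
    """Apply a few common cleaning steps (hypothetical helper)."""
    # Remove exact duplicate rows and rows with no description to analyze.
    df = df.drop_duplicates().dropna(subset=["description"]).copy()
    # Keep titles with a missing director or cast, but mark the gap explicitly.
    for col in ["director", "cast"]:
        df[col] = df[col].fillna("Unknown")
    # Normalize stray whitespace around the description text.
    df["description"] = df["description"].str.strip()
    return df.reset_index(drop=True)


raw = pd.read_csv(io.StringIO(SAMPLE_CSV))
cleaned = clean(raw)
print(cleaned[["title", "director", "description"]])
```

To run this on the real data, `pd.read_csv("netflix.csv")` would replace the inline sample, assuming the file sits in the working directory.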

2. Director Comparison

- Compare directors based on various attributes.
- Visualize the comparison results to gain insights into director performance.
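
One possible way to compare directors is to aggregate per-director statistics with pandas. The attributes used here (number of titles, number of movies, first and last release year) and the column names are assumptions chosen for illustration, not necessarily the attributes the notebook compares.

```python
import pandas as pd

# Hypothetical sample rows; the real data would come from the cleaned dataset.
titles = pd.DataFrame(
    {
        "director": ["Jane Doe", "Jane Doe", "John Roe", "Jane Doe"],
        "type": ["Movie", "Movie", "TV Show", "TV Show"],
        "release_year": [2015, 2018, 2020, 2021],
    }
)

# One row per director, one column per comparison attribute.
summary = (
    titles.groupby("director")
    .agg(
        n_titles=("type", "size"),
        n_movies=("type", lambda s: (s == "Movie").sum()),
        first_year=("release_year", "min"),
        last_year=("release_year", "max"),
    )
    .sort_values("n_titles", ascending=False)
)
print(summary)
```

With matplotlib installed, `summary["n_titles"].plot.bar()` would turn the title counts into a simple bar chart.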

3. Frequent Pattern Extraction

Identify frequent patterns in the dataset, focusing on relationships between cast, director, and genres.
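
The idea can be sketched with plain support counting: each title becomes a set of items (its director, cast members, and genres), and itemsets that appear in at least a minimum fraction of titles are kept. This is a brute-force illustration limited to small itemsets, not Apriori or FP-Growth, and the field names and the `min_support` value are assumptions.

```python
from collections import Counter
from itertools import combinations


def to_transaction(row):
    """Turn one title into a set of prefixed items such as 'genre:Dramas'."""
    items = set()
    for field, prefix in (("director", "director"), ("cast", "cast"), ("listed_in", "genre")):
        for value in row.get(field, "").split(","):
            value = value.strip()
            if value:
                items.add(f"{prefix}:{value}")
    return items


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets of up to max_size items."""
    counts = Counter()
    for items in transactions:
        ordered = sorted(items)  # fixed order so each itemset has one key
        for size in range(1, max_size + 1):
            counts.update(combinations(ordered, size))
    n = len(transactions)
    return {itemset: c / n for itemset, c in counts.items() if c / n >= min_support}


# Hypothetical rows for illustration.
rows = [
    {"director": "Jane Doe", "cast": "Ann Lee, Bob Ray", "listed_in": "Dramas"},
    {"director": "Jane Doe", "cast": "Ann Lee", "listed_in": "Dramas, Comedies"},
    {"director": "John Roe", "cast": "Cy Fox", "listed_in": "Comedies"},
]
transactions = [to_transaction(r) for r in rows]
patterns = frequent_itemsets(transactions, min_support=0.6)
for itemset, support in sorted(patterns.items()):
    print(itemset, round(support, 2))
```

Enumerating every combination grows quickly with itemset size, so a dedicated algorithm would be the better choice on the full dataset.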

4. Description Vectorization

- Use BERT (Bidirectional Encoder Representations from Transformers) to vectorize the description column.
- Prepare the data for clustering.
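
A minimal sketch of turning descriptions into fixed-length vectors is shown below, using the Hugging Face `transformers` library. The checkpoint name (`bert-base-uncased`) and the use of mean pooling over token embeddings are assumptions; the notebook may use a different model or pooling strategy.

```python
import numpy as np


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per text, ignoring padding positions."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


def embed_descriptions(texts, model_name="bert-base-uncased"):
    """Return one vector per description (model_name is an assumed checkpoint)."""
    # Imported lazily so mean_pool can be used without these heavy packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    encoded = tokenizer(list(texts), padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        output = model(**encoded)
    return mean_pool(output.last_hidden_state.numpy(), encoded["attention_mask"].numpy())


# Pooling demo on toy data: the third token is padding and is ignored.
toy_tokens = np.array([[[1.0, 1.0], [3.0, 3.0], [100.0, 100.0]]])
toy_mask = np.array([[1, 1, 0]])
print(mean_pool(toy_tokens, toy_mask))
```

Calling `embed_descriptions(df["description"])` would download the model on first use and return an array with one row per description.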

5. Clustering

- Perform clustering on the vectorized descriptions.
- Visualize the clustering results in both 2D and 3D.
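
The sketch below shows one way this phase could look with scikit-learn. The clustering algorithm is not named above, so k-means is an assumption, as are the number of clusters and the use of PCA to project the vectors down to 2D and 3D for plotting. Synthetic vectors stand in for the BERT embeddings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in for description vectors: two well-separated groups in 16 dimensions.
vectors = np.vstack(
    [
        rng.normal(loc=0.0, scale=0.1, size=(20, 16)),
        rng.normal(loc=5.0, scale=0.1, size=(20, 16)),
    ]
)

# Assign each vector to one of two clusters (k=2 is chosen for the toy data).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

# Project to 2 and 3 dimensions so the clusters can be drawn as scatter plots.
coords_2d = PCA(n_components=2).fit_transform(vectors)
coords_3d = PCA(n_components=3).fit_transform(vectors)
print(labels)
print(coords_2d.shape, coords_3d.shape)
```

The projected coordinates can then be passed to a plotting library, coloring each point by its entry in `labels`.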

6. Classification

- Label the descriptions and split the data into training and testing sets.
- Classify each description by genre. Because a title can belong to several genres, two labeling scenarios are considered:
  - Scenario 1: Consider the first genre as the primary one for classification.
  - Scenario 2: Treat the combination of all of a title's genres as a single class (chosen due to practical constraints).
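
The two labeling scenarios can be sketched as small helper functions. They assume the genres of a title are stored as one comma-separated string (as in a `listed_in` column); the function names are hypothetical, and sorting the genres in the second scenario is an added assumption so that the same genres in a different order map to the same class.

```python
def first_genre(listed_in):
    """Scenario 1: use only the first listed genre as the label."""
    return listed_in.split(",")[0].strip()


def combined_genre(listed_in):
    """Scenario 2: use the whole genre combination as a single label."""
    genres = sorted(g.strip() for g in listed_in.split(",") if g.strip())
    return " | ".join(genres)


# Hypothetical genre strings for illustration.
examples = ["Dramas, International Movies", "International Movies, Dramas", "Comedies"]
for text in examples:
    print(first_genre(text), "/", combined_genre(text))
```

Scenario 1 keeps the number of classes small but discards information, while Scenario 2 preserves it at the cost of many rare classes. Either label column can then be paired with the description vectors and divided with a utility such as scikit-learn's `train_test_split`.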

Dataset

The Netflix dataset used in this project is included in this repository as `netflix.csv`, so no separate download is needed to replicate the results.

Prerequisites

Before running this project, ensure you have the following:

- Python installed on your system.
- Required Python libraries and dependencies installed (specified in the project's requirements file).

Installation

- Clone this repository or download the source files.
- Install the necessary Python packages using pip or your preferred package manager:
  ```
  pip install -r requirements.txt
  ```
- Make sure `netflix.csv` (included in this repository) is in the project directory.
- Open `Netflix_Data_Mining_Final_Project.ipynb` in Jupyter and run the cells in order to execute each project phase.

Usage

- Execute the notebook to go through each project phase.
- Review the visualizations and insights generated at each phase.
- Analyze the results to draw conclusions about the Netflix dataset.

Authors

Mehrnaz Sadeghieh, Helia Ghahraman

Thank you for exploring our Netflix Data Mining Project. We hope the insights and analyses provided here contribute to your understanding of the dataset and data mining techniques.

