This project focuses on data mining and analysis using the Netflix dataset. The project comprises several key phases, including data preprocessing, director comparison, frequent pattern extraction, description vectorization, clustering, and classification. The goal is to extract valuable insights and knowledge from the dataset.
- Load the Netflix dataset.
- Clean the data using common data preprocessing techniques.
- Compare directors based on various attributes.
- Visualize the comparison results to gain insights into director performance.
- Identify frequent patterns in the dataset, focusing on relationships between cast, director, and genres.
- Utilize BERT (Bidirectional Encoder Representations from Transformers) to vectorize the description column.
- Prepare the data for clustering.
- Perform clustering on the vectorized descriptions.
- Visualize the clustering results in both 2D and 3D.
- Label the descriptions and split the data into training and testing sets.
- Perform classification on the description column, addressing the challenge of multi-genre categorization:
  - Scenario 1: Consider the first genre as the primary one for classification.
  - Scenario 2: Treat the combination of all genres as a single genre (due to practical constraints).
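
The two classification scenarios above differ only in how a label is derived from a title's genre list. The following is a minimal sketch, assuming genres are stored as a single comma-separated string per title. The function names are illustrative and are not taken from the project's notebooks. Sorting the genres in Scenario 2 is an assumption made here so that the same combination always maps to the same class regardless of order.

```python
def first_genre_label(genres: str) -> str:
    """Scenario 1: use only the first listed genre as the class label."""
    return genres.split(",")[0].strip()


def combined_genre_label(genres: str) -> str:
    """Scenario 2: treat the whole genre combination as one class label.

    Genres are sorted (an assumption of this sketch) so that
    "A, B" and "B, A" produce the same label.
    """
    parts = sorted(g.strip() for g in genres.split(",") if g.strip())
    return " | ".join(parts)


if __name__ == "__main__":
    # Hypothetical genre string, for illustration only.
    example = "International Movies, Dramas"
    print(first_genre_label(example))
    print(combined_genre_label(example))
```

Scenario 1 keeps the number of classes small but discards secondary genres. Scenario 2 preserves all genre information but can produce many rare classes, which may make the train/test split and evaluation harder.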
To replicate the project, download the Netflix dataset from the files provided in this repository.
Before running this project, ensure you have the following:
- Python installed on your system.
- Required Python libraries and dependencies installed (specified in the project's requirements file).
- Clone this repository or download the source files.
- Install the necessary Python packages using `pip` or your preferred package manager: `pip install -r requirements.txt`
- Download the Netflix dataset and place it in the project directory.
- Run the project, following the Jupyter notebooks or Python scripts in the specified order to execute each project phase.
- Execute the notebook to go through each project phase.
- Review the visualizations and insights generated at each phase.
- Analyze the results and gain valuable knowledge from the Netflix dataset.
Mehrnaz Sadeghieh, Helia Ghahraman
Thank you for exploring our Netflix Data Mining Project. We hope the insights and analyses provided here contribute to your understanding of the dataset and data mining techniques.