GitHub - honatanpalma/data-cleaning-and-transformation: This repository provides a basic introduction to data cleaning and transformation techniques using Python. It includes a practical example with a student performance dataset, showcasing how to prepare data for analysis or machine learning.

Data Cleaning and Transformation

This project demonstrates the application of common data cleaning and transformation techniques on a student performance dataset. The goal is to prepare the data for further analysis or machine learning modeling.

Dataset Overview

The dataset was obtained from Kaggle: https://www.kaggle.com/datasets/haseebindata/student-performance-predictions and contains information about students, including their attendance rate, study hours, previous grades, extracurricular activities, parental support, and final grades.

Steps Performed

Handling Missing Values
- Although the original dataset contained no missing values, the script includes a step to handle them using df.dropna(). This ensures robustness in case future datasets have missing data.
Removing Duplicates
- Duplicate rows, if any, are removed based on the unique identifier StudentID. This prevents any bias or inaccuracies in the analysis due to redundant data.
Normalization
- The AttendanceRate, StudyHoursPerWeek, and PreviousGrade columns are normalized using MinMaxScaler from scikit-learn. This scales the features to a range of 0-1, making them comparable and potentially improving the performance of machine learning algorithms.
Encoding Categorical Variables
- Categorical variables Gender and ParentalSupport are transformed into numerical representations using one-hot encoding. This enables their inclusion in mathematical computations required for modeling.
Feature Engineering
- A new feature, GradeImprovement, is calculated by subtracting PreviousGrade from FinalGrade. This derived feature provides direct insight into the academic progress of each student.

Code and Implementation

The Python script utilizes the Pandas library for data manipulation and scikit-learn for preprocessing.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
README.md		README.md
preprocesing_code.py		preprocesing_code.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Data Cleaning and Transformation

Dataset Overview

Steps Performed

Code and Implementation

About

Releases 1

Languages

honatanpalma/data-cleaning-and-transformation

Folders and files

Latest commit

History

Repository files navigation

Data Cleaning and Transformation

Dataset Overview

Steps Performed

Code and Implementation

About

Topics

Resources

Stars

Watchers

Forks

Releases 1

Languages