This repository contains a project for predicting student scores based on various input features using machine learning techniques. It provides an end-to-end solution, from data preprocessing to model evaluation, for score prediction.
- Project Overview
- Features
- Technologies Used
- Setup and Installation
- Usage
- Project Structure
- Key Scripts
- Contributing
- License
The goal of this project is to develop a machine learning model that predicts a student's score from input features such as gender, other exam scores (for example, reading score), or other relevant factors. The project demonstrates data preprocessing, feature engineering, model training, and evaluation.
- Data preprocessing, including handling missing values and feature scaling.
- Exploratory data analysis (EDA) with visualizations.
- Training various machine learning models and evaluating their performance.
- Hyperparameter tuning for optimized predictions.
- Easy-to-follow project structure and documentation.
- Web interface for inputting student data and predicting scores.
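The first feature, handling missing values and feature scaling, comes down to learning a few statistics per numeric column on the training data: a fill value, plus a mean and standard deviation. The following is a minimal pure-Python sketch of that idea, assuming median imputation followed by standard scaling. The class name `NumericPreprocessor` is hypothetical and is not taken from the project's `data_transformation.py`, which presumably relies on scikit-learn.

```python
from dataclasses import dataclass
from statistics import mean, median, pstdev
from typing import List, Optional


@dataclass
class NumericPreprocessor:
    """Hypothetical median imputation + standard scaling for one numeric column."""

    fill_value: float = 0.0
    mu: float = 0.0
    sigma: float = 1.0

    def fit(self, column: List[Optional[float]]) -> "NumericPreprocessor":
        # Learn the fill value from the observed (non-missing) entries only.
        observed = [v for v in column if v is not None]
        self.fill_value = median(observed)
        # Learn the scaling statistics on the imputed column.
        imputed = self._impute(column)
        self.mu = mean(imputed)
        # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
        self.sigma = pstdev(imputed) or 1.0
        return self

    def transform(self, column: List[Optional[float]]) -> List[float]:
        # Apply the statistics learned in fit(), never the new data's own.
        return [(v - self.mu) / self.sigma for v in self._impute(column)]

    def _impute(self, column: List[Optional[float]]) -> List[float]:
        return [self.fill_value if v is None else v for v in column]


if __name__ == "__main__":
    train_reading = [72.0, None, 90.0, 47.0, None, 76.0]
    prep = NumericPreprocessor().fit(train_reading)
    print(prep.fill_value)  # 74.0
    print(prep.transform([None, 95.0]))
```

Fitting on the training split and reusing the same fitted object at prediction time is what makes a saved preprocessor artifact useful: new inputs are filled and scaled with the training statistics, not their own. In scikit-learn the usual equivalent is a `Pipeline` of `SimpleImputer(strategy="median")` and `StandardScaler`.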
- Python: Core programming language for the project.
- Pandas: For data manipulation and analysis.
- NumPy: For numerical computations.
- Matplotlib and Seaborn: For data visualization.
- Scikit-learn: For machine learning model implementation and evaluation.
- Flask: For building the web application.
- CatBoost and XGBoost: For advanced machine learning models.
- Python 3.8 or higher
- Conda or pip for managing Python packages
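Since the project targets Python 3.8 or higher, a quick interpreter check can save a confusing failure later during installation. A minimal sketch; the function name `check_python_version` is illustrative and not part of the repository.

```python
import sys

MIN_VERSION = (3, 8)


def check_python_version(version_info=sys.version_info, minimum=MIN_VERSION) -> bool:
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    found = sys.version.split()[0]
    if not check_python_version():
        sys.exit(f"Python {MIN_VERSION[0]}.{MIN_VERSION[1]}+ is required, found {found}")
    print("Python version OK:", found)
```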
- Clone the repository:
  git clone https://github.com/rahulpoojith/Student-Score-prediction.git
  cd Student-Score-prediction
- Create and activate a virtual environment using Conda:
  conda create -n student_score_env python=3.8 -y
  conda activate student_score_env
- Install the required packages:
  pip install -r requirements.txt
- Run the application:
  python app.py
- Prepare your dataset by placing it in the data/ folder. Ensure it follows the format expected by the scripts.
- Train the preprocessor and model if they are not already present in the artifacts/ directory.
- Run app.py to start the Flask web application:
  python app.py
- Open your browser and navigate to:
  http://127.0.0.1:5001/
- Enter the student details (e.g., gender and reading score) to get the predicted math score.
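Besides the browser form, the running app can be exercised from a script. The sketch below builds the same kind of `application/x-www-form-urlencoded` POST request that a browser sends for an HTML form, using only the standard library. The route `/predictdata` and the field names `gender` and `reading_score` are assumptions; check the form's `action` and the inputs' `name` attributes in `templates/home.html` for the real values.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://127.0.0.1:5001"
# Hypothetical route: replace with the action attribute of the form in templates/home.html.
PREDICT_PATH = "/predictdata"


def build_prediction_request(fields: dict) -> Request:
    """Encode form fields the way a browser submits an HTML form via POST."""
    body = urlencode(fields).encode("utf-8")
    return Request(
        BASE_URL + PREDICT_PATH,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical field names; the real form may expect more fields.
    req = build_prediction_request({"gender": "female", "reading_score": 72})
    # Requires the Flask app to be running locally (python app.py).
    with urlopen(req, timeout=10) as resp:
        print(resp.status)
        # Presumably the rendered HTML page that contains the prediction.
        print(resp.read().decode("utf-8")[:500])
```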
Student-Score-prediction/
├── app.py # Main Flask application
├── application.py # Additional app functionality (if any)
├── data/
│ └── student_scores.csv # Placeholder for the dataset
├── notebooks/
│ ├── EDA.ipynb # Jupyter notebook for exploratory data analysis
│ ├── model_training.ipynb # Jupyter notebook for training the model
│ ├── catboost_info/ # Directory for CatBoost-specific logs and files
│ │ ├── learn/ # Contains training logs and events
│ │ │ ├── events.out.tfevents # TensorFlow event logs
│ │ ├── tmp/ # Temporary files generated during training
│ │ ├── catboost_training.json # CatBoost training configuration
│ │ ├── learn_error.tsv # Training error logs
│ │ ├── time_left.tsv # Time left for training
│ └── Data/ # Additional data files for notebooks
├── src/
│ ├── components/ # Modules for data ingestion, transformation, and model training
│ │ ├── data_ingestion.py # Data ingestion module
│ │ ├── data_transformation.py # Data transformation module
│ │ ├── model_trainer.py # Model training module
│ ├── pipelines/ # Prediction and training pipelines
│ │ ├── predict_pipeline.py # Prediction pipeline
│ │ ├── train_pipeline.py # Training pipeline
│ ├── exception.py # Custom exception handling module
│ ├── logger.py # Logging module for tracking events
│ ├── utils.py # Utility functions for common tasks
├── artifacts/
│ ├── preprocessors.pkl # Preprocessing pipeline
│ ├── model.pkl # Trained machine learning model
├── static/ # CSS, JS, or other assets (optional)
├── templates/
│ ├── index.html # Landing page
│ ├── home.html # Prediction form
├── logs/ # Directory for storing log files
├── MLprojectAWS.egg-info/ # Metadata and dependency files
│ ├── dependency_links.txt # Dependency links
│ ├── PKG-INFO # Package metadata
│ ├── requires.txt # Required dependencies
│ ├── SOURCES.txt # Source files list
│ ├── top_level.txt # Top-level package names
├── requirements.txt # List of dependencies
├── setup.py # Python package setup file
├── config.py # Configuration file for parameters
└── README.md # Project documentation
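The tree lists `src/exception.py` as a custom exception handling module without showing its contents. A common pattern for such a module is an exception that records where the original error was raised, so that log entries point straight at the failing line. The sketch below assumes that pattern; the class name `CustomException` is hypothetical.

```python
class CustomException(Exception):
    """Hypothetical wrapper that records the file and line where an error originated."""

    def __init__(self, error: BaseException):
        tb = error.__traceback__
        # Follow the traceback to its innermost frame, where the error was raised.
        while tb is not None and tb.tb_next is not None:
            tb = tb.tb_next
        if tb is not None:
            self.file_name = tb.tb_frame.f_code.co_filename
            self.line_number = tb.tb_lineno
        else:  # the error object was created but never raised
            self.file_name, self.line_number = "<unknown>", -1
        super().__init__(
            f"Error in [{self.file_name}] at line [{self.line_number}]: {error}"
        )


if __name__ == "__main__":
    try:
        try:
            1 / 0
        except ZeroDivisionError as exc:
            # Chain with "from" so the original traceback is preserved as well.
            raise CustomException(exc) from exc
    except CustomException as wrapped:
        print(wrapped)
```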
app.py is the core application script that:
- Sets up Flask routes for the home page and predictions.
- Loads the trained model and preprocessing pipeline.
- Processes input data and returns predictions.
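The loading step described above can be sketched with the standard `pickle` module, assuming the artifacts are plain pickles at the paths shown in the tree (`artifacts/preprocessors.pkl` and `artifacts/model.pkl`). The helper names `save_object`, `load_object`, and `predict_rows` are hypothetical, and the preprocessor and model are assumed to expose scikit-learn style `transform` and `predict` methods.

```python
import os
import pickle
import tempfile
from typing import Any, Sequence

MODEL_PATH = os.path.join("artifacts", "model.pkl")
PREPROCESSOR_PATH = os.path.join("artifacts", "preprocessors.pkl")


def save_object(path: str, obj: Any) -> None:
    """Pickle an object to disk, creating the parent directory if needed."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_object(path: str) -> Any:
    """Load a pickled object. Only unpickle files from a trusted source."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict_rows(rows: Sequence[Any], preprocessor_path: str = PREPROCESSOR_PATH,
                 model_path: str = MODEL_PATH) -> Any:
    """Load both artifacts, transform the raw input rows, and return predictions."""
    preprocessor = load_object(preprocessor_path)
    model = load_object(model_path)
    return model.predict(preprocessor.transform(rows))


if __name__ == "__main__":
    # Round-trip demo in a throwaway directory, so the real artifacts/ folder is untouched.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.pkl")
        save_object(path, {"coef": [0.9, 2.0]})
        print(load_object(path))
```

In a Flask app it is usually cheaper to load both objects once at startup than inside every request handler; `predict_rows` reloads them on each call only to keep the sketch short.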
setup.py: A Python packaging file for installing the project as a package.
requirements.txt: Specifies the dependencies required to run the project.
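`setup.py` and `requirements.txt` are often tied together by reading the requirements file into `install_requires`. A minimal sketch, assuming `requirements.txt` may contain a `-e .` line (a common convention so that `pip install -r requirements.txt` also installs the local package; not confirmed for this repository). The helper names are hypothetical; the `MLprojectAWS.egg-info/` folder in the tree suggests the distribution name is `MLprojectAWS`.

```python
from typing import Iterable, List

EDITABLE_INSTALL = "-e ."


def parse_requirements(lines: Iterable[str]) -> List[str]:
    """Keep real requirement specifiers; drop blanks, comments, and '-e .'."""
    requirements = []
    for line in lines:
        req = line.strip()
        # pip understands "-e .", but setuptools' install_requires does not.
        if not req or req.startswith("#") or req == EDITABLE_INSTALL:
            continue
        requirements.append(req)
    return requirements


def get_requirements(file_path: str) -> List[str]:
    """Read a requirements file from disk and parse it."""
    with open(file_path, encoding="utf-8") as f:
        return parse_requirements(f)


# In setup.py this would typically be wired up as follows (not executed here):
#   from setuptools import find_packages, setup
#   setup(
#       name="MLprojectAWS",
#       packages=find_packages(),
#       install_requires=get_requirements("requirements.txt"),
#   )
```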
src/components/: Modules for handling key processes:
- data_ingestion.py: Handles loading and preprocessing raw data.
- data_transformation.py: Applies scaling and encoding to data.
- model_trainer.py: Trains and saves machine learning models.
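To make the flow from ingestion to training concrete, here is a pure-Python sketch of two steps these components typically cover: holding out a test split, then fitting candidate models, scoring each by R² on the test split, and keeping the best one. It stands in for the scikit-learn, CatBoost, and XGBoost models the project actually uses; the single-feature setup (for example, reading score predicting math score) and every function name are illustrative assumptions.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[float, float]  # (feature, target), e.g. (reading score, math score)
Predictor = Callable[[float], float]


def train_test_split(rows: Sequence[Row], test_size: float = 0.2,
                     seed: int = 42) -> Tuple[List[Row], List[Row]]:
    """Shuffle with a fixed seed and hold out a fraction of rows for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    if ss_tot == 0:  # constant targets: report 1.0 only for a perfect fit
        return 1.0 if ss_res == 0 else 0.0
    return 1.0 - ss_res / ss_tot


def fit_mean(train: Sequence[Row]) -> Predictor:
    """Baseline: always predict the training mean of the target."""
    y_mean = mean(y for _, y in train)
    return lambda x: y_mean


def fit_linear(train: Sequence[Row]) -> Predictor:
    """Ordinary least squares with one feature (assumes the feature is not constant)."""
    x_bar = mean(x for x, _ in train)
    y_bar = mean(y for _, y in train)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in train)
             / sum((x - x_bar) ** 2 for x, _ in train))
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def select_best_model(train: Sequence[Row], test: Sequence[Row],
                      candidates: Dict[str, Callable[[Sequence[Row]], Predictor]]
                      ) -> Tuple[str, float]:
    """Fit every candidate on train, score on test, return (best name, best R²)."""
    y_test = [y for _, y in test]
    scores = {}
    for name, fit in candidates.items():
        predict = fit(train)
        scores[name] = r2_score(y_test, [predict(x) for x, _ in test])
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Synthetic data with deterministic noise, for demonstration only.
    rows = [(x, 0.9 * x + 5 + random.Random(x).uniform(-3, 3)) for x in range(40, 100)]
    train, test = train_test_split(rows)
    print(select_best_model(train, test, {"mean": fit_mean, "linear": fit_linear}))
```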
src/pipelines/: Predefined pipelines:
- predict_pipeline.py: Handles data input and the prediction workflow.
- train_pipeline.py: Trains the model and saves the artifacts.
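The prediction workflow can be sketched as follows: wrap the form fields in a typed object, convert it to a row, run it through the fitted preprocessor, and ask the model for a prediction. Everything here is illustrative: `StudentInput` covers only the two fields mentioned in the usage steps (gender and reading score), and `ToyPreprocessor` and `ToyModel` are stand-ins with made-up behavior for the pickled artifacts, not the project's actual objects.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List, Protocol


class Preprocessor(Protocol):
    def transform(self, rows: List[Dict[str, object]]) -> List[List[float]]: ...


class Model(Protocol):
    def predict(self, features: List[List[float]]) -> List[float]: ...


@dataclass
class StudentInput:
    """Hypothetical container for the web form fields."""
    gender: str
    reading_score: float

    def to_row(self) -> Dict[str, object]:
        return asdict(self)


class PredictPipeline:
    """Runs one student's details through the preprocessor and then the model."""

    def __init__(self, preprocessor: Preprocessor, model: Model):
        self.preprocessor = preprocessor
        self.model = model

    def predict(self, student: StudentInput) -> float:
        features = self.preprocessor.transform([student.to_row()])
        return self.model.predict(features)[0]


class ToyPreprocessor:
    """Stand-in: a binary gender indicator plus the raw reading score."""

    def transform(self, rows: List[Dict[str, object]]) -> List[List[float]]:
        return [[1.0 if r["gender"] == "female" else 0.0, float(r["reading_score"])]
                for r in rows]


class ToyModel:
    """Stand-in linear model with made-up coefficients."""

    def predict(self, features: List[List[float]]) -> List[float]:
        return [2.0 * f[0] + 0.5 * f[1] for f in features]


if __name__ == "__main__":
    pipeline = PredictPipeline(ToyPreprocessor(), ToyModel())
    print(pipeline.predict(StudentInput(gender="female", reading_score=80.0)))  # 42.0
```

Injecting the preprocessor and model through the constructor keeps the pipeline testable without the real artifacts on disk.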
logs/: Stores log files generated during execution, which help with debugging and monitoring.
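The tree pairs `src/logger.py` with the `logs/` directory. A minimal sketch using the standard `logging` module, assuming one timestamped log file per run; the file-name scheme, format string, and function names are assumptions rather than the project's actual settings.

```python
import logging
import os
from datetime import datetime
from typing import Optional

LOG_DIR = "logs"
LOG_FORMAT = "[%(asctime)s] %(name)s:%(lineno)d - %(levelname)s - %(message)s"


def log_file_path(log_dir: str = LOG_DIR, now: Optional[datetime] = None) -> str:
    """Build a per-run file name such as logs/01_31_2024_14_05_09.log."""
    now = now or datetime.now()
    return os.path.join(log_dir, f"{now:%m_%d_%Y_%H_%M_%S}.log")


def get_logger(name: str, log_dir: str = LOG_DIR) -> logging.Logger:
    """Return a logger that appends INFO-and-above records to a timestamped file."""
    os.makedirs(log_dir, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Attach the file handler only once, even if get_logger() is called repeatedly.
    if not logger.handlers:
        handler = logging.FileHandler(log_file_path(log_dir), encoding="utf-8")
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = get_logger("student_score")
    log.info("Logging has started")
```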
notebooks/catboost_info/: Contains logs and training metadata generated by CatBoost:
- learn/: Stores TensorBoard-compatible event logs from CatBoost training.
- tmp/: Temporary files used during the training process.
- catboost_training.json: Training metadata and per-iteration metrics recorded by CatBoost.
- learn_error.tsv: Per-iteration values of the loss/metric on the training data.
- time_left.tsv: Elapsed and estimated remaining training time per iteration.
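To inspect how training progressed, the `.tsv` files in `catboost_info/` can be read with the standard `csv` module. This sketch assumes a header row whose first column is `iter`, followed by one column per metric (for example `iter` and `RMSE` in `learn_error.tsv`); verify the header in your own files, since the metric columns depend on the training parameters. The function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List


def parse_metric_tsv(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Turn a tab-separated log with a header row into {column: [values, ...]}."""
    reader = csv.DictReader(lines, delimiter="\t")
    columns: Dict[str, List[float]] = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for name in columns:
            columns[name].append(float(row[name]))
    return columns


def read_metric_file(path: str) -> Dict[str, List[float]]:
    """Open a CatBoost .tsv log (e.g. catboost_info/learn_error.tsv) and parse it."""
    with open(path, newline="", encoding="utf-8") as f:
        return parse_metric_tsv(f)


def loss_improvement(columns: Dict[str, List[float]], metric: str) -> float:
    """How much the metric dropped between the first and the last iteration."""
    values = columns[metric]
    return values[0] - values[-1]


if __name__ == "__main__":
    cols = read_metric_file("notebooks/catboost_info/learn_error.tsv")
    metric = [name for name in cols if name != "iter"][0]
    print(metric, "improved by", loss_improvement(cols, metric))
```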
notebooks/Data/: Holds additional data files used in the Jupyter notebooks.
MLprojectAWS.egg-info/: Metadata and dependency information generated by setuptools when the project is installed as a package:
- dependency_links.txt: Additional dependency sources.
- PKG-INFO: Package metadata.
- requires.txt: Lists dependencies.
- SOURCES.txt: List of source files.
- top_level.txt: Top-level package names.
Contributions are welcome! To contribute:
- Fork the repository.
- Create a new branch for your feature or bugfix:
git checkout -b feature-name
- Commit your changes:
git commit -m "Description of changes"
- Push to your branch:
git push origin feature-name
- Open a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
AWS deployment