
End-to-End ML Credit Default Risk Prediction Model

logo banner

Introduction

This project aims to develop an End-to-End machine learning model to predict loan default risk for a financial services platform. The model is trained on a dataset of borrower attributes, including financial, demographic, and credit-related features. The goal is to enable data-driven decisions in the loan underwriting process.

Tech used

Python, Pandas, NumPy, Matplotlib, Plotly, Scikit-Learn, MLflow, Optuna, FastAPI, Streamlit

Key Features

  • Predictive modeling using machine learning algorithms
  • Feature engineering and selection with sklearn pipelines
  • Model evaluation and hyperparameter tuning with optuna
  • Model registry and tracking with MLFlow
  • Testing with pytest
  • Deployment of API with FastAPI
  • Frontend with streamlit App
  • Containerization with Docker
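
The feature engineering and selection step above is built on sklearn pipelines. A minimal sketch of that pattern is shown below; the column names and transformer choices are illustrative assumptions, not the project's actual FE_pipeline.py:

```python
# Sketch of an sklearn feature-engineering pipeline (columns are assumed,
# not the project's real schema): scale numeric columns, one-hot encode
# categoricals, unknown categories at predict time are ignored.
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

numeric = ["loan_amnt", "annual_inc"]      # assumed numeric features
categorical = ["home_ownership"]           # assumed categorical feature

fe_pipeline = Pipeline([
    ("prep", ColumnTransformer([
        ("num", StandardScaler(), numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])),
])
```

Packaging the preprocessing this way lets the same fitted pipeline be pickled and reused at prediction time, which is what the `fe_pipeline_fitted_final.pkl` artifact in `trained_models/` is for.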

Installation / Environment Setup

Prerequisites

  • Python (>=3.7)
  • Poetry (>=1.8) see how to install here
  • Dependencies listed in pyproject.toml or requirements.txt (do not use conda.yaml for project dependencies; it is only for MLflow)

Installation Instructions

  1. Clone the repository:

    gh repo clone jyothisable/LoanTap-Credit-Default-Risk-Model
  2. Set up virtual environment and dependencies:

    In a terminal, navigate to the project directory and run the following commands:

    poetry install
    poetry shell

Project Structure

┣ πŸ“‚ data/ - Contains raw and processed data files for the project
┃ ┣ πŸ“„ LCDataDictionary.xlsx - Metadata dictionary for the loan dataset
┃ ┣ 🌍 US_zip_to_cord.csv - CSV mapping U.S. ZIP codes to geographical coordinates
┃ ┣ πŸ“Š loan.csv - Main dataset containing detailed loan information (not available in GitHub because of size limitation, downloaded from [Kaggle](https://www.kaggle.com/datasets/ranadeep/credit-risk-dataset))
┃ ┣ πŸ“‰ loan_reduced.csv - Filtered and reduced version of the main dataset
┃ ┣ πŸ§ͺ test_data.csv - Dataset used for evaluating model performance
┃ β”— πŸ“š train_data.csv - Dataset used for training machine learning models

┣ 🧩 Prediction_Model/ - Contains source code for model development, training, and predictions
┃ ┣ πŸ—ƒοΈ trained_models/ - Directory for storing trained models and pipelines
┃ ┃ ┣ 🧠 XBG_model_final.pkl - Final trained XGBoost model
┃ ┃ ┣ πŸ“Š fe_eval_model.pkl - Feature engineering evaluation model
┃ ┃ ┣ βš™οΈ fe_eval_tuned_model.pkl - Tuned model for evaluating feature engineering
┃ ┃ ┣ πŸ”„ fe_pipeline_fitted_final.pkl - Fitted feature engineering pipeline
┃ ┃ β”— πŸ”™ target_pipeline_fitted.pkl - Fitted target pipeline for reverse transformations after predictions
┃ ┣ πŸ“¦ __init__.py - Initialization file for package setup
┃ ┣ 🧩 FE_pipeline.py - Script for feature engineering pipeline configurations
┃ ┣ βš™οΈ config.py - Configuration file defining project parameters and settings
┃ ┣ 🧹 data_handling.py - Script for loading, cleaning, and managing datasets and pipelines
┃ ┣ πŸ§ͺ evaluation.py - Script for evaluating models and feature engineering pipelines
┃ ┣ πŸ“Š get_features.py - Utility for extracting features from the data
┃ ┣ πŸ“ˆ plotting.py - Script for generating plots and visualizations
┃ ┣ πŸ€– predict.py - Script for running predictions using trained models
┃ β”— πŸ‹οΈ train.py - Main script for training machine learning models

┣ πŸ“’ notebooks/ - Directory for Jupyter notebooks, images, and analysis reports
┃ ┣ πŸ–ΌοΈ Designer.jpeg - Image file for branding or presentation purposes
┃ ┣ πŸ“„ EDA_report.html - HTML report summarizing exploratory data analysis
┃ ┣ πŸ“Š LC.png - Additional image resource related to the project
┃ ┣ 🏷️ loantap_logo.png - Logo image for the project
┃ β”— πŸ§ͺ model_prototyping.ipynb - Jupyter notebook for exploratory data analysis and model prototyping

┣ πŸ§ͺ tests/ - Contains unit tests and integration tests to ensure code robustness
┃ ┣ πŸ“¦ __init__.py - Initialization file for the tests package
┃ ┣ πŸ” data_tests.py - Tests for data handling and processing functions
┃ β”— πŸ§ͺ test_prediction.py - Tests for the prediction module

┣ 🚫 .dockerignore - Specifies files and directories to ignore when building Docker images
┣ 🚫 .gitignore - File to exclude specific files and directories from Git version control
┣ βš™οΈ MLProject - Configuration for running MLflow projects
┣ πŸš€ fastapi_app.py - Script to run a FastAPI web application for serving models or APIs
┣ 🎨 streamlit_app.py - Script to run a Streamlit application for visualizing data and making predictions
┣ πŸ‹ Dockerfile - Instructions to build a Docker container for the project
┣ πŸ§ͺ conda.yaml - Environment configuration file for setting up dependencies of MLFlow (not to be used for project setup)
┣ πŸ“œ requirements.txt - List of required Python packages for the project (for use with pip)
┣ βš™οΈ pyproject.toml - Configuration file defining project dependencies and settings using Poetry
┣ πŸ”’ poetry.lock - Dependency lock file for consistent environment setup using Poetry
┣ πŸ“œ LICENSE.md - Legal license information for the project
┣ πŸ“˜ README.md - Documentation file with an overview, setup, and usage instructions

Usage

Important

Make sure to activate the Poetry environment by running poetry shell in the root directory before running any of the commands below, or prefix each command with poetry run.

Training

To train the model, run the following command in the root directory:

python Prediction_Model/train.py # fits the feature engineering pipeline, then trains the model

Prediction

To make predictions on test data, run the following command in the root directory:

python Prediction_Model/predict.py

Testing

To test the model, run the following command in the root directory:

pytest tests/test_prediction.py
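
A test in this style typically checks the shape and range of the prediction output. The sketch below shows the general shape such a test might take; `fake_predict` is a stand-in stub, not the project's real prediction function:

```python
# Hypothetical shape of a test like those in tests/test_prediction.py.
# fake_predict stubs the real prediction function for illustration.
def fake_predict(rows):
    """Stub predictor: returns one binary label per input row."""
    return [0 for _ in rows]

def test_prediction_output_is_binary():
    preds = fake_predict([{"loan_amnt": 10000}, {"loan_amnt": 5000}])
    assert len(preds) == 2                 # one prediction per input row
    assert all(p in (0, 1) for p in preds) # default / no-default labels only
```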

Running Web Apps

FastAPI

python fastapi_app.py 

POST to localhost:8000/predict with Postman, or open localhost:8000/docs in a browser for interactive API documentation and testing
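
The same request can be made from Python with the standard library. The payload field names below are illustrative assumptions, not the API's actual request schema:

```python
# Hypothetical call to the prediction endpoint; the payload fields are
# illustrative only. Requires the FastAPI server from fastapi_app.py to
# be running on localhost:8000.
import json
from urllib import request

payload = {"loan_amnt": 10000, "term": "36 months", "annual_inc": 55000}
req = request.Request(
    "http://localhost:8000/predict",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# resp = request.urlopen(req)          # uncomment with the server running
# print(json.loads(resp.read()))
```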

Streamlit App

streamlit run streamlit_app.py # local

Usage with docker

1. Pulling the Docker Image

To pull the Docker image from Docker Hub, run the following command:

# Pull the docker image
docker pull jyothisable/credit_risk_streamlit_app

2. Running the Docker Container

To run the Docker container, use the following command:

# Run the docker container
docker run -p 8501:8501 jyothisable/credit_risk_streamlit_app # then go to http://localhost:8501 in a browser

Dataset

Refer to the Kaggle dataset page or data/LCDataDictionary.xlsx

Model

The model used for this project is an XGBoost classifier. After tuning with Optuna, the following hyperparameters were used for training:

{
    'max_depth': 9,
    'learning_rate': 0.094,
    'n_estimators': 507,
    'gamma': 0.0062,
    'subsample': 0.962,
    'colsample_bytree': 0.795,
    'lambda': 0.389,
    'alpha': 0.0233,
    'scale_pos_weight': 1.99,
    'min_child_weight': 2,
    'grow_policy': 'lossguide'
}
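
These values can be passed directly to the classifier constructor. A minimal sketch, assuming the xgboost package is installed (note that in xgboost's sklearn wrapper, `lambda` and `alpha` are spelled `reg_lambda` and `reg_alpha`):

```python
# The tuned hyperparameters above as an sklearn-API parameter dict;
# 'lambda'/'alpha' are renamed reg_lambda/reg_alpha for the wrapper.
params = {
    "max_depth": 9,
    "learning_rate": 0.094,
    "n_estimators": 507,
    "gamma": 0.0062,
    "subsample": 0.962,
    "colsample_bytree": 0.795,
    "reg_lambda": 0.389,
    "reg_alpha": 0.0233,
    "scale_pos_weight": 1.99,
    "min_child_weight": 2,
    "grow_policy": "lossguide",
}

# from xgboost import XGBClassifier
# model = XGBClassifier(**params)      # then model.fit(X_train, y_train)
```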

Results

The model achieved an F1 score of 79.12% and a recall of 85.84% on the test dataset.
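
For reference, both metrics derive from confusion-matrix counts; the counts in the sketch below are illustrative, not the model's actual confusion matrix:

```python
# How F1 and recall relate to true positives (tp), false positives (fp),
# and false negatives (fn). Counts here are toy values for illustration.
def f1_and_recall(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                          # caught defaults
    f1 = 2 * precision * recall / (precision + recall)
    return f1, recall
```

A high recall matters here because a missed default (false negative) is typically costlier to a lender than a rejected good loan.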

License

This project is licensed under the MIT License.
