This project demonstrates a modular machine learning workflow that provides a structured approach to building, training, and deploying ML models.
The aim of the project is to develop an ML system that predicts customer churn, so that at-risk customers can be targeted for retention.
The provided dataset can be found at: Telco Customer Churn
In this repository: data-source/Telco-Customer-Churn.csv
About the Telco Customer Churn Dataset
Each row represents a customer; each column contains a customer attribute, as described in the dataset's Metadata column.
The raw data contains 7043 rows (customers) and 21 columns (features).
The “Churn” column is our target.
- python3.8 (or another Python 3 version)
- python3.8-venv (the venv package for that Python version)
To run this project, first create a virtual environment:
python3 -m venv venv
Then to activate the Python virtual environment, run the following command:
source venv/bin/activate
Then set PYTHONPATH to the project root so Python can resolve the project's modules:
export PYTHONPATH=~/path_to_directory/model-training-with-modular-workflow
Make sure to replace path_to_directory with your local path.
To install all packages and initialize the project, run the following command:
pip install -e .
This will run the setup.py file to install the project in editable mode and save its metadata. Make sure you have requirements.txt and README.md ready.
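For reference, a minimal setup.py for this kind of layout might look like the sketch below. This is an illustration only, not the repository's actual file; the package name and version are placeholders.

```python
# Hypothetical minimal setup.py; the real file in this repository may differ.
from setuptools import find_packages, setup

def read_requirements(path="requirements.txt"):
    """Collect dependencies from requirements.txt, skipping the editable-install line."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip() and line.strip() != "-e ."]

setup(
    name="model-training-with-modular-workflow",  # placeholder package name
    version="0.0.1",                              # placeholder version
    packages=find_packages(),
    install_requires=read_requirements(),
)
```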
To set up the folder structure, run folder_structure_setup.sh:
bash folder_structure_setup.sh
This will generate a folder structure necessary for modular workflow. See below:
├── data-source
├── src
│   ├── __init__.py
│   ├── components
│   │   ├── __init__.py
│   │   ├── data_ingestion.py
│   │   ├── data_transformation.py
│   │   ├── model_monitoring.py
│   │   └── model_trainer.py
│   ├── exception.py
│   ├── logger.py
│   ├── pipelines
│   │   ├── __init__.py
│   │   ├── prediction_pipeline.py
│   │   └── training_pipeline.py
│   └── utils.py
├── .gitignore
├── main.py
├── app.py
├── EDA.ipynb
├── README.md
├── requirements.txt
├── folder_structure_setup.sh
├── test-logging-integration.py
└── test-request.py
The machine learning training and development phase can be divided into four steps (a simplified pipeline sketch follows this list):
- Data Ingestion: Raw data is read from a data source (e.g., a database or data warehouse), preprocessed, and split into training, test, and validation sets.
- Data Transformation: This stage covers data exploration, data cleaning, and feature engineering. It takes the raw data from the ingestion stage and produces featurized data for model training.
- Model Training: This stage takes the featurized data from the transformation stage and trains models on it, selecting a model architecture and continuously training and tuning it. Trained models are the output of this stage.
- Model Evaluation: This stage compares the trained models, selects the best one, and prepares it for deployment.
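The training pipeline in src/pipelines/training_pipeline.py ties these stages together. The sketch below shows the general flow only; the class and method names (DataIngestion, initiate_data_ingestion, and so on) are illustrative assumptions and may not match the actual components.

```python
# Illustrative flow of the training pipeline; class and method names are
# assumptions and may differ from the real code in src/components.
from src.components.data_ingestion import DataIngestion
from src.components.data_transformation import DataTransformation
from src.components.model_trainer import ModelTrainer

def run_training_pipeline():
    # 1. Data ingestion: read the raw CSV and split it into train/test sets.
    train_path, test_path = DataIngestion().initiate_data_ingestion()

    # 2. Data transformation: clean the data and build feature arrays.
    train_arr, test_arr, _ = DataTransformation().initiate_data_transformation(
        train_path, test_path
    )

    # 3. Model training and evaluation: fit candidate models, keep the best one.
    best_score = ModelTrainer().initiate_model_trainer(train_arr, test_arr)
    print(f"Best model score: {best_score}")

if __name__ == "__main__":
    run_training_pipeline()
```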
DVC is used for tracking data files and ensuring version control for datasets.
dvc init
dvc add artifacts/data_ingestion/raw.csv
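Note that dvc add writes a small .dvc pointer file next to the data; commit that pointer to git instead of the raw CSV. If you need to read a DVC-tracked file from Python, dvc.api can open it, as in the minimal sketch below (it assumes the file was added with the command above and that the script runs from inside the repository).

```python
# Minimal sketch: read a DVC-tracked file from Python with dvc.api.
import dvc.api
import pandas as pd

# Assumes artifacts/data_ingestion/raw.csv was tracked via `dvc add`.
with dvc.api.open("artifacts/data_ingestion/raw.csv", mode="r") as f:
    df = pd.read_csv(f)

print(df.shape)  # the full Telco dataset is 7043 rows x 21 columns
```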
The feature store is managed using Feast, allowing storage and retrieval of features.
cd feature_repo
feast ui
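Features can also be retrieved programmatically with the Feast SDK (assuming the definitions in feature_repo have been applied with feast apply). The snippet below is a sketch only: the feature view name, feature names, and join key are placeholders, so check feature_repo for the actual definitions.

```python
# Sketch: retrieving historical features from the Feast repo in feature_repo/.
# The feature view name ("customer_features"), feature names, and the
# "customerID" join key are placeholders for this project's real definitions.
from datetime import datetime
import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path="feature_repo")

# Entity dataframe: which customers, and as of what time.
entity_df = pd.DataFrame(
    {
        "customerID": ["7590-VHVEG"],
        "event_timestamp": [datetime.utcnow()],
    }
)

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["customer_features:tenure", "customer_features:MonthlyCharges"],
).to_df()
print(training_df.head())
```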
MLflow is used to track experiments and visualize metrics.
mlflow ui
Execute the training pipeline and track the experiment metrics on MLflow.
Ensure the MLflow server is running before executing:
python3 src/pipelines/training_pipeline.py
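Inside the pipeline, experiment tracking typically looks like the sketch below; the experiment name, parameters, and metric values are illustrative, not necessarily what training_pipeline.py logs.

```python
# Sketch of MLflow experiment tracking; names and values are illustrative.
import mlflow

mlflow.set_tracking_uri("http://127.0.0.1:5000")  # default address of `mlflow ui`
mlflow.set_experiment("telco-churn")

with mlflow.start_run():
    mlflow.log_param("model", "logistic_regression")
    mlflow.log_metric("accuracy", 0.80)
    # mlflow.sklearn.log_model(model, "model")  # optionally persist the estimator
```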
A Flask application is provided for serving the model.
python3 app.py
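Roughly, app.py exposes a /predict route that wraps the prediction pipeline and returns a JSON response like the one shown further down. The sketch below is an assumption about its structure (the PredictionPipeline class name and the port are placeholders), not the actual file.

```python
# Sketch of a minimal Flask serving app; the real app.py may differ.
# PredictionPipeline and the port number are assumptions.
from flask import Flask, jsonify, request
from src.pipelines.prediction_pipeline import PredictionPipeline

app = Flask(__name__)
pipeline = PredictionPipeline()

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()                  # customer attributes as JSON
    prediction = int(pipeline.predict(payload))   # 0 = no churn, 1 = churn
    return jsonify({
        "status": "success",
        "prediction": prediction,
        "churn_category": "Yes" if prediction == 1 else "No",
    })

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)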
Test the API using the provided test file.
python3 test-request.py
Status Code: 200
Response: {'churn_category': 'No', 'prediction': 0, 'status': 'success'}
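test-request.py is essentially a small client like the sketch below; the feature fields shown are a subset of the dataset's columns, and the exact payload expected depends on the prediction pipeline.

```python
# Sketch of a test client; the payload fields and endpoint URL are assumptions.
import requests

sample_customer = {
    "gender": "Female",
    "SeniorCitizen": 0,
    "tenure": 1,
    "Contract": "Month-to-month",
    "MonthlyCharges": 29.85,
    # ... remaining columns from the Telco dataset
}

response = requests.post("http://localhost:8000/predict", json=sample_customer)
print("Status Code:", response.status_code)
print("Response:", response.json())
```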
Run the FastAPI app in dev mode:
- First create and activate a separate virtual environment, v-fast:
python3 -m venv v-fast
source v-fast/bin/activate
- Install the required packages:
pip install -r fast-requirements.txt
- Run the app in dev mode:
fastapi dev main.py
You will see the app at http://localhost:8000
The API endpoint created for ML model inference is http://localhost:8000/predict
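For reference, a FastAPI version of the inference endpoint generally looks like the sketch below; the request model fields and the pipeline name are assumptions, so consult main.py for the real implementation.

```python
# Sketch of a FastAPI inference endpoint; fields and pipeline name are assumptions.
from fastapi import FastAPI
from pydantic import BaseModel
from src.pipelines.prediction_pipeline import PredictionPipeline

app = FastAPI()
pipeline = PredictionPipeline()

class Customer(BaseModel):
    gender: str
    SeniorCitizen: int
    tenure: int
    Contract: str
    MonthlyCharges: float
    # ... remaining Telco columns

@app.post("/predict")
def predict(customer: Customer):
    prediction = int(pipeline.predict(customer.model_dump()))  # Pydantic v2 assumed
    return {
        "status": "success",
        "prediction": prediction,
        "churn_category": "Yes" if prediction == 1 else "No",
    }
```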
Use Docker to build an image to deploy:
docker build -t flask-ml:0 .
Run the container in detached mode to deploy:
docker run -d -p 8000:8000 flask-ml:0