This repository provides an example of dataset preprocessing, model training and evaluation, model tuning, and finally model serving (REST API) in a containerized environment, using the MLflow Tracking, Projects, and Models modules.
This project contains an MLflow project that trains a GBRT (Gradient Boosted Regression Tree) model on UC Irvine's Bike Sharing Dataset Data Set and uses a Docker image to capture the dependencies needed to run training and inference code.
Talk: @PyDataRiyadh:
- https://twitter.com/PyDataRiyadh/status/1291043529146466304
- https://twitter.com/PyDataRiyadh/status/1314841078999154689
Talk slides: MLflow-presentation.pdf
Notebook: MLflow-example-notebook.ipynb
Project files:
- MLproject: specifies the Docker container environment to run the project and defines command and parameters in entry_points
- Dockerfile: used to build the image referenced by the MLproject
- requirements.txt: defines the Python dependencies needed to build the training and inference Docker image
- mlflow_project_driver.py: creates an MLflow experiment for model training and tuning and launches MLflow runs in parallel in Docker containers
- mlflow_model_driver.py: finds the best training run and starts a REST API model server based on MLflow Models in Docker containers
- train.py: trains a scikit-learn model and uses MLflow Tracking APIs to log the model and its metadata (e.g., hyperparameters and metrics)
- data/hour.csv: Bike Sharing Dataset Data Set
Prerequisites:
- Python 3
- Install Docker per instructions at https://docs.docker.com/install/overview/
- Install MLflow:
pip install mlflow
- Clone this repo:
git clone https://github.com/alfozan/mlflow-example
- Build the image for the project's Docker container environment:
docker build -t mlflow_example -f Dockerfile .
- Start training and tracking:
python3 mlflow_project_driver.py
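For orientation, here is a minimal sketch of how a driver script could launch several MLflow project runs in parallel. It is not the code in mlflow_project_driver.py. The experiment name `bike-sharing-gbrt`, the entry point name `main`, and the parameter names `learning_rate` and `max_depth` are assumptions chosen for illustration; the real names are whatever the MLproject file declares.

```python
import itertools


def build_param_grid(learning_rates, max_depths):
    """Return one parameter dict per hyperparameter combination.

    The parameter names are hypothetical; they must match the
    parameters declared in the MLproject entry point.
    """
    return [
        {"learning_rate": lr, "max_depth": depth}
        for lr, depth in itertools.product(learning_rates, max_depths)
    ]


def launch_runs(param_grid, experiment_name="bike-sharing-gbrt"):
    """Submit one non-blocking project run per parameter set, then wait."""
    # Imported here so the grid helper above works without MLflow installed.
    import mlflow

    # Creates the experiment if it does not exist yet (name is an assumption).
    mlflow.set_experiment(experiment_name)

    submitted = [
        mlflow.projects.run(
            uri=".",                  # the project in the current directory
            entry_point="main",       # assumed entry point name
            parameters=params,
            experiment_name=experiment_name,
            synchronous=False,        # return immediately so runs overlap
        )
        for params in param_grid
    ]

    # Block until every submitted run has finished.
    for run in submitted:
        run.wait()
    return [run.run_id for run in submitted]
```

Calling `launch_runs(build_param_grid([0.1, 0.05], [3, 5]))` from the repo directory would submit four runs, each executed in the Docker environment named in MLproject, provided the parameter names match.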
In the same repo directory, run:
mlflow ui --host 0.0.0.0 --port 5000
The UI is then accessible at http://localhost:5000/
In the same repo directory, run python3 mlflow_model_driver.py
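The idea behind this step can be sketched as follows: pick the run with the best metric, then serve its logged model. This is a simplified illustration, not the contents of mlflow_model_driver.py. It swaps in the local `mlflow models serve` command rather than a Docker container, and it assumes a metric named `rmse`, a model logged under the artifact path `model`, and an experiment named `bike-sharing-gbrt`, none of which are confirmed by the repository description.

```python
import subprocess


def serve_command(run_id, port=5001, artifact_path="model"):
    """Build the CLI call that serves the model logged by a given run.

    The artifact path "model" is an assumption about how train.py logs it.
    """
    return [
        "mlflow", "models", "serve",
        "-m", f"runs:/{run_id}/{artifact_path}",
        "-h", "0.0.0.0",
        "-p", str(port),
    ]


def best_run_id(experiment_name, metric="rmse"):
    """Return the ID of the run with the lowest value of `metric`."""
    # Imported here so serve_command can be used without MLflow installed.
    import mlflow

    experiment = mlflow.get_experiment_by_name(experiment_name)
    runs = mlflow.search_runs(
        experiment_ids=[experiment.experiment_id],
        order_by=[f"metrics.{metric} ASC"],  # lower error first
        max_results=1,
    )
    return runs.iloc[0]["run_id"]


def main(experiment_name="bike-sharing-gbrt"):
    """Find the best run and start a blocking model server for it."""
    subprocess.run(serve_command(best_run_id(experiment_name)), check=True)
```

Calling `main()` would start a server on port 5001, the port used in the request example. How `mlflow models serve` prepares the model's environment depends on the installed MLflow version.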
With the model server running on port 5001, send a prediction request:
curl --silent --show-error 'http://localhost:5001/invocations' -H 'Content-Type: application/json' -d '{
"columns": ["season", "year", "month", "hour_of_day", "is_holiday", "weekday", "is_workingday", "weather_situation", "temperature", "feels_like_temperature", "humidity", "windspeed"],
"data": [[1, 0, 1, 0, 0, 6, 0, 1, 0.24, 0.2879, 0.81, 0.0000]]
}'
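The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above; the helper names `build_payload` and `predict` are made up for this example. It assumes the server accepts the column/data layout shown above. Newer MLflow releases may expect that structure wrapped under a `dataframe_split` key, so adjust the payload if the server rejects it.

```python
import json
import urllib.request

# Feature columns, in the order used by the curl example.
COLUMNS = [
    "season", "year", "month", "hour_of_day", "is_holiday", "weekday",
    "is_workingday", "weather_situation", "temperature",
    "feels_like_temperature", "humidity", "windspeed",
]


def build_payload(rows):
    """Wrap feature rows in the column/data layout used by the curl call."""
    return {"columns": COLUMNS, "data": rows}


def predict(rows, url="http://localhost:5001/invocations"):
    """POST the rows to the model server and return the decoded response."""
    body = json.dumps(build_payload(rows)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `predict([[1, 0, 1, 0, 0, 6, 0, 1, 0.24, 0.2879, 0.81, 0.0]])` sends the same single row as the curl command.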