This document provides steps to follow when using this repository as a template to train models and deploy the models with real-time inference in Azure ML with your own scripts and data.
- Follow the MLOpsPython Getting Started guide
- Follow the MLOpsPython bootstrap instructions to create your project starting point
- Configure training data
- [If necessary] Convert your ML experimental code into production ready code
- Replace the training code
- Update the evaluation code
- Customize the build agent environment
- [If appropriate] Replace the score code
Follow the Getting Started guide to set up the infrastructure and pipelines to execute MLOpsPython.
Take a look at the Repo Details document for a description of the structure of this repository.
The Bootstrap from MLOpsPython repository guide will help you to quickly prepare the repository for your project.
Note: Since the bootstrap script will rename the diabetes_regression folder to the project name of your choice, we'll refer to your project as [project name] when paths are involved.
The training ML pipeline uses a sample diabetes dataset as training data.
To use your own data:
- Create a Dataset in your Azure ML workspace
- Update the DATASET_NAME and DATASTORE_NAME variables in .pipelines/[project name]-variables-template.yml
The MLOpsPython template creates an Azure Machine Learning (ML) pipeline that invokes a set of Azure ML pipeline steps (see ml_service/pipelines/[project name]_build_train_pipeline.py). If your experiment is currently in a Jupyter notebook, it will need to be refactored into scripts that can be run independently and dropped into the template, where the existing Azure ML pipeline steps use them.
- Refactor your experiment code into scripts
- [Recommended] Prepare unit tests
Examples of all these scripts are provided in this repository. See the Convert ML experimental code to production code tutorial for a step by step guide and additional details.
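As a rough illustration of the refactoring step, the sketch below turns notebook-style cells into importable functions and adds a small unit test. The function names (split_rows, fit_mean_model, mean_squared_error), the (feature, target) row layout, and the trivial mean-predictor model are hypothetical stand-ins, not the template's actual code. They only show the shape that makes experiment code testable outside a notebook.

```python
"""Hypothetical refactoring of notebook cells into testable functions."""
from typing import Callable, List, Sequence, Tuple

Row = Tuple[float, float]  # (feature, target); illustrative schema only


def split_rows(rows: Sequence[Row],
               test_fraction: float = 0.2) -> Tuple[List[Row], List[Row]]:
    """Deterministically split rows into train and test lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    n_test = max(1, int(len(rows) * test_fraction))
    return list(rows[:-n_test]), list(rows[-n_test:])


def fit_mean_model(train: Sequence[Row]) -> Callable[[float], float]:
    """Stand-in for real training: always predict the mean training target."""
    mean_target = sum(y for _, y in train) / len(train)
    return lambda _x: mean_target


def mean_squared_error(model: Callable[[float], float],
                       rows: Sequence[Row]) -> float:
    """Average squared difference between predictions and targets."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


def test_fit_mean_model_predicts_training_mean():
    """Example unit test that a test runner such as pytest could collect."""
    model = fit_mean_model([(0.0, 1.0), (1.0, 3.0)])
    assert model(42.0) == 2.0
```

Because each function takes plain inputs and returns plain outputs, the same code can run from a local script, a unit test, or a pipeline step without modification.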
The template contains three scripts in the [project name]/training folder. Update these scripts for your experiment code.
- train.py contains the platform-agnostic logic required to do basic data preparation and train the model. This script can be invoked against a static data file for local development.
- train_aml.py is the entry script for the ML pipeline step. It invokes the functions in train.py in an Azure ML context and adds logging. train_aml.py loads parameters for training from [project name]/parameters.json and passes them to the training function in train.py. If your experiment code can be refactored to match the function signatures in train.py, this file shouldn't need many changes.
- test_train.py contains tests that guard against functional regressions in train.py. Remove this file if you have no tests for your own code.
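The hand-off between train_aml.py and train.py described above can be sketched as follows. This is a minimal illustration rather than the template's actual code: the "training" section name, the alpha parameter, and the train_model signature are assumptions made for the example.

```python
import json
from pathlib import Path
from typing import Any, Dict, Sequence


def load_training_params(path: str, section: str = "training") -> Dict[str, Any]:
    """Read one section of a JSON parameters file; return {} if it is absent.

    The "training" section name is an assumption for this sketch.
    """
    file = Path(path)
    if not file.is_file():
        return {}
    with file.open(encoding="utf-8") as handle:
        params = json.load(handle)
    return params.get(section, {})


def train_model(data: Sequence[Any], alpha: float = 1.0) -> Dict[str, Any]:
    """Hypothetical platform-agnostic training function (stand-in for train.py)."""
    return {"alpha": alpha, "n_rows": len(data)}


def run_training_step(data: Sequence[Any], params_path: str) -> Dict[str, Any]:
    """Entry-script logic: load parameters, log them, then delegate to training."""
    params = load_training_params(params_path)
    print(f"Training with parameters: {params}")  # stand-in for run logging
    return train_model(data, **params)
```

Keeping the parameter names in the JSON file identical to the keyword arguments of the training function means the entry script can stay unchanged when parameters are added or removed.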
Add any dependencies required by training to [project name]/conda_dependencies.yml. This file will be used to generate the environment that the pipeline steps will run in.
The MLOpsPython template uses the evaluate_model script to compare the performance of the newly trained model and the current production model based on Mean Squared Error. If the performance of the newly trained model is better than the current production model, then the pipelines continue. Otherwise, the pipelines are canceled.
To keep the evaluation step, replace all instances of mse in [project name]/evaluate/evaluate_model.py with the metric that you want.
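When swapping mse for another metric, keep in mind that the direction of "better" may flip: a lower error is better, but a higher accuracy is better. The sketch below shows one way to express the comparison so the direction is explicit. The function name should_promote and the rule that a missing production score counts as "nothing to beat" are assumptions for illustration, not the behavior of evaluate_model.py.

```python
from typing import Optional


def should_promote(new_score: float,
                   production_score: Optional[float],
                   lower_is_better: bool = True) -> bool:
    """Return True when the new model should replace the production model."""
    if production_score is None:
        # Assumption: with no production model yet, there is nothing to beat.
        return True
    if lower_is_better:
        return new_score < production_score
    return new_score > production_score
```

For an error metric such as Mean Squared Error, call it with the default lower_is_better=True. For a metric such as accuracy, pass lower_is_better=False. A tie is treated as "not better", so the pipeline would not continue on equal scores.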
To disable the evaluation step, either:
- Set the DevOps pipeline variable RUN_EVALUATION to false
- Uncomment RUN_EVALUATION in .pipelines/[project name]-variables-template.yml and set the value to false
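If one of your own scripts needs to honor the same switch, a small helper can interpret the variable. This sketch assumes the variable reaches the script as an environment variable named RUN_EVALUATION and that evaluation stays enabled when the variable is unset. The helper name evaluation_enabled and the accepted false spellings are hypothetical, not part of the template.

```python
import os
from typing import Mapping


def evaluation_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Treat RUN_EVALUATION as on unless it is explicitly set to a false value."""
    raw = environ.get("RUN_EVALUATION", "true")
    # Accept a few common spellings of "false"; anything else counts as enabled.
    return raw.strip().lower() not in {"false", "0", "no"}
```

Passing the environment as an argument keeps the helper easy to unit test without modifying the real process environment.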
The DevOps pipeline definitions in the MLOpsPython template run several steps in a Docker container that contains the dependencies required to work through the Getting Started guide. If additional dependencies are required to run your unit tests or generate your Azure ML pipeline, there are a few options:
- Add a pipeline step to install dependencies required by unit tests to .pipelines/code-quality-template.yml. Recommended if you only have a small number of test dependencies.
- Create a new Docker image containing your dependencies. See docs/custom_container.md. Recommended if you have a larger number of dependencies, or if the overhead of installing additional dependencies on each run is too high.
- Remove the container references from the pipeline definition files and run the pipelines on self hosted agents with dependencies pre-installed.
For the model to provide real-time inference capabilities, the score code needs to be replaced. The MLOpsPython template uses the score code to deploy the model to do real-time scoring on ACI, AKS, or Web apps.
If you want to keep scoring:
- Update or replace [project name]/scoring/score.py
- Add any dependencies required by scoring to [project name]/conda_dependencies.yml
- Modify the test cases in the ml_service/util/smoke_test_scoring_service.py script to match the schema of the training features in your data
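Azure ML scoring scripts conventionally expose an init() function that loads the model once and a run() function that handles each request. The sketch below follows that shape with a stand-in model so it runs anywhere. The MeanOfFeaturesModel class and the {"data": [...]} request layout are assumptions for illustration. A real score.py would load your registered model and accept whatever schema your training features use.

```python
import json
from typing import List

model = None  # populated by init(), following the scoring-script convention


class MeanOfFeaturesModel:
    """Hypothetical stand-in for a trained model loaded from disk."""

    def predict(self, rows: List[List[float]]) -> List[float]:
        return [sum(row) / len(row) for row in rows]


def init() -> None:
    """Called once when the service starts; load the model here."""
    global model
    # A real script would deserialize the registered model instead.
    model = MeanOfFeaturesModel()


def run(raw_data: str) -> str:
    """Called per request: parse JSON input, predict, and return JSON output."""
    try:
        rows = json.loads(raw_data)["data"]
        return json.dumps({"result": model.predict(rows)})
    except (KeyError, TypeError, ValueError, ZeroDivisionError) as error:
        # Return the problem to the caller instead of crashing the service.
        return json.dumps({"error": str(error)})
```

A smoke test would then send a payload with the same number and order of features as your training data, for example json.dumps({"data": [[1.0, 2.0, 3.0]]}), and check that the response contains a "result" key.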