
Tutorial: MLOps - Automation of Model Evaluation

Authors: Oscar Almqvist and Eric Vickström

Katacoda is an interactive way of conducting this tutorial in the browser, found here.

This tutorial aims to automate the evaluation of a machine learning repository using webhooks on GitHub. The process of evaluating the effect of a particular change on the training results is tedious, especially if you weren't the author. This tutorial will teach you how to automate the testing process for a pull request tagged with a certain label, and comment the results on said pull request.

Showcase

(Screenshot: the App commenting the evaluation results on a pull request.)

Table of Contents

  • Prerequisites
  • Tutorial Outline
  • The Machine Learning Project Structure
  • Required Dependencies
  • A server that listens on pull request events
  • Evaluating our Pull Requests
  • Commenting a Pull Request
  • Conclusion
  • Appendix A - Example payload

Prerequisites

  • A GitHub account
  • An ngrok account
  • A public GitHub repository containing a machine learning model. We have included our own example project which you can add to your repository

All of the code in this tutorial can be found inside the code folder, which includes code for both the server and the small machine learning project, all written in Python 3.9.4.

Tutorial Outline

The general steps to complete our goal are described below. Later sections give more detailed instructions.

  1. A server that listens on pull request events.

    For every event on GitHub, you have the option to specify an HTTP endpoint where you want to receive data about the event. In our case, we want to listen for the event of a label being set on a pull request. To listen, we need to create a web server that accepts POST requests on a specific endpoint. To be able to specify which endpoint, and later on to access the GitHub API, we need to install our own GitHub App on the specific repository.

  2. Evaluate the model inside the pull request.
    The data in the event contains the necessary information to evaluate the changes in the pull request. With it, we will clone the repository and check out both the HEAD and the BASE of the pull request. Then we train the model and test it against the test set for each version of the code. This will be done by executing shell commands via Python.

  3. Comment the results
    The result of the evaluation will be sent as a comment on the pull request via the GitHub API. To do this, we need to authenticate via our GitHub App.

The Machine Learning Project Structure

In this tutorial we're using Python along with the Keras and TensorFlow libraries to create a simple model that classifies digits from the MNIST dataset. This can of course be adapted to your own setup, but for the sake of the tutorial, we have included a folder containing a model located here, with instructions on how to run it. For simplicity's sake, we download the MNIST dataset using keras.datasets.

Essentially, the included code does the following steps:

  • Downloads the dataset,
  • Preprocesses the data,
  • Creates the model,
  • Compiles the model,
  • Trains the model,
  • Evaluates it.

After all of these steps are done, it saves the result to a file. In our case, it is saved as result.txt, containing a JSON object with the loss and accuracy.

# result.txt
{"loss": 1, "accuracy": 1}

⚠️ Remember that the results file produced from the model must match the file that the server is supposed to read from! ⚠️
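For reference, here is a minimal sketch of what such a script could look like. The architecture and hyperparameters below are illustrative, not necessarily those of the included demo:

# demo.py (illustrative sketch, not necessarily identical to the included demo)
import json
from tensorflow import keras

# Download the dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Preprocess: scale pixel values to [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

# Create the model
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])

# Compile the model
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Train the model
model.fit(x_train, y_train, epochs=1)

# Evaluate it and save the result as JSON in the current working directory
loss, accuracy = model.evaluate(x_test, y_test)
with open("result.txt", "w") as f:
    json.dump({"loss": loss, "accuracy": accuracy}, f)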

If you don't have a repository where you want to implement this, create a new repository and copy the content of our small demo. The demo.py and requirements.txt should be at the root of the repository.

Additionally, add an evaluate label to the repository, as this will be our flag for when to run the evaluation.

An example of what our project looks like:

Required Dependencies

This has been tested on ubuntu-20.04 with the following Python modules:

For the server:
flask==1.1.2
python-dotenv==0.17.0
cryptography==3.4.7
pyjwt==2.1.0

For the machine learning project:
numpy==1.20.2
tensorflow==2.4.1

To install the latest versions of the dependencies, either use pip3 install <module> for each module, or use the included requirements.txt files for the server and the machine learning project, which can be installed via pip3 install -r <requirements file>. Note that the server code also imports the requests module, so make sure it is installed as well. We recommend using some form of Python environment manager, for instance Conda.

A server that listens on pull request events

Before telling GitHub which endpoint we expect webhook events to be sent to, we need to start listening on that endpoint. To do this, we will make use of the web framework Flask. First, we create a web server in a file called server.py that listens on an arbitrary port, let's say 1337, and expects a POST request on the endpoint /mlops-server. This is the content:

# server.py
from flask import Flask, request

app = Flask(__name__)

@app.route('/mlops-server', methods=['POST'])
def mlops_server_endpoint():
    request_data = request.get_json()
    return 'Awaiting POST'
    
if __name__ == '__main__':
    app.run(debug=False, port=1337)

By running python3 server.py, you start the web server. Now the server awaits a POST request at http://localhost:1337/mlops-server. The POST request will contain JSON data in its body with all the data belonging to a pull request event. Let's get GitHub to send these events to our endpoint.
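Before involving GitHub, you can sanity-check the endpoint yourself, for instance with a short snippet like the following (the payload here is made up; real webhook deliveries are much larger):

# test_request.py (hypothetical local sanity check)
import requests

r = requests.post("http://localhost:1337/mlops-server", json={"ping": "pong"})
print(r.status_code, r.text)  # expected: 200 Awaiting POST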

Port Forwarding

The port 1337 is probably not open to the public (otherwise, we recommend reviewing your security settings). Proper network configuration is beyond the scope of this tutorial, as it depends on your server, so for debugging purposes, we (and the GitHub Docs) recommend ngrok. ngrok forwards requests to your local server, so opening ports becomes a non-issue. The installation depends on your system, so we recommend following the instructions on the ngrok download page. You need to register, get and link your authentication token, and forward port 1337 via the command ./ngrok http 1337 (for Linux).

In our case, the endpoint our GitHub App will send its requests to is http://afae4b0b1670.ngrok.io/mlops-server.

Storing environment variables

Communicating with an API requires authentication and IDs specific to your project. The various keys and IDs we will collect in the coming parts should not be uploaded to any public repository. We will handle them by creating an environment file called .env. The data we need to store there is the app ID, the installation ID, and the path to the private key.

# .env
APP_ID=<OUR APP ID>
PRIVATE_KEY_PATH=<OUR PATH TO PRIVATE KEY>
INSTALL_ID=<OUR INSTALL ID>

To then access these variables, we will make use of the dotenv module.

# server.py
import os
from dotenv import load_dotenv

load_dotenv()

INSTALL_ID = os.getenv('INSTALL_ID')
APP_ID = os.getenv('APP_ID')
PRIVATE_KEY_PATH = os.getenv('PRIVATE_KEY_PATH')

Register a GitHub App

We need to create a GitHub App and install it on our machine learning repository. But why? 🧐 Webhooks can be configured without an app, but commenting on our pull request requires us to communicate via the GitHub API. So let's create an App. First, navigate to their app page.


Click New GitHub App.


We need to specify a name for our app. They also require you to specify a website. No website? Here we can simply input the ngrok URL http://afae4b0b1670.ngrok.io/. We are using the default settings for all the options (Identifying and authorizing users, Post installations, etc.), except for the following:

Webhook

Here, we input the URL we received from ngrok with the added /mlops-server extension.


Repository permissions

Tell GitHub that we need access to pull requests wherever we install our App.


Subscribe to events

For the webhook, we want to listen to specific events regarding pull requests.


Where to install

We do not plan to install this everywhere, only on our repo.


Click Create. Voilà, we have our first App. 🥳


Save the App ID inside .env

# .env
APP_ID=347128

Get private key

Later on, our server needs to authenticate as the app, so generate a private key! The option is found at the bottom of the same page.


Download it and keep it safe! Save the path to the key to .env.


# .env
APP_ID=347128
PRIVATE_KEY_PATH=path/to/key.pem

Install the app

Scroll up, and click on Install App.

Choose to install the app on one of your repositories.


Here, select your machine learning repository. Hopefully, your project name is more exciting than your-ml-project.


Click on Install.


Look 👀 See the installation ID in the URL? This ID is used when we want to specify which repository to add a comment to. Save it as well.

# .env
APP_ID=347128
PRIVATE_KEY_PATH=path/to/key.pem
INSTALL_ID=92312521

Wow, that was a lot of steps. 😮‍💨 The good news is that we're done registering the GitHub App. Let's continue with our server!

Evaluating our Pull Requests

Okay, now we have a server that tells us when someone has created a pull request, and we have a model. Let's combine the two: train & evaluate the model when someone creates a pull request!

The plan is to have the server clone the repository, check out the relevant commit, install all the dependencies, and evaluate the model for both the head and the base. To do this, we run shell commands via Python using the subprocess module and its run() method. First, we clone the repository into a folder with an arbitrary name project_dst, generated with the uuid module; the reason for this is to avoid collisions with folders that already exist. Next, we check out a specific version of the cloned repository (with cwd we run the command inside the project_dst folder). Then, we install the dependencies of the machine learning project and run the model, expecting a result.txt. Since we can't reuse the same cloned directory for future pull requests, we finish by moving the directory to the 🗑️

# server.py
import json
import subprocess
import uuid

def evaluate_pull_request(commit_sha, html_url):
    # Clone into a folder with a unique name to avoid collisions
    project_dst = uuid.uuid4().hex
    subprocess.run(["git", "clone", html_url, project_dst])
    subprocess.run(["git", "checkout", commit_sha], cwd=f"./{project_dst}")
    subprocess.run(["pip", "install", "-r", f"{project_dst}/requirements.txt"])
    subprocess.run(["python3", f"{project_dst}/demo.py"])
    subprocess.run(["rm", "-rf", project_dst])

    # demo.py writes its results to result.txt in the current working directory
    with open("result.txt", "r") as f:
        return json.load(f)

Notice how we depend on the project layout via the paths here ({project_dst}/requirements.txt and {project_dst}/demo.py). If you use your own project, make sure you call the correct files!

If you want to suppress the output from run(), you can redirect stdout and/or stderr to subprocess.DEVNULL. See server.py for how we did it!
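For instance, to silence the clone step (a sketch; the exact calls in server.py may differ):

# Discard both stdout and stderr of a subprocess call
subprocess.run(["git", "clone", html_url, project_dst],
               stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)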

What data from the pull request do we need? Well, we need the html_url of the repository in order to clone it, and the commit_sha in order to check out the relevant changes. Note that we need the html_url and commit_sha of both the head and the base, as we want to compare the changes. Additionally, we need the comments_url so our application can comment the results on the pull request. Let's create two utility functions for this:

# server.py
def get_commits(data):
    return data["pull_request"]["head"]["sha"], \
           data["pull_request"]["base"]["sha"]

def get_urls(data):
    return data["pull_request"]["comments_url"], \
           data["pull_request"]["head"]["repo"]["html_url"], \
           data["pull_request"]["base"]["repo"]["html_url"]

We also need to specify when this should be run. For instance, if we just made a pull request that only updates documentation or similar, we don't need to run all of this testing, as the model hasn't changed. To solve this, we created a label in our repository which is treated as a flag, letting our server know when to evaluate. We call this label evaluate. This also means that we need some form of validation function that checks whether an incoming request is intended for testing and comparing the model. In the case of GitHub webhooks, we want to listen for the labeled action on a pull_request, and check whether the pull request carries our label.

# server.py
def is_valid_response(data):
    is_valid = False
    keys = data.keys()
    if 'pull_request' in keys and 'action' in keys:
        if data['action'] != 'labeled':
            return False

        for label in data['pull_request']['labels']:
            if 'evaluate' == label['name']: # 'evaluate' corresponds to said label
                is_valid = True

    return is_valid
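As a quick sanity check, we can feed it hand-made payloads (hypothetical, trimmed-down data just to illustrate the shape the function expects):

# Hypothetical, trimmed-down payloads
labeled_event = {"action": "labeled",
                 "pull_request": {"labels": [{"name": "evaluate"}]}}
opened_event = {"action": "opened",
                "pull_request": {"labels": []}}

print(is_valid_response(labeled_event))  # True
print(is_valid_response(opened_event))   # False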

Now, let's glue everything together inside mlops_server_endpoint().

# server.py
from flask import Flask, request

@app.route('/mlops-server', methods=['POST'])
def mlops_server_endpoint():
    response = request.get_json()

    if is_valid_response(response):
        sha_head, sha_base = get_commits(response)
        comments_url, url_head, url_base = get_urls(response)

        head_result = evaluate_pull_request(sha_head, url_head)
        base_result = evaluate_pull_request(sha_base, url_base)

        # TODO: send comment with results to pull request

    return 'Awaiting POST'

Commenting a Pull Request

In section 1, the communication with the repository was rather one-sided; the server could only listen to webhook events. In order to send requests to our repository, we need to add some functionality. First of all, we need to fetch an access token, whose purpose is to authenticate against GitHub. This is done by constructing a JSON Web Token (JWT) from the app ID and the private key from section 2.

Let's generate our JWT. The time fields are Unix timestamps in seconds: iat (issued at) is backdated slightly to allow for clock drift, and exp (expiration) keeps the token within GitHub's maximum JWT lifetime of 10 minutes.

# server.py
import jwt
import time

def generate_jwt():
    # Read the private key we generated for our GitHub App
    with open(PRIVATE_KEY_PATH, 'r') as pemfile:
        key = pemfile.read()
    payload = {
        "iat": int(time.time() - 60),              # issued 60 s in the past, allowing for clock drift
        "exp": int(time.time() + (10 * 60)) - 10,  # expires just under GitHub's 10-minute maximum
        "iss": APP_ID
    }
    return jwt.encode(payload, key, algorithm="RS256")

Using generate_jwt(), we create a function for fetching the access token. It sends a POST request to https://api.github.com/app/installations/<INSTALL_ID>/access_tokens in order to fetch an access token for our installation of the GitHub App.

# server.py
import json
import requests

GITHUB_APP_URL = "https://api.github.com/app"

def get_token():
    headers = {
        "Authorization": f"Bearer {generate_jwt()}",
        "Accept": "application/vnd.github.v3+json"
    }
    r = requests.post(f"{GITHUB_APP_URL}/installations/{INSTALL_ID}/access_tokens", headers=headers)
    return r.json()["token"]

To later post a message on a certain pull request, we use the access token to send a POST request with a body containing our message.

# server.py
def post_message_on_pull_request(comments_url, token, message):
    headers = {
        "Authorization": f"token {token}",
        "Accept": "application/vnd.github.v3+json"
    }
    payload = {
        "body": message
    }

    requests.post(comments_url, headers=headers, data=json.dumps(payload))

Connecting the dots

Finally, we can assemble everything to evaluate a model from a certain pull request and send the results! If we'd like, we can also format our message to make it more readable, kind of like this:

| Source | Loss | Accuracy |
| ------ | ---- | -------- |
| Head   | 1.07 | 78.0%    |
| Base   | 1.06 | 80.0%    |
| Diff   | 0.01 | -2.0%    |

Let's create a utility function that builds a Markdown table from the loss and accuracy of what the pull request contains (head) and where it's going (base).

# server.py
def format_markdown_comment(head, base):
    diff_loss = round(head['loss'] - base['loss'], 2)
    diff_acc = round(head['accuracy'] - base['accuracy'], 2)
    rows = [
        f"| Source| Loss           | Accuracy            |",
        f"| ------| ---------------| ------------------- |",
        f"| Head  | {head['loss']} | {head['accuracy']}% |",
        f"| Base  | {base['loss']} | {base['accuracy']}% |",
        f"| Diff  | {diff_loss}    | {diff_acc}%         |"
    ]
    return "\n".join(rows)
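As a quick check, calling it with the made-up numbers from the table above:

head = {"loss": 1.07, "accuracy": 78.0}
base = {"loss": 1.06, "accuracy": 80.0}
print(format_markdown_comment(head, base))
# (header rows omitted)
# | Head  | 1.07 | 78.0% |
# | Base  | 1.06 | 80.0% |
# | Diff  | 0.01 | -2.0% |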

Now, add these to the mlops_server_endpoint().

# server.py
@app.route('/mlops-server', methods=['POST'])
def mlops_server_endpoint():
    response = request.get_json()

    if is_valid_response(response):
        sha_head, sha_base = get_commits(response)
        comments_url, url_head, url_base = get_urls(response)

        head_result = evaluate_pull_request(sha_head, url_head)
        base_result = evaluate_pull_request(sha_base, url_base)

        message = format_markdown_comment(head_result, base_result)

        token = get_token()
        post_message_on_pull_request(comments_url, token, message)
    return 'Awaiting POST'

Conclusion

Start your server with python3 server.py.

Now, if you branch out from your own repository and make some change (either a parameter in the model or just a text change), you can open a pull request and add the evaluate label. You should see a bunch of output on the server for every command it runs, and finally, a comment on your pull request! This tutorial is a proof of concept of how one could build a server to evaluate pull requests containing ML models. Hopefully, it can be adapted to your needs and inspire you to automate more parts of your development process!

If you get a 404 on the server, it's most likely due to wrong URLs specified inside the app settings. To solve this, click Edit on your app. If you are following along on Katacoda, the Webhook URL must be: http://[[HOST_SUBDOMAIN]]-1337-[[KATACODA_HOST]].environments.katacoda.com/mlops-server.


Appendix A - Example payload

An example subset of what a pull request event can contain. If you are interested in seeing a full payload from a request, you may visit the GitHub docs. You can also read more about webhooks here.

{
  "action": "opened",
  "number": 3,
  "pull_request": {
    "comments_url": "https://api.github.com/repos/vickstrom/your-ml-project/issues/3/comments",
    "head": {
      "sha": "89243d3490e9c0djk3as8791b84bc05a42837a363a",
      "repo": {
        "html_url": "git@github.com:vickstrom/your-ml-project.git"
      }
    },
    "base": {
      "sha": "45as43d3490e9c0djk3as8791b84bc05a42837a363a",
      "repo": {
        "html_url": "git@github.com:vickstrom/your-ml-project.git"
      }
    },
    "labels": [{
      "name": "evaluate"
    }]
  }
}
