✨ Build MLflow Tracking Server for MLOps Discovery #4275

PriyaBasker23 · 2024-05-08T06:01:28Z

Describe the feature request.

Implement a fully managed MLflow tracking server on AWS to help with the discovery of machine learning operations (MLOps) within MOJ.

Details:

Backend Store: Utilise Amazon RDS to store MLflow metadata and logs securely.
Artifact Storage: Use Amazon S3 for storage of machine learning models and artifacts.
Tracking Server: Deploy an EC2 instance or Docker container to host the MLflow tracking server, enabling remote access.
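
As a rough illustration of how the three components above fit together, the sketch below assembles an `mlflow server` command line. The database host, database name, credentials, and bucket are hypothetical placeholders, not values from this issue. `--backend-store-uri`, `--artifacts-destination`, `--host`, and `--port` are standard `mlflow server` options.

```python
from urllib.parse import quote


def build_server_command(db_user, db_password, db_host, db_name, bucket,
                         host="0.0.0.0", port=5000):
    """Return the argv list for an `mlflow server` process.

    All connection details passed in are hypothetical placeholders.
    """
    # Percent-encode credentials so special characters do not break the URI.
    backend_uri = (
        f"postgresql://{quote(db_user, safe='')}:{quote(db_password, safe='')}"
        f"@{db_host}:5432/{db_name}"
    )
    return [
        "mlflow", "server",
        "--backend-store-uri", backend_uri,           # metadata in RDS PostgreSQL
        "--artifacts-destination", f"s3://{bucket}",  # models and files in S3
        "--host", host,
        "--port", str(port),
    ]


if __name__ == "__main__":
    print(" ".join(build_server_command(
        "mlflow", "example-password", "example-db.internal", "mlflow",
        "example-artifact-bucket")))
```

The same arguments would apply whether the process runs on EC2 or in a container; only the way the command is launched differs.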

Other Requirements

Access to the tracking server should be secured through a login system.
Only authorized individuals should be able to access the experiments.
Data in the artifacts bucket should be organised into folders based on user identity. These folders should be accessible only to alpha users who have the necessary permissions. Folder-level permissions for alpha users can be managed from the Control Panel. This setup allows users to store artifacts generated by code they run in Visual Studio in AP.
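
One possible shape for the folder-level permissions described above is an IAM policy scoped to a per-user prefix. This is a minimal sketch only: the bucket name, the user name, and the `<username>/` folder layout are assumptions, and it does not reflect how the Control Panel actually generates policies.

```python
def user_prefix_policy(bucket, username):
    """Build an IAM policy limiting a user to their own folder in a bucket.

    The "<username>/" folder layout is an assumption made for illustration.
    """
    prefix = f"{username}/"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket but filtered to the user's prefix.
                "Sid": "ListOwnFolder",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                # Object-level access applies only to keys under the user's prefix.
                "Sid": "ReadWriteOwnFolder",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }
```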

Detailed information is available at https://github.com/moj-analytical-services/mlops/blob/main/docs/mlflow/mlflow_tracking_server.md

Describe the context

MLflow Tracking is a component of the MLflow platform that enables data scientists and machine learning engineers to track and log experiments during the model development process. With MLflow Tracking, users can easily record parameters, metrics, and output files from their machine learning experiments, making it easier to organize and compare different approaches. It provides a centralized location to store experiment results, allowing for efficient collaboration and reproducibility. MLflow Tracking also offers a user-friendly interface for visualizing experiment results, enabling users to gain insights into model performance and make informed decisions about model improvements.
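
To make the description above concrete, here is a minimal sketch of recording parameters and metrics for one run with the MLflow Python client. The tracking URI, experiment name, and the `flatten_params` helper are hypothetical; `mlflow` is imported inside `log_run` so the helper can be tried without MLflow installed.

```python
def flatten_params(config, parent=""):
    """Flatten a nested config dict into dotted keys, e.g. {"model.depth": 3}."""
    flat = {}
    for key, value in config.items():
        name = f"{parent}.{key}" if parent else str(key)
        if isinstance(value, dict):
            flat.update(flatten_params(value, name))
        else:
            flat[name] = value
    return flat


def log_run(tracking_uri, experiment_name, config, metrics):
    """Record one run's parameters and metrics on a tracking server."""
    import mlflow  # imported lazily so the helper above works without MLflow

    mlflow.set_tracking_uri(tracking_uri)
    mlflow.set_experiment(experiment_name)
    with mlflow.start_run():
        mlflow.log_params(flatten_params(config))  # one entry per dotted key
        mlflow.log_metrics(metrics)                # dict of metric name -> float
```

A call such as `log_run("https://mlflow.example", "demo", {"model": {"depth": 3}}, {"accuracy": 0.9})` would then create a run that appears in the tracking UI for comparison with other runs.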

Value / Purpose

This configuration will enable data scientists to centralise their experimental data, streamlining access to experiments for all team members. It will also allow data scientists to integrate and test MLflow from their existing projects within Visual Studio, using the Application Platform.

User Types

Data Scientist

mshodge · 2024-05-09T10:36:40Z

Hi team, you might not be able to answer this right away, but for our own MLOps work and planning, it would be really good to know the timescales over which this might be deliverable, or even when you could start to explore it, whether that's days, weeks, or months away. Thank you.

mshodge · 2024-05-15T14:56:27Z

Hi @Ed-Bajo could we set some timescales for this? I'm working with the Probation and Electronic Monitoring team and we'd like it to be available for testing and use soon. Is the end of June a feasible timescale to deliver to? Thanks. Michael

bcrawford-moj · 2024-05-16T12:51:20Z

This feature would be extremely useful for the BOLD AI for Linked Data team. We currently have no good way to track ML experiments, and this would be a great step towards industry best practice. We'd like to see it as soon as possible, as we are a time-limited programme.

jacobwoffenden · 2024-06-10T21:39:42Z

10/06/24 summary:

KMS keys, RDS PostgreSQL, S3 bucket, IAM policy, IAM role (IRSA enabled) and Kubernetes namespace created
- 🤖 MLflow on APC modernisation-platform-environments#6510
https://artifacthub.io/packages/helm/community-charts/mlflow is 12 minor versions behind and not officially supported by MLflow
- Have started working on a very lightweight Helm chart
MLflow doesn't support anything other than basic-auth; there is currently no external IdP support
MLflow container runs as root and doesn't include Prometheus exporter package
- I have a working prototype
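
Since only basic auth is available, clients need a username and password to reach the server. The MLflow client reads these from the `MLFLOW_TRACKING_USERNAME` and `MLFLOW_TRACKING_PASSWORD` environment variables, alongside `MLFLOW_TRACKING_URI`. The sketch below builds such an environment for a child process; the helper name and the example values are hypothetical.

```python
import os


def mlflow_client_env(tracking_uri, username, password, base_env=None):
    """Return a copy of the environment with MLflow basic-auth settings added."""
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "MLFLOW_TRACKING_URI": tracking_uri,
        "MLFLOW_TRACKING_USERNAME": username,
        "MLFLOW_TRACKING_PASSWORD": password,
    })
    return env
```

The result can be passed as `env=` to `subprocess.run` so that credentials are not written into the training script itself.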

jacobwoffenden · 2024-06-11T16:42:12Z

11/06/24 summary:

2.13.2-rc0 released https://github.com/ministryofjustice/analytical-platform-mlflow/releases/tag/2.13.2-rc0
Testing shows MLflow cannot share the same database for both authentication and backend. That is fine; however, we lack the ability to programmatically create databases in MP CI/CD, so have created another RDS instance for now. Will explore an initContainer or schema migration tools such as Ariga's Atlas to do this.
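
For the initContainer idea, one option is a small script that creates the second database on the existing instance if it is missing. This is only a sketch under assumptions: the function name is hypothetical, the connection is any DB-API connection using the `%s` parameter style (as psycopg2 does), and it must be in autocommit mode because PostgreSQL does not allow `CREATE DATABASE` inside a transaction block.

```python
import re


def ensure_database(conn, name):
    """Create PostgreSQL database `name` if missing; return True if created.

    `conn` is assumed to be a DB-API connection in autocommit mode.
    """
    # Identifiers cannot be bound as query parameters, so validate strictly.
    if not re.fullmatch(r"[a-z_][a-z0-9_]*", name):
        raise ValueError(f"unsafe database name: {name!r}")
    cur = conn.cursor()
    try:
        cur.execute("SELECT 1 FROM pg_database WHERE datname = %s", (name,))
        if cur.fetchone() is not None:
            return False  # already exists, nothing to do
        cur.execute(f'CREATE DATABASE "{name}"')
        return True
    finally:
        cur.close()
```

Because the function is idempotent, it can run on every pod start without failing once the database exists.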

jacobwoffenden · 2024-06-12T18:02:28Z

12/06/24 summary:

https://mlflow.compute.development.analytical-platform.service.justice.gov.uk/ is running
Initial thoughts on management of permissions: it's quite cumbersome using the REST APIs and could really do with a wrapper (i.e. AP UI)
- https://mlflow.org/docs/2.13.2/auth/index.html#how-it-works
- https://mlflow.org/docs/2.13.2/rest-api.html
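
To give a sense of what such a wrapper would hide, the sketch below builds (but does not send) a request to the experiment permission endpoint described in the auth documentation linked above. The function name, server URL, and credentials are hypothetical placeholders.

```python
import base64
import json
import urllib.request

VALID_PERMISSIONS = {"READ", "EDIT", "MANAGE", "NO_PERMISSIONS"}


def experiment_permission_request(base_url, admin_user, admin_password,
                                  experiment_id, username, permission):
    """Build a POST request granting `username` a permission on an experiment."""
    if permission not in VALID_PERMISSIONS:
        raise ValueError(f"unknown permission: {permission!r}")
    body = json.dumps({
        "experiment_id": str(experiment_id),
        "username": username,
        "permission": permission,
    }).encode("utf-8")
    # Basic auth is the only scheme the server supports at present.
    token = base64.b64encode(f"{admin_user}:{admin_password}".encode("utf-8"))
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/2.0/mlflow/experiments/permissions/create",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token.decode('ascii')}",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. A UI or CLI wrapper would mostly be a loop over calls like this one.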

jacobwoffenden · 2024-06-12T18:10:40Z

Moving to blocked while the way forward is discussed with Analytical Platform Product Management

jacobwoffenden · 2024-06-13T10:58:47Z

Notes:

Alpha users would need access to the S3 bucket to retrieve models. This is OK in theory, but we'd need to mutate an Alpha user's permissions based on which experiments and models they are allowed to access.

mshodge · 2024-06-13T12:47:16Z

Solution one: users set their own artifact location when creating experiment

One solution is for users to define their own artifact location in MLFlow when creating an experiment (https://mlflow.org/docs/latest/rest-api.html#create-experiment), so that artifacts are stored in their own buckets. I'm not sure how access between MLFlow and that bucket would work, so I will test this with the running server and see what error it gives.
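
For reference, a create-experiment call with a custom artifact location might look like the sketch below, which builds the REST request without sending it. The server URL, experiment name, and bucket are hypothetical, authentication headers are omitted, and the sketch does not answer the access question raised above.

```python
import json
import urllib.request


def create_experiment_request(base_url, name, artifact_location):
    """Build (but do not send) a create-experiment request."""
    body = json.dumps({"name": name, "artifact_location": artifact_location})
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/2.0/mlflow/experiments/create",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The Python client equivalent is `mlflow.create_experiment(name, artifact_location=...)`.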

Solution two: wrapper and AP control panel can be used to create experiments and assign S3 perms

There seems to be some circularity brewing with the process in that:

1. User gets access to MLFlow and has user permissions added
2. User creates an experiment and run using code and UI, which pushes model artifacts to an S3 bucket folder
3. User needs permissions to use the model artifacts from the S3 bucket outside of MLFlow

If step 1 can somehow be done using their alpha user name, then we need a way of making sure that any new experiment they create is linked back to their alpha user name for the S3 perms.

A solution might be to force users to create experiments in the AP Control Panel, which would call the MLFlow API (https://mlflow.org/docs/latest/rest-api.html#create-experiment), instead of creating them through code or the UI (although not sure how we can really prevent this :/). The API wrapper could then also set the S3 perms at the artifact level at the same time.
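
The wrapper idea could be sketched roughly as follows. Everything here is hypothetical: the function name, the `<alpha user>/<experiment>` folder layout, and the two injected callables, which stand in for the MLFlow create-experiment call and whatever mechanism the Control Panel would use to grant S3 access.

```python
def create_experiment_with_access(alpha_user, experiment_name, bucket,
                                  create_experiment, grant_s3_access):
    """Create an experiment and grant its owner S3 access in one step.

    `create_experiment(name, artifact_location)` must return the experiment ID.
    `grant_s3_access(user, bucket, prefix)` applies the S3 permissions.
    """
    # Tie the artifact folder to the alpha user name so perms can follow it.
    prefix = f"{alpha_user}/{experiment_name}"
    artifact_location = f"s3://{bucket}/{prefix}"
    experiment_id = create_experiment(experiment_name, artifact_location)
    # Only grant access once the experiment exists.
    grant_s3_access(alpha_user, bucket, prefix)
    return experiment_id, artifact_location
```

Keeping both steps in one function is what links a new experiment back to the alpha user name, which is the gap identified in the steps above.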

jacobwoffenden · 2024-06-17T09:09:01Z

@gfowler-moj is going to put a session in to review the way forward around authentication/permissions management

jacobwoffenden · 2024-06-20T15:33:58Z

Outcome of meeting:

Switch to using Alpha bucket for artefacts
Make Priya and Michael admins

jacobwoffenden · 2024-06-20T15:50:46Z

I have created a group in Control Panel (analytical-platform-mlflow-admins), added @mshodge and @PriyaBasker23, and created 3 artefact buckets:

alpha-analytical-platform-mlflow
alpha-analytical-platform-mlflow-development
alpha-analytical-platform-mlflow-test

jacobwoffenden · 2024-06-20T16:11:54Z

TODO:

jacobwoffenden · 2024-06-24T11:36:53Z

alpha-analytical-platform-mlflow-development updated with below JSON

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::alpha-analytical-platform-mlflow-development",
                "arn:aws:s3:::alpha-analytical-platform-mlflow-development/*"
            ],
            "Condition": {
                "Bool": {
                    "aws:SecureTransport": "false"
                }
            }
        },
        {
            "Sid": "AllowAnalyticalPlatformMLflow",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::381491960855:role/mlflow20240610161705974000000002"
            },
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::alpha-analytical-platform-mlflow-development",
                "arn:aws:s3:::alpha-analytical-platform-mlflow-development/*"
            ]
        }
    ]
}

MLflow is running again, but needs testing
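
If the same policy is needed on the other artefact buckets, it could be generated rather than edited by hand. The sketch below reproduces the structure of the JSON in this comment for a given bucket and role; the function name is hypothetical, and the role ARNs for the other environments are not stated in this issue, so they would need to be supplied.

```python
import json


def build_bucket_policy(bucket, role_arn):
    """Return a bucket policy that denies non-TLS access and allows one role."""
    resources = [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": list(resources),
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                "Sid": "AllowAnalyticalPlatformMLflow",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:*",
                "Resource": list(resources),
            },
        ],
    }


if __name__ == "__main__":
    policy = build_bucket_policy(
        "alpha-analytical-platform-mlflow-development",
        "arn:aws:iam::381491960855:role/mlflow20240610161705974000000002",
    )
    print(json.dumps(policy, indent=4))
```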

jacobwoffenden · 2024-06-27T20:35:27Z

MLflow deployed to APC, follow on FR raised to create role for mutating permissions #4593

PriyaBasker23 added the feature-request label May 8, 2024

github-project-automation bot added this to Analytical Platform May 8, 2024

github-project-automation bot moved this to 👀 TODO in Analytical Platform May 8, 2024

PriyaBasker23 changed the title ~~✨ - Build ML Flow Tracking Server~~ ✨ Build ML Flow Tracking Server for MLOPS Discovery May 8, 2024

PriyaBasker23 changed the title ~~✨ Build ML Flow Tracking Server for MLOPS Discovery~~ ✨ Build ML Flow Tracking Server for MLOPS Discovery ( Draft - WIP) May 8, 2024

PriyaBasker23 changed the title ~~✨ Build ML Flow Tracking Server for MLOPS Discovery ( Draft - WIP)~~ ✨ Build ML Flow Tracking Server for MLOPS Discovery May 8, 2024

jacobwoffenden moved this from 👀 TODO to 🚀 In Progress in Analytical Platform Jun 10, 2024

jacobwoffenden self-assigned this Jun 10, 2024

jacobwoffenden mentioned this issue Jun 10, 2024

🤖 MLflow on APC ministryofjustice/modernisation-platform-environments#6510

Merged

jacobwoffenden mentioned this issue Jun 11, 2024

✨ Initial image ministryofjustice/analytical-platform-mlflow#3

Merged

jacobwoffenden mentioned this issue Jun 12, 2024

🗺️ Add Helm chart ministryofjustice/analytical-platform-mlflow#4

Merged

jacobwoffenden moved this from 🚀 In Progress to 🚫 Blocked in Analytical Platform Jun 12, 2024

jacobwoffenden changed the title ~~✨ Build ML Flow Tracking Server for MLOPS Discovery~~ ✨ Build MLflow Tracking Server for MLOps Discovery Jun 12, 2024

jacobwoffenden assigned gfowler-moj Jun 17, 2024

jacobwoffenden moved this from 🚫 Blocked to 🚀 In Progress in Analytical Platform Jun 20, 2024

This was referenced Jun 20, 2024

🔧 Update MLFlow artefact bucket ministryofjustice/modernisation-platform-environments#6689

Merged

📌 Update MLflow to 2.14.1 ministryofjustice/analytical-platform-mlflow#11

Merged

jacobwoffenden unassigned gfowler-moj Jun 21, 2024

jacobwoffenden moved this from 🚀 In Progress to 🛂 In Review in Analytical Platform Jun 24, 2024

jacobwoffenden closed this as completed Jun 27, 2024

github-project-automation bot moved this from 🛂 In Review to 🎉 Done in Analytical Platform Jun 27, 2024

jacobwoffenden mentioned this issue Jul 10, 2024

✨ mlflow UI on Analytical Platform #3368

Closed
