✨ Build MLflow Tracking Server for MLOps Discovery #4275

Closed
PriyaBasker23 opened this issue May 8, 2024 · 15 comments

@PriyaBasker23
Contributor

PriyaBasker23 commented May 8, 2024

Describe the feature request.

Implement a fully managed MLflow tracking server on the AWS platform to help with the discovery of machine learning operations (MLOps) within MOJ.

Details:

Backend Store: Utilise Amazon RDS to store MLflow metadata and logs securely.
Artifact Storage: Use Amazon S3 for storage of machine learning models and artifacts.
Tracking Server: Deploy an EC2 instance or Docker container to host the MLflow tracking server, enabling remote access.
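
As a rough illustration of how the three components above fit together, here is a minimal sketch that assembles an `mlflow server` command pointing at a database backend store and an S3 artifact root. The connection string, bucket name, host, and port are placeholders assumed for illustration; none of them come from this issue.

```python
import shlex


def build_mlflow_server_command(backend_store_uri, artifact_root,
                                host="0.0.0.0", port=5000):
    """Return the argument list for launching an MLflow tracking server.

    backend_store_uri: database URI for MLflow metadata (e.g. an RDS instance).
    artifact_root: default location for artifacts (e.g. an S3 URI).
    """
    return [
        "mlflow", "server",
        "--backend-store-uri", backend_store_uri,
        "--default-artifact-root", artifact_root,
        "--host", host,
        "--port", str(port),
    ]


if __name__ == "__main__":
    # Placeholder values only; replace with real connection details.
    cmd = build_mlflow_server_command(
        "postgresql://mlflow_user:CHANGE_ME@example-rds-host:5432/mlflow",
        "s3://example-mlflow-artifacts/",
    )
    print(shlex.join(cmd))
```

The printed command could be used as the entrypoint of a container or run directly on an EC2 instance; authentication in front of the server is not covered by this sketch.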

Other Requirements

  • Access to the tracking server should be secured through a login system.
  • Only authorized individuals should be able to access the experiments.
  • Data in the artifacts bucket should be organised into specific folders based on user identity. These folders should be accessible only to alpha users who have the necessary permissions. Folder-level permissions for alpha users can be managed from the Control Panel. This setup allows users to store artifacts generated from their code execution in Visual Studio in AP.

Detailed information is available at https://github.com/moj-analytical-services/mlops/blob/main/docs/mlflow/mlflow_tracking_server.md

Describe the context

MLflow Tracking is a component of the MLflow platform that enables data scientists and machine learning engineers to track and log experiments during the model development process. With MLflow Tracking, users can easily record parameters, metrics, and output files from their machine learning experiments, making it easier to organize and compare different approaches. It provides a centralized location to store experiment results, allowing for efficient collaboration and reproducibility. MLflow Tracking also offers a user-friendly interface for visualizing experiment results, enabling users to gain insights into model performance and make informed decisions about model improvements.
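
For readers unfamiliar with the client side, here is a minimal sketch of recording parameters, metrics, and an output file against a tracking server using the `mlflow` Python package. The function name and its arguments are hypothetical; the tracking URI and experiment name would depend on how the server is eventually deployed.

```python
def log_example_run(tracking_uri, experiment_name, params, metrics,
                    artifact_path=None):
    """Log one run to an MLflow tracking server and return its run ID.

    params: dict of parameter names to values.
    metrics: dict of metric names to numeric values.
    artifact_path: optional path to a local file to upload as an artifact.
    """
    # Imported here so that this module loads even where mlflow is not installed.
    import mlflow

    mlflow.set_tracking_uri(tracking_uri)
    mlflow.set_experiment(experiment_name)

    with mlflow.start_run() as run:
        mlflow.log_params(params)
        for name, value in metrics.items():
            mlflow.log_metric(name, value)
        if artifact_path is not None:
            mlflow.log_artifact(artifact_path)
        return run.info.run_id
```

A call might look like `log_example_run("https://mlflow.example.internal", "demo-experiment", {"max_depth": 5}, {"accuracy": 0.9})`, where the URL and experiment name are placeholders.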

Value / Purpose

This configuration will enable data scientists to centralise their experimental data, streamlining access to experiments for all team members. It will also allow data scientists to integrate and test MLflow from their existing projects within Visual Studio, using the Analytical Platform.

User Types

Data Scientist

@PriyaBasker23 PriyaBasker23 changed the title ✨ - Build ML Flow Tracking Server ✨ Build ML Flow Tracking Server for MLOPS Discovery May 8, 2024
@PriyaBasker23 PriyaBasker23 changed the title ✨ Build ML Flow Tracking Server for MLOPS Discovery ✨ Build ML Flow Tracking Server for MLOPS Discovery ( Draft - WIP) May 8, 2024
@PriyaBasker23 PriyaBasker23 changed the title ✨ Build ML Flow Tracking Server for MLOPS Discovery ( Draft - WIP) ✨ Build ML Flow Tracking Server for MLOPS Discovery May 8, 2024
@mshodge
Contributor

mshodge commented May 9, 2024

Hi team, you might not be able to answer this right away, but for our own MLOps work and planning, it would be really good to know the timescales over which this might be deliverable, or even when you could start to explore it, whether that's days/weeks/months away. Thank you.

@mshodge
Contributor

mshodge commented May 15, 2024

Hi @Ed-Bajo could we set some timescales for this? I'm working with the Probation and Electronic Monitoring team and we'd like it to be available for testing and use soon. Is the end of June a feasible timescale to deliver to? Thanks. Michael

@bcrawford-moj

bcrawford-moj commented May 16, 2024

This feature would be extremely useful for the BOLD AI for Linked Data team. We currently have no good way to track ML experiments and this would be a great step towards industry best practice. We'd like to see it as soon as possible, as we are a time-limited programme.

@jacobwoffenden
Member

jacobwoffenden commented Jun 10, 2024

10/06/24 summary:

@jacobwoffenden
Member

jacobwoffenden commented Jun 11, 2024

11/06/24 summary:

@jacobwoffenden
Member

jacobwoffenden commented Jun 12, 2024

12/06/24 summary:

@jacobwoffenden
Member

Moving to blocked while we discuss the way forward with Analytical Platform Product Management

@jacobwoffenden jacobwoffenden moved this from 🚀 In Progress to 🚫 Blocked in Analytical Platform Jun 12, 2024
@jacobwoffenden jacobwoffenden changed the title ✨ Build ML Flow Tracking Server for MLOPS Discovery ✨ Build MLflow Tracking Server for MLOps Discovery Jun 12, 2024
@jacobwoffenden
Member

Notes:

  • Alpha users would need access to the S3 bucket to retrieve models. This is OK in theory, but we'd need to mutate an Alpha user's permissions based on which experiments and models they are allowed to access

@mshodge
Contributor

mshodge commented Jun 13, 2024

Solution one: users set their own artifact location when creating experiment

One solution is for users to define their own artifact location in MLflow when creating an experiment (https://mlflow.org/docs/latest/rest-api.html#create-experiment), meaning they can direct artifacts to be stored in their own buckets anyway. However, I'm not sure how access between MLflow and that bucket would work. I will test this with the running server and see what error it gives.
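
To make solution one concrete, here is a minimal sketch of the create-experiment call from the REST API linked above, with a custom `artifact_location`. It only builds the request; the server URL, experiment name, and bucket path are placeholders, and any authentication headers the server requires are not shown.

```python
import json
import urllib.request


def build_create_experiment_request(tracking_url, name, artifact_location):
    """Build (but do not send) a POST to MLflow's experiments/create endpoint."""
    body = json.dumps(
        {"name": name, "artifact_location": artifact_location}
    ).encode("utf-8")
    return urllib.request.Request(
        url=tracking_url.rstrip("/") + "/api/2.0/mlflow/experiments/create",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    req = build_create_experiment_request(
        "https://mlflow.example.internal",
        "my-experiment",
        "s3://alpha-example-user-bucket/mlflow/my-experiment",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it. Whichever process ends up writing the artifacts still needs S3 permissions on that location, which is the open question raised above and is not addressed by this sketch.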

Solution two: wrapper and AP control panel can be used to create experiments and assign S3 perms

There seems to be some circularity brewing with the process in that:

  1. User gets access to MLFlow and has user permissions added
  2. User creates an experiment and run using code and UI, which pushes model artifacts to an S3 bucket folder
  3. User needs permissions to use the model artifacts from S3 bucket outside of MLFlow

If step 1 can somehow be done using their alpha user name, then we need a way of making sure that any new experiment they create is linked back to that alpha user name for the S3 perms.

A solution might be to force users to create experiments via the AP Control Panel, which would call the MLflow API (https://mlflow.org/docs/latest/rest-api.html#create-experiment), instead of creating them through code or the UI (although I'm not sure how we can really prevent this :/). The API wrapper could then also set the S3 perms at the artifact level at the same time.
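
One way to picture the wrapper idea is sketched below: given an alpha user name, it derives an artifact location under that user's folder and an IAM policy document scoped to the same prefix. Everything here is hypothetical, including the function names, the folder layout, and the choice of S3 actions; it is not how Control Panel currently manages permissions.

```python
def artifact_location(bucket, alpha_username, experiment_name):
    """Artifact location for an experiment, nested under the user's folder."""
    return f"s3://{bucket}/{alpha_username}/{experiment_name}"


def prefix_policy_for_user(bucket, alpha_username):
    """IAM policy document limiting a user to their own folder in the bucket."""
    prefix = f"{alpha_username}/"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing only keys under the user's own prefix.
                "Sid": "ListOwnPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                # Allow reading and writing objects under that prefix.
                "Sid": "ReadWriteOwnPrefix",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }
```

A wrapper could call both functions in one step: create the experiment with the derived artifact location, then attach the policy to the user's role, so that the experiment and the S3 perms stay linked to the same alpha user name.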

@jacobwoffenden
Member

@gfowler-moj is going to put a session in to review the way forward around authentication/permissions management

@jacobwoffenden
Member

Outcome of meeting:

  • Switch to using Alpha bucket for artefacts
  • Make Priya and Michael admins

@jacobwoffenden
Member

jacobwoffenden commented Jun 20, 2024

I have created a group in Control Panel (analytical-platform-mlflow-admins), added @mshodge and @PriyaBasker23, and created three artefact buckets:

  • alpha-analytical-platform-mlflow
  • alpha-analytical-platform-mlflow-development
  • alpha-analytical-platform-mlflow-test

@jacobwoffenden
Member

jacobwoffenden commented Jun 20, 2024

TODO:

  • Update alpha bucket policies to allow APC roles
    • development
    • test
    • production
  • Update MLFlow role to access alpha buckets
  • Update s3_bucket_name in MLFlow values
  • Drop schemas to reset MLFlow

@jacobwoffenden
Member

alpha-analytical-platform-mlflow-development updated with the JSON below:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::alpha-analytical-platform-mlflow-development",
                "arn:aws:s3:::alpha-analytical-platform-mlflow-development/*"
            ],
            "Condition": {
                "Bool": {
                    "aws:SecureTransport": "false"
                }
            }
        },
        {
            "Sid": "AllowAnalyticalPlatformMLflow",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::381491960855:role/mlflow20240610161705974000000002"
            },
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::alpha-analytical-platform-mlflow-development",
                "arn:aws:s3:::alpha-analytical-platform-mlflow-development/*"
            ]
        }
    ]
}
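
Since the same policy shape is needed for the other two buckets, here is a minimal sketch that generates it from a bucket name and role ARN. The function name is hypothetical, and the role ARNs for the other environments are not given in this thread, so they would have to be supplied.

```python
import json


def mlflow_bucket_policy(bucket_name, role_arn):
    """Build a bucket policy matching the structure of the JSON above."""
    resources = [
        f"arn:aws:s3:::{bucket_name}",
        f"arn:aws:s3:::{bucket_name}/*",
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Reject any request not made over TLS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # Grant the MLflow role access to the bucket and its objects.
                "Sid": "AllowAnalyticalPlatformMLflow",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:*",
                "Resource": resources,
            },
        ],
    }


if __name__ == "__main__":
    policy = mlflow_bucket_policy(
        "alpha-analytical-platform-mlflow-development",
        "arn:aws:iam::381491960855:role/mlflow20240610161705974000000002",
    )
    print(json.dumps(policy, indent=4))
```

Run with the development values shown, this reproduces the policy above; swapping in another bucket name and that environment's role ARN would give the equivalent policy for it.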

MLflow is running again, but needs testing

@jacobwoffenden jacobwoffenden moved this from 🚀 In Progress to 🛂 In Review in Analytical Platform Jun 24, 2024
@jacobwoffenden
Member

MLflow deployed to APC; follow-on FR raised to create a role for mutating permissions: #4593
