🧑‍✈️ Run AWS SageMaker pilot for our users #1262

Closed
3 of 5 tasks
Tracked by #1915 ...
bagg3rs opened this issue Aug 23, 2023 · 5 comments

Comments

@bagg3rs
Contributor

bagg3rs commented Aug 23, 2023

User Story

As a Data Scientist
I want to explore using Large Language Models (LLMs) and GPU-backed instances
So that there is a secure, managed location to explore these tools, which can aid MoJ's data-driven strategy

Context

Our users want analytical features that are not available on our existing platform. The types of tools and the underlying compute are changing rapidly. SageMaker provides a managed service for these tools and offers instances with more resources and GPUs to speed up and aid research.

There is also a lot of interest in using LLMs, e.g. for semantic search of free text. SageMaker in VPC isolation mode makes sure sensitive workloads are secured and stay within the instance; we can further secure data using a private VPC and PrivateLink.
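As a rough illustration of the isolation point above, here is a minimal sketch of the request one might pass to the SageMaker `CreateModel` API, with network isolation switched on and the model pinned to private subnets. The helper name `build_isolated_model_request` and every value in the usage lines (ARNs, subnet and security group IDs, image URI, S3 path) are placeholders assumed for illustration; they are not taken from our accounts or from the pilot.

```python
from typing import Any, Dict, Sequence


def build_isolated_model_request(
    model_name: str,
    image_uri: str,
    model_data_url: str,
    role_arn: str,
    subnet_ids: Sequence[str],
    security_group_ids: Sequence[str],
) -> Dict[str, Any]:
    """Build keyword arguments for SageMaker CreateModel (hypothetical helper).

    The container gets no outbound network access (EnableNetworkIsolation)
    and is attached to the given private subnets and security groups.
    """
    if not subnet_ids or not security_group_ids:
        raise ValueError("VPC mode needs at least one subnet and one security group")

    return {
        "ModelName": model_name,
        "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_data_url},
        "ExecutionRoleArn": role_arn,
        # Traffic to and from the model stays inside these subnets.
        "VpcConfig": {
            "Subnets": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        },
        # Blocks outbound calls from the container, including to the internet.
        "EnableNetworkIsolation": True,
    }


# Placeholder values only; none of these identifiers are real.
request = build_isolated_model_request(
    model_name="llm-pilot-model",
    image_uri="123456789012.dkr.ecr.eu-west-2.amazonaws.com/example:latest",
    model_data_url="s3://example-bucket/model.tar.gz",
    role_arn="arn:aws:iam::123456789012:role/example-sagemaker-role",
    subnet_ids=["subnet-aaaa1111"],
    security_group_ids=["sg-bbbb2222"],
)
# With boto3 installed and credentials configured, this could then be sent as:
#   boto3.client("sagemaker").create_model(**request)
```

In practice we would expect to set this through Terraform rather than Python; the sketch is only meant to show which settings carry the isolation guarantee.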

Value and consequences

  • SageMaker costs are based on usage and can vary significantly month-to-month depending on usage and instance type. We can provide proactive notifications that will, for the first time, allow our users to understand the cost of the work that they are doing
  • Improved security: workloads will run in isolation mode (without internet access) to further secure sensitive data and reduce Data Owners' fear of data exposure
  • Reduced operational cost and complexity
  • Agility and change readiness: additional analytical services can be offered when they become available, without the considerable effort (extensive development of front-end and back-end services, e.g. Control Panel) that has led to users having to work elsewhere
  • Better cost transparency: we will understand our tooling compute costs, which are currently very difficult to calculate
  • For Foundation Models, SageMaker JumpStart does not download models from a public model zoo, so it can be used in a fully locked-down environment, e.g. with no internet access
  • Network access can be limited and scoped down for SageMaker JumpStart models, which helps teams improve the security posture of the environment
  • Due to the VPC boundaries, access to the endpoint can also be limited via subnets and security groups, which adds an extra layer of security
  • Leverage managed services like SageMaker Studio
  • If successful, we can close down our current tooling EKS. EKS, although managed, still requires a considerable amount of effort to run because of the endless upgrades; closing it would let us get on with more useful tasks

Disadvantages

  • RStudio on Amazon SageMaker is a paid product and requires that each user is appropriately licensed. As part of the pilot we will need to understand our users' need for RStudio

Hypothesis

If we provide tooling to our data scientists
Then they can tell better stories about our data

Proposal

👥 AWS onsite session with DP and Data Scientist ✅
🚗 Deploy SageMaker with Terraform in compute account
👀 Create any required roles?
🎛️ Add to Control Panel
🚀 Release to users
🗣️ Gather user feedback

Definition of done

  • Send AWS the current number of RStudio and JupyterLab users.
  • Arrange dates with AWS and get out invites (Mon 27th Nov)
  • If successful, complete spike 🌱 Spike: SageMaker #1922
  • Cost analysis explored: AWS cost here vs EKS GPU instances. AWS mention 54% lower TCO over 3 years.
  • Feedback from users captured from pilot

Reference

A PoC implementation was added to Control Panel; see closed PR

How to write good user stories

@bagg3rs bagg3rs added the data-platform-apps-and-tools This issue is owned by Data Platform Apps and Tools label Aug 23, 2023
@bagg3rs bagg3rs changed the title 📌 Investigate adding AWS SageMaker to tooling 📌 Investigate adding AWS SageMaker to tooling PoC Sep 20, 2023
@bagg3rs bagg3rs mentioned this issue Sep 21, 2023
3 tasks
@YvanMOJdigital YvanMOJdigital added the user-centred-design Research or design activity needed label Sep 25, 2023
@alex-vonfeldmann

I'm not sure this requires UCD at this point; the Data Science guys were pretty clear about what they want (which is better infrastructure that supports LLMs). @bagg3rs Am I wrong?

Does this fall under AP BAU or DP?

@bagg3rs
Contributor Author

bagg3rs commented Sep 27, 2023

@alex-vonfeldmann correct, no UCD is required at this time (removed label); we will gather feedback once we start the PoC.
This falls under DP(?), as it looks at potentially replacing all current AP-managed tooling (Jupyter/RStudio).

@bagg3rs bagg3rs removed the user-centred-design Research or design activity needed label Sep 27, 2023
@bagg3rs bagg3rs changed the title 📌 Investigate adding AWS SageMaker to tooling PoC 📌 Investigate adding AWS SageMaker to tooling and attend AWS session with our users Oct 17, 2023
@Gary-H9 Gary-H9 changed the title 📌 Investigate adding AWS SageMaker to tooling and attend AWS session with our users 📌 Spike: Investigate adding AWS SageMaker to tooling and attend AWS session with our users Nov 14, 2023
@bagg3rs
Contributor Author

bagg3rs commented Jan 31, 2024

Referencing this feature request, as it could be a good way to host VS Code / Coder.

@bagg3rs bagg3rs changed the title 📌 Spike: Investigate adding AWS SageMaker to tooling and attend AWS session with our users 📌 AWS SageMaker pilot for our users Feb 14, 2024
@bagg3rs bagg3rs changed the title 📌 AWS SageMaker pilot for our users 🧑‍✈️ AWS SageMaker pilot for our users Feb 14, 2024
@bagg3rs bagg3rs changed the title 🧑‍✈️ AWS SageMaker pilot for our users 🧑‍✈️ Run AWS SageMaker pilot for our users Feb 14, 2024
@jacobwoffenden jacobwoffenden moved this to 👀 TODO in Analytical Platform Feb 15, 2024
Contributor

This issue is being marked as stale because it has been open for 60 days with no activity. Remove stale label or comment to keep the issue open.

@github-actions github-actions bot added the stale label Apr 26, 2024
Contributor

github-actions bot commented May 3, 2024

This issue is being closed because it has been open for a further 7 days with no activity. If this is still a valid issue, please reopen it, Thank you!

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale May 3, 2024
@github-project-automation github-project-automation bot moved this from 👀 TODO to 🎉 Done in Analytical Platform May 3, 2024