feat: Add pattern that demonstrates using ML capacity block reservation with self-managed node group (#1941)
bryantbiggs authored May 8, 2024
1 parent f9fca1d commit 447cb5f
Showing 11 changed files with 288 additions and 13 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/publish-docs.yml
@@ -35,8 +35,8 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
python -m pip install mkdocs-material==9.3.1 \
mkdocs-include-markdown-plugin==6.0.1 \
python -m pip install mkdocs-material==9.5.21 \
mkdocs-include-markdown-plugin==6.0.6 \
mkdocs-awesome-pages-plugin==2.9.2
- name: git config
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -1,6 +1,6 @@
repos:
- repo: https://github.com/streetsidesoftware/cspell-cli
rev: v8.7.0
rev: v8.8.0
hooks:
- id: cspell
args: [--exclude, 'ADOPTERS.md', --exclude, '.pre-commit-config.yaml', --exclude, '.gitignore', --exclude, '*.drawio', --exclude, 'mkdocs.yml', --exclude, '.helmignore', --exclude, '.github/workflows/*', --exclude, 'patterns/istio-multi-cluster/*', --exclude, 'patterns/blue-green-upgrade/*']
7 changes: 7 additions & 0 deletions docs/patterns/ml-capacity-block.md
@@ -0,0 +1,7 @@
---
title: ML Capacity Block Reservation (CBR)
---

{%
include-markdown "../../patterns/ml-capacity-block/README.md"
%}
31 changes: 31 additions & 0 deletions patterns/ml-capacity-block/README.md
@@ -0,0 +1,31 @@
# EKS w/ ML Capacity Block Reservation (CBR)

This pattern demonstrates how to consume/utilize ML capacity block reservations (CBR) with Amazon EKS. The solution consists primarily of two components:

!!! warning
    The use of self-managed node group(s) is required at this time to support capacity block reservations within EKS. This pattern will be updated to demonstrate EKS managed node groups once support has been implemented by the EKS service.

1. The self-managed node group that will utilize the CBR should be restricted to subnets in the availability zone where the CBR has been allocated. For example, if the CBR is allocated in `us-west-2b`, the node group should only be provided subnet IDs that reside in `us-west-2b`. If subnets in other AZs are provided, it's possible to encounter an error such as `InvalidParameterException: The following supplied instance types do not exist ...`. This error is not guaranteed to appear and may seem random, since the underlying autoscaling group(s) provision nodes into AZs at random; it only occurs when the autoscaling group tries to launch instances into an AZ where no capacity is reserved and there is insufficient on-demand capacity for the desired instance type.

2. The launch template should specify the `instance_market_options` and `capacity_reservation_specification` arguments. This is how the node group consumes the CBR (i.e., it tells the autoscaling group to launch instances using the provided capacity reservation); see the condensed sketch below.
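
The sketch below condenses both points into a single self-managed node group definition; it mirrors the `eks.tf` included in the Code section that follows, with the node group name and subnet ID as placeholder values:

```terraform
self_managed_node_groups = {
  cbr = {
    ami_type      = "AL2_x86_64_GPU"
    instance_type = "p5.48xlarge"

    # 1. Restrict the node group to subnet(s) in the AZ where the CBR is allocated
    #    (placeholder subnet ID)
    subnet_ids = ["subnet-0123456789abcdef0"]

    # 2. Launch instances into the ML capacity block reservation
    instance_market_options = {
      market_type = "capacity-block"
    }
    capacity_reservation_specification = {
      capacity_reservation_target = {
        capacity_reservation_id = var.capacity_reservation_id
      }
    }
  }
}
```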

<b>Links:</b>

- [EKS - Capacity Blocks for ML](https://docs.aws.amazon.com/eks/latest/userguide/capacity-blocks.html)
- [EC2 - Capacity Blocks for ML](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capacity-blocks.html)

## Code

```terraform hl_lines="53-93"
{% include "../../patterns/ml-capacity-block/eks.tf" %}
```

## Deploy

See [here](https://aws-ia.github.io/terraform-aws-eks-blueprints/getting-started/#prerequisites) for the prerequisites and steps to deploy this pattern.

## Destroy

{%
include-markdown "../../docs/_partials/destroy.md"
%}
144 changes: 144 additions & 0 deletions patterns/ml-capacity-block/eks.tf
@@ -0,0 +1,144 @@
################################################################################
# Required Input
################################################################################

# See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/capacity-blocks-using.html
# on how to obtain an ML capacity block reservation. Once acquired, you can provide
# the reservation ID through this input to deploy the pattern
variable "capacity_reservation_id" {
description = "The ID of the ML capacity block reservation to use for the node group"
type = string
}
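
# Example ways to supply the value (the reservation ID shown is a hypothetical placeholder):
#   terraform apply -var 'capacity_reservation_id=cr-0123456789abcdef0'
#   or export TF_VAR_capacity_reservation_id, or set it in a *.tfvars file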

################################################################################
# Cluster
################################################################################

module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "~> 20.9"

cluster_name = local.name
cluster_version = "1.29"

# Give the Terraform identity admin access to the cluster
# which will allow it to deploy resources into the cluster
enable_cluster_creator_admin_permissions = true
cluster_endpoint_public_access = true

cluster_addons = {
coredns = {}
kube-proxy = {}
vpc-cni = {}
}

# Add security group rules on the node group security group to
# allow EFA traffic
enable_efa_support = true

vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnets

eks_managed_node_groups = {
# This node group is for core addons such as CoreDNS
default = {
instance_types = ["m5.large"]

min_size = 1
max_size = 2
desired_size = 2
}
}

# Note: ML capacity block reservations are only supported
# on self-managed node groups at this time
self_managed_node_groups = {
odcr = {
# The EKS AL2 GPU AMI provides all of the necessary components
# for accelerated workloads w/ EFA
ami_type = "AL2_x86_64_GPU"
instance_type = "p5.48xlarge"

pre_bootstrap_user_data = <<-EOT
# Mount instance store volumes in RAID-0 for kubelet and containerd
# https://github.com/awslabs/amazon-eks-ami/blob/master/doc/USER_GUIDE.md#raid-0-for-kubelet-and-containerd-raid0
/bin/setup-local-disks raid0
# Ensure only GPU workloads are scheduled on this node group
export KUBELET_EXTRA_ARGS='--node-labels=vpc.amazonaws.com/efa.present=true,nvidia.com/gpu.present=true \
--register-with-taints=nvidia.com/gpu=true:NoSchedule'
EOT

min_size = 2
max_size = 2
desired_size = 2

# This will:
# 1. Create a placement group to place the instances close to one another
# 2. Ignore subnets that reside in AZs that do not support the instance type
# 3. Expose all of the available EFA interfaces on the launch template
enable_efa_support = true

# ML capacity block reservation
instance_market_options = {
market_type = "capacity-block"
}
capacity_reservation_specification = {
capacity_reservation_target = {
capacity_reservation_id = var.capacity_reservation_id
}
}
}
}

tags = local.tags
}

################################################################################
# Helm charts
################################################################################

resource "helm_release" "nvidia_device_plugin" {
name = "nvidia-device-plugin"
repository = "https://nvidia.github.io/k8s-device-plugin"
chart = "nvidia-device-plugin"
version = "0.14.5"
namespace = "nvidia-device-plugin"
create_namespace = true
wait = false

values = [
<<-EOT
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: 'nvidia.com/gpu.present'
operator: In
values:
- 'true'
EOT
]
}

resource "helm_release" "aws_efa_device_plugin" {
name = "aws-efa-k8s-device-plugin"
repository = "https://aws.github.io/eks-charts"
chart = "aws-efa-k8s-device-plugin"
version = "v0.4.4"
namespace = "kube-system"
wait = false

values = [
<<-EOT
nodeSelector:
vpc.amazonaws.com/efa.present: 'true'
tolerations:
- key: nvidia.com/gpu
operator: Exists
effect: NoSchedule
EOT
]
}
87 changes: 87 additions & 0 deletions patterns/ml-capacity-block/main.tf
@@ -0,0 +1,87 @@
terraform {
required_version = ">= 1.3"

required_providers {
aws = {
source = "hashicorp/aws"
version = ">= 5.34"
}
helm = {
source = "hashicorp/helm"
version = ">= 2.9"
}
}

# ## Used for end-to-end testing on project; update to suit your needs
# backend "s3" {
# bucket = "terraform-ssp-github-actions-state"
# region = "us-west-2"
# key = "e2e/ml-capacity-block/terraform.tfstate"
# }
}

provider "aws" {
region = local.region
}

provider "helm" {
kubernetes {
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "aws"
# This requires the awscli to be installed locally where Terraform is executed
args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
}
}
}

################################################################################
# Common data/locals
################################################################################

data "aws_availability_zones" "available" {}

locals {
name = basename(path.cwd)
region = "us-west-2"

vpc_cidr = "10.0.0.0/16"
azs = slice(data.aws_availability_zones.available.names, 0, 3)

tags = {
Blueprint = local.name
GithubRepo = "github.com/aws-ia/terraform-aws-eks-blueprints"
}
}

################################################################################
# Supporting Resources
################################################################################

module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "~> 5.0"

name = local.name
cidr = local.vpc_cidr

azs = local.azs
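
# With the /16 VPC CIDR above, this yields /20 private subnets (10.0.0.0/20, 10.0.16.0/20, 10.0.32.0/20)
# and /24 public subnets (10.0.48.0/24, 10.0.49.0/24, 10.0.50.0/24), one per AZ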
private_subnets = [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 4, k)]
public_subnets = [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 8, k + 48)]

enable_nat_gateway = true
single_nat_gateway = true

public_subnet_tags = {
"kubernetes.io/role/elb" = 1
}

private_subnet_tags = {
"kubernetes.io/role/internal-elb" = 1
}

tags = local.tags
}
12 changes: 6 additions & 6 deletions patterns/nvidia-gpu-efa/README.md
@@ -6,12 +6,12 @@ The following components are demonstrated in this pattern:

- A "default" node group that supports addons and components that do not require GPUs nor EFA devices. Any pods that do not tolerate the taints of the GPU node group will be scheduled on instances within this node group.
- A node group of `p5.48xlarge` instances with
- all x32 [EFA network interfaces](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html) enabled
- provisioned within a placement group so that the instances are provisioned close to one another in a single availability zone that supports the instance type.
- a common NVIDIA taint of `"nvidia.com/gpu:NoSchedule"` to ensure only the intended applications are allowed to run on the nodes created
- two labels to identify that this nodegroup supports NVIDIA GPUs and EFA devices and allow pods to use node selectors with these labels
- the NVME instance store volumes are mounted in a RAID-0 array to provide a single, large, high-performance storage volume for the GPU workloads
- kubelet and containerd are configured to utilize the RAID-0 volume, allowing kubelet to discover the additional storage as ephemeral storage that can be utilized by pods
- all x32 [EFA network interfaces](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html) enabled
- provisioned within a placement group so that the instances are provisioned close to one another in a single availability zone that supports the instance type.
- a common NVIDIA taint of `"nvidia.com/gpu:NoSchedule"` to ensure only the intended applications are allowed to run on the nodes created
- two labels to identify that this nodegroup supports NVIDIA GPUs and EFA devices and allow pods to use node selectors with these labels
- the NVME instance store volumes are mounted in a RAID-0 array to provide a single, large, high-performance storage volume for the GPU workloads
- kubelet and containerd are configured to utilize the RAID-0 volume, allowing kubelet to discover the additional storage as ephemeral storage that can be utilized by pods
- A Helm chart deployment for the [NVIDIA device plugin](https://github.com/NVIDIA/k8s-device-plugin) to expose and mount the GPUs provided by the instances to the pods that request them
- A Helm chart deployment for the EFA device plugin to expose and mount the EFA network interfaces provided by the instances to the pods that request them. Since the EFA network interfaces are only found on the instances that provide NVIDIA GPUs in this pattern, we do not apply an additional taint for the EFA network interfaces to avoid over-constraining.

2 changes: 1 addition & 1 deletion patterns/nvidia-gpu-efa/eks.tf
@@ -4,7 +4,7 @@

module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "~> 20.8"
version = "~> 20.9"

cluster_name = local.name
cluster_version = "1.29"
2 changes: 1 addition & 1 deletion patterns/nvidia-gpu-efa/main.tf
@@ -16,7 +16,7 @@ terraform {
# backend "s3" {
# bucket = "terraform-ssp-github-actions-state"
# region = "us-west-2"
# key = "e2e/nvida-gpu-efa/terraform.tfstate"
# key = "e2e/nvidia-gpu-efa/terraform.tfstate"
# }
}

8 changes: 7 additions & 1 deletion patterns/targeted-odcr/README.md
@@ -11,11 +11,17 @@ This pattern demonstrates how to consume/utilize on-demand capacity reservations

3. A resource group will need to be created for the capacity reservations. The resource group acts as a container, allowing ODCRs to be added or removed as needed to adjust the available capacity. Utilizing the resource group allows this additional capacity to be adjusted without any modification or disruption to the existing node group or launch template. As soon as an ODCR has been associated with the resource group, the node group can scale up to start utilizing that capacity; a rough sketch follows below.
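
A minimal Terraform sketch of such a resource group is shown below (it is not part of this commit's diff, and the resource/group names are placeholders); the node group's launch template then targets the group ARN rather than an individual reservation ID:

```terraform
resource "aws_resourcegroups_group" "odcr" {
  name = "eks-odcr" # placeholder name

  # Service-linked configuration that makes this group a capacity reservation pool
  configuration {
    type = "AWS::EC2::CapacityReservationPool"
  }

  configuration {
    type = "AWS::ResourceGroups::Generic"
    parameters {
      name   = "allowed-resource-types"
      values = ["AWS::EC2::CapacityReservation"]
    }
  }
}

# Referenced from the node group / launch template along these lines:
#   capacity_reservation_specification = {
#     capacity_reservation_target = {
#       capacity_reservation_resource_group_arn = aws_resourcegroups_group.odcr.arn
#     }
#   }
```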

Links:
<b>Links:</b>

- [Tutorial: Launch On-Demand Instances using targeted Capacity Reservations](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-fleet-launch-on-demand-instances-using-targeted-capacity-reservations-walkthrough.html)
- [Target a group of Amazon EC2 On-Demand Capacity Reservations](https://aws.amazon.com/blogs/mt/target-a-group-of-amazon-ec2-on-demand-capacity-reservations/)

## Code

```terraform hl_lines="34-51"
{% include "../../patterns/targeted-odcr/eks.tf" %}
```

## Deploy

See [here](https://aws-ia.github.io/terraform-aws-eks-blueprints/getting-started/#prerequisites) for the prerequisites and steps to deploy this pattern.
2 changes: 1 addition & 1 deletion patterns/targeted-odcr/eks.tf
@@ -4,7 +4,7 @@

module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "~> 20.8"
version = "~> 20.9"

cluster_name = local.name
cluster_version = "1.29"
