Merge pull request #138 from ruecarlo/ecs-final-changes
ECS Capacity Providers workshop
ruecarlo authored Feb 22, 2021
2 parents a94eea2 + 9cb754d commit 6682f3a
Showing 109 changed files with 2,806 additions and 942 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -4,3 +4,4 @@ resources/
node_modules/
**/cdk.out/*
*~
.idea/*
1 change: 1 addition & 0 deletions config.toml
@@ -18,6 +18,7 @@ disableAssetsBusting = false
disableLanguageSwitchingButton = false
disableShortcutsTitle = false
disableInlineCopyToClipBoard = true
disableLandingPageButton = true

[outputs]
home = [ "HTML", "AMP", "RSS", "JSON"]
13 changes: 0 additions & 13 deletions content/authors.md

This file was deleted.

12 changes: 12 additions & 0 deletions content/ecs-spot-capacity-providers/Introduction/_index.md
@@ -0,0 +1,12 @@
+++
title = "Introduction"
weight = 20
+++


If you are already familiar with the concepts below or already have experience operating ECS clusters, you can skip the introduction and proceed to the [**Setup the workspace environment on AWS**](/ecs-spot-capacity-providers/workshopsetup.html) section to start the workshop.

Otherwise, read through to get an initial understanding of the services, technologies, and features used in this workshop.


{{% children %}}
@@ -0,0 +1,27 @@
+++
title = "Introduction to Containers"
weight = 10
+++

![Container Ship](/images/ecs-spot-capacity-providers/containership.jpg)

What is a Container?
---

* Containers provide a standard way to package your application’s code, configurations, and dependencies into a single object.
* Containers share an operating system installed on the server and run as resource-isolated processes, ensuring quick, reliable, and consistent deployments, regardless of environment.
* Whether you deploy locally on your laptop or to production, the experience will remain the same (except secrets and other environmental values, of course).

Why Containers?
---
Containers allow developers to iterate at high velocity and give applications the speed and scalability to meet demand. To see how, it is first important to understand what a container is and how it enables teams to move faster.

Benefits of Containers
---

Containers are a powerful way for developers to package and deploy their applications. They are lightweight and provide a consistent, portable software environment for applications to easily run and scale anywhere. Building and deploying microservices, running batch jobs, powering machine learning applications, and moving existing applications into the cloud are just some of the popular use cases for containers.

Amazon EC2 Spot Instances
---

[Amazon EC2 Spot Instances](https://aws.amazon.com/ec2/spot/) offer spare compute capacity available in the AWS Cloud at steep discounts compared to On-Demand prices. EC2 can interrupt Spot Instances with two minutes of notification when EC2 needs the capacity back. You can use Spot Instances for various fault-tolerant and flexible applications. Some examples are analytics, containerized workloads, high-performance computing (HPC), stateless web servers, rendering, CI/CD, and other test and development workloads.
92 changes: 92 additions & 0 deletions content/ecs-spot-capacity-providers/Introduction/intro_to_ecs.md
@@ -0,0 +1,92 @@
+++
title = "Introduction to ECS"
weight = 20
+++

![Amazon ECS](/images/ecs-spot-capacity-providers/ecs.png)

- [Amazon Elastic Container Service (Amazon ECS)](https://aws.amazon.com/ecs/) is a highly scalable, high-performance container orchestration service that supports Docker containers and allows you to easily run and scale containerized applications on AWS.

- Amazon ECS eliminates the need for you to install and operate your own container orchestration software, manage and scale a cluster of virtual machines, or schedule containers on those virtual machines.

- ECS is also deeply integrated into the rest of the AWS ecosystem.

![ECS integration](/images/ecs-spot-capacity-providers/integration.svg)

## Amazon ECS Clusters

An Amazon ECS cluster is a logical grouping of tasks or services, which we'll cover in more detail in the following pages.

- If you are running tasks or services that use the EC2 launch type, a cluster is also a grouping of container instances.
- If you are using capacity providers, a cluster is also a logical grouping of capacity providers.
- A cluster can be a combination of Fargate and EC2 launch types.

When you first use Amazon ECS, a default cluster is created for you, but you can create multiple clusters in an account to keep your resources separate.

For more information on ECS Clusters, see [here](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/clusters.html).

## Task Definitions


To prepare your application to run on Amazon ECS, you create a task definition. The task definition is a text file, in JSON format, that describes one or more containers, up to a maximum of ten, that form your application.

We can think of it as a blueprint for your application. Task definitions specify various parameters for your application. Examples of task definition parameters are which containers to use, which launch type to use, which ports to open for your application, and what data volumes to use with the containers in the task. The specific parameters available for the task definition depend on which launch type you are using. For more information about creating task definitions, see [Amazon ECS Task Definitions](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definitions.html).

The following is an example of a task definition containing a single container that runs an NGINX web server using the Fargate launch type. For a more extended example showing the use of multiple containers in a task definition, see [Example Task Definitions](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/example_task_definitions.html).

```json
{
  "family": "webserver",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "nginx",
      "memory": 100,
      "cpu": 99
    }
  ],
  "requiresCompatibilities": [
    "FARGATE"
  ],
  "networkMode": "awsvpc",
  "memory": "512",
  "cpu": "256"
}
```
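Hand-edited task definitions are a common source of `register-task-definition` errors (for example, stray smart quotes or trailing commas), so it is worth validating the JSON locally first. A minimal sketch, assuming the file name `webserver-taskdef.json` is your choice; the final `aws` call requires valid AWS credentials, so it is shown commented out:

```shell
# Write the task definition to a file (same content as the example above).
cat > webserver-taskdef.json <<'EOF'
{
  "family": "webserver",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "nginx",
      "memory": 100,
      "cpu": 99
    }
  ],
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "memory": "512",
  "cpu": "256"
}
EOF

# Validate the JSON syntax locally before calling the API.
python3 -m json.tool webserver-taskdef.json > /dev/null && echo "valid JSON"

# Register it (requires AWS credentials; uncomment to run):
# aws ecs register-task-definition --cli-input-json file://webserver-taskdef.json
```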

## Fargate

[AWS Fargate](https://aws.amazon.com/fargate/) is a technology for Amazon ECS that allows you to run containers without having to manage servers or clusters. With AWS Fargate, you no longer have to provision, configure, and scale clusters of virtual machines to run containers. This removes the need to choose server types, decide when to scale your clusters, or optimize cluster packing. AWS Fargate removes the need for you to interact with or think about servers or clusters. Fargate lets you focus on designing and building your applications instead of managing the infrastructure that runs them.

## Tasks and Scheduling

A task is the instantiation of a task definition within a cluster. After you have created a task definition for your application within Amazon ECS, you can specify the number of tasks that will run on your cluster. Each task that uses the Fargate launch type has its own isolation boundary and does not share the underlying kernel, CPU resources, memory resources, or elastic network interface with another task.

The Amazon ECS task scheduler places tasks within your cluster. There are several scheduling options available. For example, you can define a service that runs and maintains a specified number of tasks simultaneously. You might also want to run a single task on a schedule or invoke it through APIs or as part of a serverless workflow. For more information about the different scheduling options available, see [Scheduling Amazon ECS Tasks](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/scheduling_tasks.html).

## Services

Amazon ECS allows you to run and maintain a specified number of instances of a task definition simultaneously in an Amazon ECS cluster. This is called a service. If any of your tasks should fail or stop for any reason, the Amazon ECS service scheduler launches another instance of your task definition to replace it and maintain the desired count of tasks in the service depending on the scheduling strategy used.

Besides maintaining the desired count of tasks in your service, you can optionally run your service behind a load balancer. The load balancer distributes traffic across the tasks associated with the service.

There are two service scheduler strategies available:

- REPLICA:

- The replica scheduling strategy places and maintains the desired number of tasks across your cluster. By default, the service scheduler spreads tasks across Availability Zones. You can use task placement strategies and constraints to customize task placement decisions. For more information, see [Replica](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_services.html#service_scheduler_replica).

- DAEMON:

- The daemon scheduling strategy deploys exactly one task on each active container instance that meets all the task placement constraints that you specify in your cluster. The service scheduler evaluates the task placement constraints for running tasks and will stop tasks that do not meet the placement constraints. When using this strategy, there is no need to specify a desired number of tasks, a task placement strategy, or use Service Auto Scaling policies. For more information, see [Daemon](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_services.html#service_scheduler_daemon).


## Service Discovery

Because containers are immutable by nature, they can churn regularly as they are replaced with newer versions of the service. This means new tasks need to be registered and old or unhealthy ones deregistered. Doing this on your own is challenging, hence the need for service discovery.

AWS Cloud Map is a cloud resource discovery service. With Cloud Map, you can define custom names for your application resources, and it maintains the updated location of these dynamically changing resources. This increases your application availability because your web service always discovers the most up-to-date locations of its resources.

Cloud Map natively integrates with ECS, and as we build services in the workshop, we will see this firsthand. For more information on service discovery with ECS, please see [here](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-discovery.html).

![Service Discovery](/images/ecs-spot-capacity-providers/cloudmapproduct.png)
@@ -0,0 +1,54 @@
+++
title = "Scaling ECS Workloads"
weight = 30
+++

There are different approaches to scaling a system. Traditionally, systems have used what we call an **Infrastructure First** approach, where the system focuses on infrastructure metrics such as CPU or memory usage and scales up the cluster infrastructure. In this case the application scales following metrics derived from the infrastructure.

While you can still use that approach on ECS, ECS follows an **Application First** scaling approach, where scaling is based on the desired number of tasks. ECS has two types of scaling activities:

* **ECS Service / Application Scaling**: This refers to the ability to increase or decrease the desired count of tasks in your Amazon ECS service based on dynamic traffic and load patterns in the workload. Amazon ECS publishes CloudWatch metrics with your service’s average CPU and memory usage. You can use these and other CloudWatch metrics to scale out your service (add more tasks) to deal with high demand at peak times, and to scale in your service (run fewer tasks) to reduce costs during periods of low utilization.

* **ECS Container Instances Scaling**: This refers to the ability to increase or decrease the desired count of EC2 instances in your Amazon ECS cluster based on ECS Service / Application scaling. For this kind of scaling, it is typical practice to depend on Auto Scaling group-level scaling policies.


To scale the infrastructure using the **Application First** approach on ECS, we will use Amazon ECS cluster **Capacity Providers** to determine the infrastructure our tasks run on, and Amazon ECS **Cluster Auto Scaling** (CAS) to manage the scale of the cluster according to the application's needs.

A capacity provider configuration includes:

* An **Auto Scaling group** to associate with the capacity provider. The Auto Scaling group must already exist.
* An attribute to enable or disable **Managed scaling**. If enabled, Amazon ECS manages the scale-in and scale-out actions of the Auto Scaling group through an AWS Auto Scaling scaling plan, also referred to as **Cluster Auto Scaling** (CAS). This also means you can scale your ECS cluster up from zero capacity in the Auto Scaling group.
* An attribute to define the **Target capacity**, a percentage between 1 and 100. When **managed scaling** is enabled, this value is used as the target value for the Amazon ECS-managed target tracking scaling policy.
* An attribute to define **Managed termination protection**, which prevents EC2 instances in an Auto Scaling group that contain ECS tasks from being terminated during scale-in actions.


Each ECS cluster can have one or more capacity providers and an optional **Default capacity provider strategy**, which applies to newly created tasks or services on the cluster that do not specify an explicit strategy. Where the default strategy does not meet your needs, you can define a **capacity provider strategy** specific to that service or task.
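As an illustration, a cluster's capacity providers and default strategy can be expressed like this in the input to the `aws ecs put-cluster-capacity-providers` CLI command (the cluster name, capacity provider names, base, and weights here are hypothetical):

```json
{
  "cluster": "EcsSpotWorkshop",
  "capacityProviders": ["CP-OD", "CP-SPOT"],
  "defaultCapacityProviderStrategy": [
    { "capacityProvider": "CP-OD", "base": 1, "weight": 1 },
    { "capacityProvider": "CP-SPOT", "weight": 3 }
  ]
}
```

With a strategy like this, the first task would run on the `CP-OD` provider (the base), and additional tasks would be distributed 1:3 between the two providers.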

{{% notice info %}}
You can read more about **Capacity Provider Strategies** [here](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/cluster-capacity-providers.html)
{{% /notice %}}

# ECS Cluster Auto Scaling

When **managed scaling** is enabled, Amazon ECS manages the scale-in and scale-out actions of the Auto Scaling group. This is what we call ECS **Cluster Auto Scaling (CAS)**, a capability of ECS for managing the scaling of EC2 Auto Scaling groups (ASGs). CAS relies on ECS capacity providers.

Amazon ECS creates an AWS Auto Scaling scaling plan with a target tracking scaling policy based on the target capacity value you specify. Amazon ECS then associates this scaling plan with your Auto Scaling group. For each capacity provider with managed scaling enabled, an Amazon ECS-managed CloudWatch metric with the prefix `AWS/ECS/ManagedScaling` is created along with two CloudWatch alarms. The CloudWatch metrics and alarms are used to monitor the container instance capacity in your Auto Scaling groups and trigger the Auto Scaling group to scale in and scale out as needed.

The scaling policy uses a CloudWatch metric called **CapacityProviderReservation**, which ECS publishes for every ASG capacity provider that has managed scaling enabled. The metric is defined as follows.

```ruby
CapacityProviderReservation = ( M / N ) x 100
```

Where:

* **N** represents the current number of instances in the Auto Scaling group(ASG) that are **already running**
* **M** represents the number of instances running in an ASG necessary to meet the needs of the tasks assigned to that ASG, including tasks already running and tasks the customer is trying to run that don’t fit on the existing instances.

Given these definitions: if N = M, scale-out is not required and scale-in isn't possible. If N < M, scale-out is required because you don't have enough instances. If N > M, scale-in is possible (but not necessarily required) because you have more instances than you need to run all of your ECS tasks. The CapacityProviderReservation metric is measured relative to the Target capacity value and dictates how much scale-out or scale-in should happen. CAS always tries to keep **CapacityProviderReservation** equal to the specified Target capacity value, either by increasing or decreasing the number of instances in the ASG.
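To make the arithmetic concrete, here is a small shell sketch of the metric and the resulting scaling decision. The M/N values are made up for illustration, and in practice ECS computes this metric for you; this only mirrors the formula above:

```shell
# CapacityProviderReservation = ( M / N ) x 100
# M = instances needed to run all tasks; N = instances currently running.
# Integer arithmetic is used here for simplicity of illustration.
capacity_provider_reservation() {
  local M=$1 N=$2
  echo $(( M * 100 / N ))
}

TARGET_CAPACITY=100

for M_N in "4 4" "6 4" "2 4"; do
  set -- $M_N
  reservation=$(capacity_provider_reservation "$1" "$2")
  if   [ "$reservation" -gt "$TARGET_CAPACITY" ]; then action="scale out"
  elif [ "$reservation" -lt "$TARGET_CAPACITY" ]; then action="scale in (possible)"
  else action="steady state"; fi
  echo "M=$1 N=$2 reservation=${reservation}% -> ${action}"
done
# Output:
# M=4 N=4 reservation=100% -> steady state
# M=6 N=4 reservation=150% -> scale out
# M=2 N=4 reservation=50% -> scale in (possible)
```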

The scale-out activity is triggered when **`CapacityProviderReservation` > `Target capacity`** for 1 data point of 1-minute duration, which means it takes about 1 minute to scale out the capacity in the ASG. The scale-in activity is triggered when CapacityProviderReservation < Target capacity for 15 consecutive data points of 1-minute duration. We will see all of this demonstrated in this workshop.

{{% notice info %}}
You can read more about **ECS Cluster Auto Scaling (CAS)** and how it works under different scenarios and conditions **[in this blog post](https://aws.amazon.com/blogs/containers/deep-dive-on-amazon-ecs-cluster-auto-scaling/)**
{{% /notice %}}
8 changes: 8 additions & 0 deletions content/ecs-spot-capacity-providers/WorkshopSetup/_index.md
@@ -0,0 +1,8 @@
---
title: "Setup the Workspace environment"
weight: 40
---


{{% children %}}

96 changes: 96 additions & 0 deletions content/ecs-spot-capacity-providers/WorkshopSetup/cli_setup.md
@@ -0,0 +1,96 @@
---
title: "Setup AWS CLI and clone the workshop repo"
weight: 40
---

{{% notice tip %}}
For this workshop, please ignore warnings about the version of pip being used.
{{% /notice %}}

1. Run the following command to view the current version of aws-cli:
```
aws --version
```

1. Update to the latest version:
```
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
. ~/.bash_profile
```

1. Confirm you have a newer version:
```
aws --version
```

Install dependencies for use in the workshop by running:

```
sudo yum -y install jq gettext
```

### Clone the GitHub repo

In order to execute the steps in the workshop, you'll need to clone the workshop GitHub repo.

In the Cloud9 IDE terminal, run the following command:

```
git clone https://github.com/awslabs/ec2-spot-workshops.git
```
Change into the workshop directory:

```
cd ec2-spot-workshops/workshops/ecs-spot-capacity-providers
```

Feel free to browse around. You can also browse the directory structure in the **Environment** tab on the left and even edit files directly there by double clicking on them.

We should configure the AWS CLI to use our current region as the default:

```
export ACCOUNT_ID=$(aws sts get-caller-identity --output text --query Account)
export AWS_REGION=$(curl -s 169.254.169.254/latest/dynamic/instance-identity/document | jq -r '.region')
echo "export ACCOUNT_ID=${ACCOUNT_ID}" >> ~/.bash_profile
echo "export AWS_REGION=${AWS_REGION}" >> ~/.bash_profile
aws configure set default.region ${AWS_REGION}
aws configure get default.region
```

Use the commands below to set the CloudFormation stack name as an environment variable.

* If you created the stack manually:

```
export STACK_NAME=EcsSpotWorkshop
```

* If the stack was created automatically within Event Engine:

```
export STACK_NAME=$(aws cloudformation list-stacks | jq -r '.StackSummaries[] | select(.StackName|test("mod.")) | .StackName')
echo "STACK_NAME=$STACK_NAME"
```

The output should look something like the following.

```
STACK_NAME=mod-9feefdd1672c4eac
```
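The `jq` filter above picks out the Event Engine stack by matching the `mod.` name prefix. If you want to see how the filter behaves before running it against your account, you can apply it to a canned `list-stacks` payload (the sample stack names below are invented):

```shell
# A minimal sample of what `aws cloudformation list-stacks` returns.
cat > sample-stacks.json <<'EOF'
{
  "StackSummaries": [
    { "StackName": "EcsSpotWorkshop" },
    { "StackName": "mod-9feefdd1672c4eac" }
  ]
}
EOF

# The same jq filter as above, applied to the sample payload:
jq -r '.StackSummaries[] | select(.StackName|test("mod.")) | .StackName' sample-stacks.json
# -> mod-9feefdd1672c4eac
```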


Run the command below to load the CloudFormation outputs as environment variables.

```
for output in $(aws cloudformation describe-stacks --stack-name ${STACK_NAME} --query 'Stacks[].Outputs[].OutputKey' --output text)
do
export $output=$(aws cloudformation describe-stacks --stack-name ${STACK_NAME} --query 'Stacks[].Outputs[?OutputKey==`'$output'`].OutputValue' --output text)
eval "echo $output : \"\$$output\""
done
```

***Congratulations***, your Cloud9 workspace setup is complete, and you can continue with this workshop.
