Skip to content

Commit

Permalink
Merge pull request #115 from kvrajesh/awslabs-ecs
Browse files Browse the repository at this point in the history
Modified Module-2
  • Loading branch information
ajpaws authored Oct 16, 2020
2 parents e4d9743 + 9b506e0 commit e7c7de8
Show file tree
Hide file tree
Showing 12 changed files with 95 additions and 484 deletions.
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
---
title: "Inturruption Handling On EC2 Spot Instances"
weight: 80
---

Amazon EC2 terminates your Spot Instance when it needs the capacity back. Amazon EC2 provides a Spot Instance interruption notice, which gives the instance a two-minute warning before it is interrupted.

When Amazon EC2 is going to interrupt your Spot Instance, the interruption notification will be available in two ways:

1. ***Amazon EventBridge Events:*** EC2 service emits an event two minutes prior to the actual interruption. This event can be detected by Amazon CloudWatch Events.

1. ***EC2 Instance Metadata service (IMDS):*** If your Spot Instance is marked for termination by EC2, the instance-action item is present in your instance metadata.

In the Launch Template configuration, we added:
```plaintext
echo "ECS_ENABLE_SPOT_INSTANCE_DRAINING=true" >> /etc/ecs/ecs.config
```
When Amazon ECS Spot Instance draining is enabled on the instance, ECS receives the Spot Instance interruption notice and places the instance in DRAINING status. When a container instance is set to DRAINING, Amazon ECS prevents new tasks from being scheduled for placement on the container instance [Click here](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/container-instance-spot.html) to learn more.

The web application (app.py) we used to buld docker image in this module shows two ways to handle the EC2 Spot interruption within a docker container. This allows you to perform actions such as preventing the processing of new work, checkpointing the progress of a batch job, or gracefully exiting the application to complete tasks such as ensuring database connections are properly closed

In the first method, it checks the instance metadata service for spot interruption and display a message to web page notifying the users (this is, of course, just a demonstration and not for real-life scenarios).

{{% notice warning %}}
In a production environment, you should not provide access from the ECS tasks to the IMDS. This is done in this workshop for simplification purposes.
{{% /notice %}}


```plaintext
URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"
SpotInt = requests.get(URL)
if SpotInt.status_code == 200:
response += "<h1>This Spot Instance will be terminated at: {} </h1> <hr/>".format(SpotInt.text)
```

In the second method, it listens to the **SIGTERM** signal. The ECS container agent calls the StopTask API to stop all the tasks running on the Spot Instance.

When StopTask is called on a task, the equivalent of docker stop is issued to the containers running in the task. This results in a **SIGTERM** value and a default 30-second timeout, after which the SIGKILL value is sent and the containers are forcibly stopped. If the container handles the **SIGTERM** value gracefully and exits within 30 seconds from receiving it, no SIGKILL value is sent.

```python
class Ec2SpotInterruptionHandler:
signals = {
signal.SIGINT: 'SIGINT',
signal.SIGTERM: 'SIGTERM'
}

def __init__(self):
signal.signal(signal.SIGINT, self.exit_gracefully)
signal.signal(signal.SIGTERM, self.exit_gracefully)

def exit_gracefully(self, signum, frame):
print("\nReceived {} signal".format(self.signals[signum]))
if self.signals[signum] == 'SIGTERM':
print("Looks like there is a Spot Interruption. Let's wrap up the processing to avoid forceful killing of the applucation in next 30 sec ...")
```

Spot Interruption Handling on ECS Fargate Spot
---

When tasks using Fargate Spot capacity are stopped due to a Spot interruption, a two-minute warning is sent before a task is stopped. The warning is sent as a task state change event to Amazon EventBridge
and a SIGTERM signal to the running task. When using Fargate Spot as part of a service, the service
scheduler will receive the interruption signal and attempt to launch additional tasks on Fargate Spot if
capacity is available.

To ensure that your containers exit gracefully before the task stops, the following can be configured:

• A stopTimeout value of 120 seconds or less can be specified in the container definition that the task
is using. Specifying a stopTimeout value gives you time between the moment the task state change event is received and the point at which the container is forcefully stopped.

• The **SIGTERM** signal must be received from within the container to perform any cleanup actions.

79 changes: 22 additions & 57 deletions content/ecs-spot-capacity-providers/module-2/_index.md
Original file line number Diff line number Diff line change
@@ -1,77 +1,42 @@
---
title: "Module-2: Spot Interruption Handling"
title: "Module-2 (Optional): Saving costs using AWS Fargate Spot Capacity Providers"
weight: 40
---

Inturruption Handling On EC2 Spot Instances
AWS Fargate Capacity Providers
---

Amazon EC2 terminates your Spot Instance when it needs the capacity back. Amazon EC2 provides a Spot Instance interruption notice, which gives the instance a two-minute warning before it is interrupted.
Amazon ECS cluster capacity providers enable you to use both Fargate and Fargate Spot capacity with your Amazon ECS tasks. With Fargate Spot you can run interruption tolerant Amazon ECS tasks at a discounted rate compared to the Fargate price. Fargate Spot runs tasks on spare compute capacity. When AWS needs the capacity back, your tasks will be interrupted with a two-minute warning

When Amazon EC2 is going to interrupt your Spot Instance, the interruption notification will be available in two ways:

- ***Amazon EventBridge Events***
Creating a New ECS Cluster That Uses Fargate Capacity Providers
---

EC2 service emits an event two minutes prior to the actual interruption. This event can be detected by Amazon CloudWatch Events.
When a new Amazon ECS cluster is created, you specify one or more capacity providers to associate with the cluster. The associated capacity providers determine the infrastructure to run your tasks on. Set the following global variables for the names of resources be created in this workshop

- ***EC2 Instance Metadata service (IMDS)***
Run the following command to create a new cluster and associate both the Fargate and Fargate Spot capacity providers with it.

If your Spot Instance is marked for termination by EC2, the instance-action item is present in your instance metadata.
In the Launch Template configuration, we added:
```plaintext
echo "ECS_ENABLE_SPOT_INSTANCE_DRAINING=true" >> /etc/ecs/ecs.config
```
When Amazon ECS Spot Instance draining is enabled on the instance, ECS receives the Spot Instance interruption notice and places the instance in DRAINING status. When a container instance is set to DRAINING, Amazon ECS prevents new tasks from being scheduled for placement on the container instance [Click here](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/container-instance-spot.html) to learn more.

The web application (app.py) we used to buld docker image in this module shows two ways to handle the EC2 Spot interruption within a docker container. This allows you to perform actions such as preventing the processing of new work, checkpointing the progress of a batch job, or gracefully exiting the application to complete tasks such as ensuring database connections are properly closed

In the first method, it checks the instance metadata service for spot interruption and display a message to web page notifying the users (this is, of course, just a demonstration and not for real-life scenarios).

{{% notice warning %}}
In a production environment, you should not provide access from the ECS tasks to the IMDS. This is done in this workshop for simplification purposes.
{{% /notice %}}


```plaintext
URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"
SpotInt = requests.get(URL)
if SpotInt.status_code == 200:
response += "<h1>This Spot Instance will be terminated at: {} </h1> <hr/>".format(SpotInt.text)
aws ecs create-cluster \
--cluster-name EcsSpotWorkshop \
--capacity-providers FARGATE FARGATE_SPOT \
--region $AWS_REGION \
--default-capacity-provider-strategy capacityProvider=FARGATE,base=1,weight=1
```
If the above command fails with below error, run the command again. It should create the cluster now.

In the second method, it listens to the **SIGTERM** signal. The ECS container agent calls the StopTask API to stop all the tasks running on the Spot Instance.

When StopTask is called on a task, the equivalent of docker stop is issued to the containers running in the task. This results in a **SIGTERM** value and a default 30-second timeout, after which the SIGKILL value is sent and the containers are forcibly stopped. If the container handles the **SIGTERM** value gracefully and exits within 30 seconds from receiving it, no SIGKILL value is sent.

```python
class Ec2SpotInterruptionHandler:
signals = {
signal.SIGINT: 'SIGINT',
signal.SIGTERM: 'SIGTERM'
}

def __init__(self):
signal.signal(signal.SIGINT, self.exit_gracefully)
signal.signal(signal.SIGTERM, self.exit_gracefully)

def exit_gracefully(self, signum, frame):
print("\nReceived {} signal".format(self.signals[signum]))
if self.signals[signum] == 'SIGTERM':
print("Looks like there is a Spot Interruption. Let's wrap up the processing to avoid forceful killing of the applucation in next 30 sec ...")
```
“An error occurred (InvalidParameterException) when calling the CreateCluster operation: Unable to assume the service linked role. Please verify that the ECS service linked role exists.“
```

Spot Interruption Handling on ECS Fargate Spot
---
The ECS cluster will look like below in the AWS Console. Select ECS in **Services** and click on **Clusters** on left panel

![ECS Cluster](/images/ecs-spot-capacity-providers/c1.png)

When tasks using Fargate Spot capacity are stopped due to a Spot interruption, a two-minute warning is sent before a task is stopped. The warning is sent as a task state change event to Amazon EventBridge
and a SIGTERM signal to the running task. When using Fargate Spot as part of a service, the service
scheduler will receive the interruption signal and attempt to launch additional tasks on Fargate Spot if
capacity is available.
Note that above ECS cluster create command also specifies a default capacity provider strategy.

To ensure that your containers exit gracefully before the task stops, the following can be configured:
The strategy sets FARGATE as the default capacity provider. That means if there is no capacity provider strategy specified during the deployment of Tasks/Services, ECS by default chooses the FARGATE Capacity Provider to launch them.

• A stopTimeout value of 120 seconds or less can be specified in the container definition that the task
is using. Specifying a stopTimeout value gives you time between the moment the task state change event is received and the point at which the container is forcefully stopped.
Click _***Update Cluster***_ on the top right corner to see default Capacity Provider Strategy. As shown base=1 is set for FARGATE Capacity Provider.

• The **SIGTERM** signal must be received from within the container to perform any cleanup actions.
![ECS Cluster](/images/ecs-spot-capacity-providers/c2.png)

91 changes: 0 additions & 91 deletions content/ecs-spot-capacity-providers/module-2/asg_with_od.md

This file was deleted.

53 changes: 0 additions & 53 deletions content/ecs-spot-capacity-providers/module-2/asg_with_spot.md

This file was deleted.

Loading

0 comments on commit e7c7de8

Please sign in to comment.