Fis karpenter #210

Closed
wants to merge 3 commits into from
188 changes: 188 additions & 0 deletions content/karpenter/050_scaling/fis_experiment.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,188 @@
---
title: "Use FIS to Interrupt a Spot Instance"
date: 2022-08-31T13:12:00-07:00
weight: 50
---

In this section, you're going to create and run an experiment to [trigger the interruption of Amazon EC2 Spot Instances using AWS Fault Injection Simulator (FIS)](https://aws.amazon.com/blogs/compute/implementing-interruption-tolerance-in-amazon-ec2-spot-with-aws-fault-injection-simulator/). When using Spot Instances, you need to be prepared for interruptions. With FIS, you can test the resiliency of your workload and validate that your application reacts to the interruption notices that EC2 sends before terminating your instances. You can target individual Spot Instances or a subset of instances in clusters managed by services that tag your instances, such as ASG, EC2 Fleet, and EKS.

#### What do you need to get started?

Before you start launching Spot interruptions with FIS, you need to create an experiment template. Here is where you define which resources you want to interrupt (targets), and when you want to interrupt the instance.

Let's create a CloudFormation template which creates the IAM role (`FISSpotRole`) with the minimum permissions FIS needs to interrupt an instance, and the experiment template (`FISExperimentTemplate`) you're going to use to trigger a Spot interruption:

```
export FIS_EXP_NAME=fis-karpenter-spot-interruption
cat <<EoF > fis-karpenter.yaml
AWSTemplateFormatVersion: 2010-09-09
Description: FIS for Spot Instances
Parameters:
  InstancesToInterrupt:
    Description: Number of instances to interrupt
    Default: 1
    Type: Number

  DurationBeforeInterruption:
    Description: Number of minutes before the interruption
    Default: 2
    Type: Number

Resources:

  FISSpotRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
        - Effect: Allow
          Principal:
            Service: [fis.amazonaws.com]
          Action: ["sts:AssumeRole"]
      Path: /
      Policies:
      - PolicyName: root
        PolicyDocument:
          Version: "2012-10-17"
          Statement:
          - Effect: Allow
            Action: 'ec2:DescribeInstances'
            Resource: '*'
          - Effect: Allow
            Action: 'ec2:SendSpotInstanceInterruptions'
            Resource: 'arn:aws:ec2:*:*:instance/*'

  FISExperimentTemplate:
    Type: AWS::FIS::ExperimentTemplate
    Properties:
      Description: "Interrupt a spot instance with EKS label intent:apps"
      Targets:
        SpotIntances:
          ResourceTags:
            IntentLabel: apps
          Filters:
          - Path: State.Name
            Values:
            - running
          ResourceType: aws:ec2:spot-instance
          SelectionMode: !Join ["", ["COUNT(", !Ref InstancesToInterrupt, ")"]]
      Actions:
        interrupt:
          ActionId: "aws:ec2:send-spot-instance-interruptions"
          Description: "Interrupt a Spot instance"
          Parameters:
            durationBeforeInterruption: !Join ["", ["PT", !Ref DurationBeforeInterruption, "M"]]
          Targets:
            SpotInstances: SpotIntances
      StopConditions:
      - Source: none
      RoleArn: !GetAtt FISSpotRole.Arn
      Tags:
        Name: "${FIS_EXP_NAME}"

Outputs:
  FISExperimentID:
    Value: !GetAtt FISExperimentTemplate.Id
EoF
```

Here are some important notes about the template:

* You can configure how many instances you want to interrupt with the `InstancesToInterrupt` parameter. The template defaults to interrupting **one** instance.
* You can also configure how long the experiment waits before interrupting the instance with the `DurationBeforeInterruption` parameter. The default is two minutes. This means that as soon as you launch the experiment, the instance receives the two-minute Spot interruption notice.
* The most important section of the experiment template is `Targets`. Under `ResourceTags` we have `IntentLabel: apps`, which tells the experiment to select only from the EKS nodes we have labeled with `intent: apps`. If more than one running instance has this label, the instance to interrupt is **chosen randomly**.
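
To make the two `!Join` expressions in the template more concrete, here is a minimal sketch that builds the same strings in the shell. The function names (`selection_mode`, `duration_before_interruption`, `stack_parameters`) are hypothetical helpers introduced only for illustration; they are not part of the workshop and they do not call AWS.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that mirror the two !Join expressions in the template.
# They only build strings; nothing here calls AWS.

# Mirrors: !Join ["", ["COUNT(", !Ref InstancesToInterrupt, ")"]]
selection_mode() {
  printf 'COUNT(%s)\n' "$1"
}

# Mirrors: !Join ["", ["PT", !Ref DurationBeforeInterruption, "M"]]
# The result is an ISO 8601 duration in minutes.
duration_before_interruption() {
  printf 'PT%sM\n' "$1"
}

# Builds the value list you could pass to
# `aws cloudformation create-stack --parameters ...` to override the defaults.
stack_parameters() {
  printf 'ParameterKey=InstancesToInterrupt,ParameterValue=%s ' "$1"
  printf 'ParameterKey=DurationBeforeInterruption,ParameterValue=%s\n' "$2"
}

selection_mode 1                 # value for the default InstancesToInterrupt
duration_before_interruption 2   # value for the default DurationBeforeInterruption
stack_parameters 2 3             # overrides: two instances, three minutes
```

With the defaults, FIS receives `COUNT(1)` as the selection mode and `PT2M` as the duration. The FIS actions reference documents an allowed range of 2 to 15 minutes for `durationBeforeInterruption`, so check it before choosing a different value.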

#### Create the EC2 Spot Interruption Experiment with FIS

Run the following commands to create the FIS experiment from your template. They take a few moments to complete:

```
aws cloudformation create-stack --stack-name $FIS_EXP_NAME --template-body file://fis-karpenter.yaml --capabilities CAPABILITY_NAMED_IAM
aws cloudformation wait stack-create-complete --stack-name $FIS_EXP_NAME
```
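
If the `wait` command returns an error, it can help to look at the stack status and at the resources that failed. The following is a minimal sketch, assuming the AWS CLI is configured for the same account and Region; `stack_status` and `stack_failures` are hypothetical helper names, not part of the workshop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the current status of a CloudFormation stack,
# for example CREATE_COMPLETE or ROLLBACK_COMPLETE.
stack_status() {
  aws cloudformation describe-stacks --stack-name "$1" \
    --query "Stacks[0].StackStatus" --output text
}

# Hypothetical helper: list the resources that failed to create,
# together with the reason CloudFormation reported.
stack_failures() {
  aws cloudformation describe-stack-events --stack-name "$1" \
    --query "StackEvents[?ResourceStatus=='CREATE_FAILED'].[LogicalResourceId,ResourceStatusReason]" \
    --output text
}
```

For example, `stack_status "$FIS_EXP_NAME"` should print `CREATE_COMPLETE` once the stack is ready, and `stack_failures "$FIS_EXP_NAME"` should print nothing.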

#### Run the Spot Interruption Experiment

You can run the Spot interruption experiment by issuing the following commands:

```
FIS_EXP_TEMP_ID=$(aws cloudformation describe-stacks --stack-name $FIS_EXP_NAME --query "Stacks[0].Outputs[?OutputKey=='FISExperimentID'].OutputValue" --output text)
FIS_EXP_ID=$(aws fis start-experiment --experiment-template-id $FIS_EXP_TEMP_ID --no-cli-pager --query "experiment.id" --output text)
```

In a few seconds, the experiment should complete. This means one of your instances has received a two-minute instance interruption notice and will be terminated. You can see the status of the experiment by running:

```
aws fis get-experiment --id $FIS_EXP_ID --no-cli-pager
```

If the experiment completed successfully you should see a response like this:

```
{
    "experiment": {

        ...

        "state": {
            "status": "completed",
            "reason": "Experiment completed."
        },
        "targets": {
            "SpotIntances": {
                "resourceType": "aws:ec2:spot-instance",
                "resourceTags": {
                    "IntentLabel": "apps"
                },
                "filters": [
                    {
                        "path": "State.Name",
                        "values": [
                            "running"
                        ]
                    }
                ],
                "selectionMode": "COUNT(1)"
            }
        },

        ...

    }
}
```

If `status` is listed as `running`, wait a few seconds and run the command again. If `status` is listed as `failed` with `reason` as `Target resolution returned empty set`, you do not have any Spot Instances running with the `intent: apps` label, so no instance was selected for interruption.
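
Instead of rerunning the command by hand, you could poll the status in a loop. The following is a minimal sketch, assuming the AWS CLI is configured and `FIS_EXP_ID` is set as above; `wait_for_experiment`, its default interval, and its attempt limit are hypothetical choices made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll an FIS experiment until it reaches a final state.
# Usage: wait_for_experiment <experiment-id> [interval-seconds] [max-attempts]
# Prints the final status. Returns 0 on completed, 1 on failed or stopped,
# 2 if the AWS CLI call fails, and 3 if the attempts run out.
wait_for_experiment() {
  local id="$1"
  local interval="${2:-5}"
  local attempts="${3:-24}"
  local status
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    status=$(aws fis get-experiment --id "$id" \
      --query "experiment.state.status" --output text --no-cli-pager) || return 2
    case "$status" in
      completed) echo "$status"; return 0 ;;
      failed|stopped) echo "$status"; return 1 ;;
    esac
    # Any other status (for example running) is transitional: wait and retry.
    sleep "$interval"
    i=$((i + 1))
  done
  echo "timeout"
  return 3
}
```

You would call it as `wait_for_experiment "$FIS_EXP_ID"`. If it prints `failed`, run the `get-experiment` command shown earlier to read the `reason` field.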

You can watch how your cluster reacts to the notice with kube-ops-view. Recall you can get the URL for your kube-ops-view by running:

```
kubectl get svc kube-ops-view | tail -n 1 | awk '{ print "Kube-ops-view URL = http://"$4 }'
```

{{% notice note %}}
You can interrupt more instances by running the experiment multiple times and watching how your cluster reacts. To do so, reissue this command:
```
FIS_EXP_ID=$(aws fis start-experiment --experiment-template-id $FIS_EXP_TEMP_ID --no-cli-pager --query "experiment.id" --output text)
```
{{% /notice %}}

## What have we learned in this section?

In this section we have learned:

* We have built a container image using a multi-stage approach and uploaded the resulting microservice into Amazon Elastic Container Registry (ECR).

* We have deployed a Monte Carlo microservice, applying all the lessons learned from the previous section.

* We have set up the Horizontal Pod Autoscaler (HPA) to scale our Monte Carlo microservice whenever the average CPU percentage exceeds 50%. We configured it to scale from 3 replicas to 100 replicas.

* We have sent requests to the Monte Carlo microservice to stress the CPU of the Pods where it runs. We saw dynamic scaling with HPA and Karpenter in action and now know how to apply these techniques to our Kubernetes cluster.

* We have created a FIS experiment and run it to interrupt one of our Spot Instances. We watched how the cluster responded using the visual web tool kube-ops-view.


{{% notice info %}}
Congratulations! You have completed the dynamic scaling section of this workshop.
In the next sections we will collect our conclusions and clean up the setup.
{{% /notice %}}
19 changes: 0 additions & 19 deletions content/karpenter/050_scaling/test_hpa.md
@@ -102,22 +102,3 @@ or
kubectl top pods
```
{{% /expand %}}


## What Have we learned in this section :

In this section we have learned:

* We have built an container image using a multi-stage approach and uploaded the resulting microservice into Amazon Elastic Container Registry (ECR).

* We have deployed a Monte Carlo Microservice applying all the lessons learned from the previous section.

* We have set up the Horizontal Pod Autoscaler (HPA) to scale our Monte Carlo microservice whenever the average CPU percentage exceeds 50%, We configured it to scale from 3 replicas to 100 replicas

* We have sent request to the Monte Carlo microservice to stress the CPU of the Pods where it runs. We saw in action dynamic scaling with HPA and Karpenter and now know can we appy this techniques to our kubernetes cluster


{{% notice info %}}
Congratulations ! You have completed the dynamic scaling section of this workshop.
In the next sections we will collect our conclusions and clean up the setup.
{{% /notice %}}
5 changes: 5 additions & 0 deletions content/karpenter/200_cleanup/_index.md
@@ -10,6 +10,11 @@ If you're running in an account that was created for you as part of an AWS event
If you're running in your own account, make sure you run through these steps to make sure you don't encounter unwanted costs.
{{% /notice %}}

## Removing the CloudFormation stack used for FIS
```
aws cloudformation delete-stack --stack-name $FIS_EXP_NAME
```

## Cleaning up HPA, CA, and the Microservice
```
cd ~/environment