awslabs · rmccone · Aug 31, 2022 · Sep 1, 2022 · Sep 2, 2022
diff --git a/content/karpenter/050_scaling/fis_experiment.md b/content/karpenter/050_scaling/fis_experiment.md
@@ -0,0 +1,188 @@
+---
+title: "Use FIS to Interrupt a Spot Instance"
+date: 2022-08-31T13:12:00-07:00
+weight: 50
+---
+
+In this section, you're going to create and run an experiment to [trigger the interruption of Amazon EC2 Spot Instances using AWS Fault Injection Simulator (FIS)](https://aws.amazon.com/blogs/compute/implementing-interruption-tolerance-in-amazon-ec2-spot-with-aws-fault-injection-simulator/). When you use Spot Instances, you need to be prepared for interruptions. With FIS, you can test the resiliency of your workload and validate that your application reacts to the interruption notices that EC2 sends before terminating your instances. You can target individual Spot Instances or a subset of instances in clusters managed by services that tag your instances, such as ASG, EC2 Fleet, and EKS.
+
+#### What do you need to get started?
+
+Before you start launching Spot interruptions with FIS, you need to create an experiment template. This is where you define which resources you want to interrupt (the targets) and when you want to interrupt them.
+
+Let's create a CloudFormation template that defines the IAM role (`FISSpotRole`) with the minimum permissions FIS needs to interrupt an instance, and the experiment template (`FISExperimentTemplate`) you're going to use to trigger a Spot interruption:
+
+```
+export FIS_EXP_NAME=fis-karpenter-spot-interruption
+cat <<EoF > fis-karpenter.yaml
+AWSTemplateFormatVersion: 2010-09-09
+Description: FIS for Spot Instances
+Parameters:
+  InstancesToInterrupt:
+    Description: Number of instances to interrupt
+    Default: 1
+    Type: Number
+
+  DurationBeforeInterruption:
+    Description: Number of minutes before the interruption
+    Default: 2
+    Type: Number
+
+Resources:
+
+  FISSpotRole:
+    Type: AWS::IAM::Role
+    Properties:
+      AssumeRolePolicyDocument:
+        Statement:
+        - Effect: Allow
+          Principal:
+            Service: [fis.amazonaws.com]
+          Action: ["sts:AssumeRole"]
+      Path: /
+      Policies:
+        - PolicyName: root
+          PolicyDocument:
+            Version: "2012-10-17"
+            Statement:
+              - Effect: Allow
+                Action: 'ec2:DescribeInstances'
+                Resource: '*'
+              - Effect: Allow
+                Action: 'ec2:SendSpotInstanceInterruptions'
+                Resource: 'arn:aws:ec2:*:*:instance/*'
+
+  FISExperimentTemplate:
+    Type: AWS::FIS::ExperimentTemplate
+    Properties:       
+      Description: "Interrupt a spot instance with EKS label intent:apps"
+      Targets: 
+        SpotIntances:
+          ResourceTags: 
+            IntentLabel: apps
+          Filters:
+            - Path: State.Name
+              Values: 
+              - running
+          ResourceType: aws:ec2:spot-instance
+          SelectionMode: !Join ["", ["COUNT(", !Ref InstancesToInterrupt, ")"]]
+      Actions: 
+        interrupt:
+          ActionId: "aws:ec2:send-spot-instance-interruptions"
+          Description: "Interrupt a Spot instance"
+          Parameters: 
+            durationBeforeInterruption: !Join ["", ["PT", !Ref DurationBeforeInterruption, "M"]]
+          Targets: 
+            SpotInstances: SpotIntances
+      StopConditions:
+        - Source: none
+      RoleArn: !GetAtt FISSpotRole.Arn
+      Tags: 
+        Name: "${FIS_EXP_NAME}"
+
+Outputs:
+  FISExperimentID:
+    Value: !GetAtt FISExperimentTemplate.Id
+EoF
+```
+
+Here are some important notes about the template:
+
+* You can configure how many instances you want to interrupt with the `InstancesToInterrupt` parameter. The template defaults to interrupting **one** instance.
+* You can also configure how long the experiment waits before interrupting the instance with the `DurationBeforeInterruption` parameter. The default is two minutes. This means that as soon as you launch the experiment, the instance receives the two-minute Spot interruption warning.
+* The most important section of the experiment template is `Targets`. Under `ResourceTags`, `IntentLabel: apps` tells the experiment to select only from the EKS nodes we have labeled with `intent: apps`. If more than one running instance carries this label, the instance to interrupt is **chosen randomly**.
+
+#### Create the EC2 Spot Interruption Experiment with FIS
+
+Run the following commands to create the FIS experiment from your template. They will take a few moments to complete:
+
+```
+aws cloudformation create-stack --stack-name $FIS_EXP_NAME --template-body file://fis-karpenter.yaml --capabilities CAPABILITY_NAMED_IAM
+aws cloudformation wait stack-create-complete --stack-name $FIS_EXP_NAME
+```
+
+#### Run the Spot Interruption Experiment
+
+You can run the Spot interruption experiment by issuing the following commands:
+
+```
+FIS_EXP_TEMP_ID=$(aws cloudformation describe-stacks --stack-name $FIS_EXP_NAME --query "Stacks[0].Outputs[?OutputKey=='FISExperimentID'].OutputValue" --output text)
+FIS_EXP_ID=$(aws fis start-experiment --experiment-template-id $FIS_EXP_TEMP_ID --no-cli-pager --query "experiment.id" --output text)
+```
+
+The experiment should complete within a few seconds. This means one of your instances has received a two-minute instance interruption notice and will be terminated. You can check the status of the experiment by running:
+
+```
+aws fis get-experiment --id $FIS_EXP_ID --no-cli-pager
+```
+
+If the experiment completed successfully, you should see a response like this:
+
+```
+{
+    "experiment": {
+
+        ...
+
+        "state": {
+            "status": "completed",
+            "reason": "Experiment completed."
+        },
+        "targets": {
+            "SpotIntances": {
+                "resourceType": "aws:ec2:spot-instance",
+                "resourceTags": {
+                    "IntentLabel": "apps"
+                },
+                "filters": [
+                    {
+                        "path": "State.Name",
+                        "values": [
+                            "running"
+                        ]
+                    }
+                ],
+                "selectionMode": "COUNT(1)"
+            }
+        },
+
+        ...
+
+    }
+}
+```
+
+If `status` is listed as `running`, wait a few seconds and run the command again. If `status` is listed as `failed` with the `reason` `Target resolution returned empty set`, you do not have any Spot Instances running with the `intent: apps` label, so no instance was selected for interruption.
+
+You can watch how your cluster reacts to the notice with kube-ops-view. Recall you can get the URL for your kube-ops-view by running:
+
+```
+kubectl get svc kube-ops-view | tail -n 1 | awk '{ print "Kube-ops-view URL = http://"$4 }'
+```
+
+{{% notice note %}}
+You can interrupt more instances by running the experiment multiple times and watching how your cluster reacts. Just reissue this command:
+```
+FIS_EXP_ID=$(aws fis start-experiment --experiment-template-id $FIS_EXP_TEMP_ID --no-cli-pager --query "experiment.id" --output text)
+```
+{{% /notice %}}
+
+## What have we learned in this section?
+
+In this section we have learned:
+
+* We have built a container image using a multi-stage approach and uploaded the resulting microservice to Amazon Elastic Container Registry (ECR).
+
+* We have deployed a Monte Carlo Microservice applying all the lessons learned from the previous section.
+
+* We have set up the Horizontal Pod Autoscaler (HPA) to scale our Monte Carlo microservice whenever the average CPU percentage exceeds 50%. We configured it to scale from 3 replicas to 100 replicas.
+
+* We have sent requests to the Monte Carlo microservice to stress the CPU of the Pods where it runs. We saw dynamic scaling with HPA and Karpenter in action and now know how we can apply these techniques to our Kubernetes cluster.
+
+* We have created a FIS experiment and ran it to interrupt one of our Spot instances. We watched how the cluster responded using the visual web tool kube-ops-view.
+
+
+{{% notice info %}}
+Congratulations! You have completed the dynamic scaling section of this workshop.
+In the next sections we will collect our conclusions and clean up the setup.
+{{% /notice %}}
diff --git a/content/karpenter/050_scaling/test_hpa.md b/content/karpenter/050_scaling/test_hpa.md
@@ -102,22 +102,3 @@ or
 kubectl top pods
 ```
 {{% /expand %}}
-
-
-## What Have we learned in this section : 
-
-In this section we have learned:
-
-* We have built an container image using a multi-stage approach and uploaded the resulting microservice into Amazon Elastic Container Registry (ECR).
-
-* We have deployed a Monte Carlo Microservice applying all the lessons learned from the previous section.
-
-* We have set up the Horizontal Pod Autoscaler (HPA) to scale our Monte Carlo microservice whenever the average CPU percentage exceeds 50%, We configured it to scale from 3 replicas to 100 replicas
-
-* We have sent request to the Monte Carlo microservice to stress the CPU of the Pods where it runs. We saw in action dynamic scaling with HPA and Karpenter and now know can we appy this techniques to our kubernetes cluster
-
-
-{{% notice info %}}
-Congratulations ! You have completed the dynamic scaling section of this workshop.
-In the next sections we will collect our conclusions and clean up the setup.
-{{% /notice %}}
diff --git a/content/karpenter/200_cleanup/_index.md b/content/karpenter/200_cleanup/_index.md
@@ -10,6 +10,11 @@ If you're running in an account that was created for you as part of an AWS event
 If you're running in your own account, make sure you run through these steps to make sure you don't encounter unwanted costs.
 {{% /notice %}}
 
+## Removing the CloudFormation stack used for FIS
+```
+aws cloudformation delete-stack --stack-name $FIS_EXP_NAME
+```
+
 ## Cleaning up HPA, CA, and the Microservice
 ```
 cd ~/environment