diff --git a/content/using_ec2_spot_instances_with_eks/cleanup.md b/content/using_ec2_spot_instances_with_eks/cleanup.md
index 40fcf731..2c2d60be 100644
--- a/content/using_ec2_spot_instances_with_eks/cleanup.md
+++ b/content/using_ec2_spot_instances_with_eks/cleanup.md
@@ -18,7 +18,8 @@ Before you clean up the resources and complete the workshop, you may want to rev
 kubectl delete hpa monte-carlo-pi-service
 kubectl delete -f ~/environment/cluster-autoscaler/cluster_autoscaler.yml
 kubectl delete -f monte-carlo-pi-service.yml
-helm delete --purge kube-ops-view kube-resource-report metrics-server
+helm delete kube-ops-view
+helm delete metrics-server --namespace metrics
 ```
 
 ## Removing eks nodegroups
diff --git a/content/using_ec2_spot_instances_with_eks/eksctl/launcheks.md b/content/using_ec2_spot_instances_with_eks/eksctl/launcheks.md
index bcfb5c2d..be7f7c18 100644
--- a/content/using_ec2_spot_instances_with_eks/eksctl/launcheks.md
+++ b/content/using_ec2_spot_instances_with_eks/eksctl/launcheks.md
@@ -41,7 +41,7 @@ The following command will create an eks cluster with the name `eksworkshop-eksc
 .It will also create a nodegroup with 2 on-demand instances.
 
 ```
-eksctl create cluster --version=1.15 --name=eksworkshop-eksctl --node-private-networking --managed --nodes=2 --alb-ingress-access --region=${AWS_REGION} --node-labels="lifecycle=OnDemand,intent=control-apps" --asg-access
+eksctl create cluster --version=1.16 --name=eksworkshop-eksctl --node-private-networking --managed --nodes=2 --alb-ingress-access --region=${AWS_REGION} --node-labels="lifecycle=OnDemand,intent=control-apps" --asg-access
 ```
 
 eksctl allows us to pass parameters to initialize the cluster. While initializing the cluster, eksctl does also allow us to create nodegroups.
diff --git a/content/using_ec2_spot_instances_with_eks/eksctl/prerequisites.md b/content/using_ec2_spot_instances_with_eks/eksctl/prerequisites.md
index fc46da82..81807283 100644
--- a/content/using_ec2_spot_instances_with_eks/eksctl/prerequisites.md
+++ b/content/using_ec2_spot_instances_with_eks/eksctl/prerequisites.md
@@ -6,7 +6,7 @@ weight: 10
 For this module, we need to download the [eksctl](https://eksctl.io/) binary:
 
 ```
-export EKSCTL_VERSION=0.18.0
+export EKSCTL_VERSION=0.23.0
 curl --silent --location "https://github.com/weaveworks/eksctl/releases/download/${EKSCTL_VERSION}/eksctl_Linux_amd64.tar.gz" | tar xz -C /tmp
 sudo mv -v /tmp/eksctl /usr/local/bin
 ```
diff --git a/content/using_ec2_spot_instances_with_eks/helm_root/deploy_metric_server.md b/content/using_ec2_spot_instances_with_eks/helm_root/deploy_metric_server.md
index 0ae4c15b..ed246485 100644
--- a/content/using_ec2_spot_instances_with_eks/helm_root/deploy_metric_server.md
+++ b/content/using_ec2_spot_instances_with_eks/helm_root/deploy_metric_server.md
@@ -8,8 +8,9 @@ weight: 20
 Metrics Server is a cluster-wide aggregator of resource usage data. These metrics will drive the scaling behavior of the [deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/). We will deploy the metrics server using `Helm` configured earlier in this workshop.
 
 ```
-helm install stable/metrics-server \
-    --name metrics-server \
+kubectl create namespace metrics
+helm install metrics-server \
+    stable/metrics-server \
     --version 2.10.0 \
     --namespace metrics
 ```
diff --git a/content/using_ec2_spot_instances_with_eks/helm_root/helm_deploy.md b/content/using_ec2_spot_instances_with_eks/helm_root/helm_deploy.md
index 2eb28e25..2688a037 100644
--- a/content/using_ec2_spot_instances_with_eks/helm_root/helm_deploy.md
+++ b/content/using_ec2_spot_instances_with_eks/helm_root/helm_deploy.md
@@ -4,63 +4,43 @@ date: 2018-08-07T08:30:11-07:00
 weight: 10
 ---
 
-Before we can get started configuring `helm` we'll need to first install the command line tools that you will interact with. To do this run the following.
+## Install the Helm CLI
 
-```
-cd ~/environment
-curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get > get_helm.sh
-chmod +x get_helm.sh
-./get_helm.sh
+Before we can get started configuring Helm, we'll need to first install the
+command line tools that you will interact with. To do this, run the following:
+
+```sh
+curl -sSL https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
 ```
 
-{{% notice info %}}
-Once you install helm, the command will prompt you to run 'helm init'. **Do not run 'helm init'.** Follow the instructions to configure helm using **Kubernetes RBAC** and then install tiller as specified below
-If you accidentally run 'helm init', you can safely uninstall tiller by running 'helm reset --force'
-{{% /notice %}}
+We can verify the version
 
-### Configure Helm access with RBAC
+```sh
+helm version --short
+```
 
-Helm relies on a service called **tiller** that requires special permission on the
-kubernetes cluster, so we need to build a _**Service Account**_ for **tiller**
-to use. We'll then apply this to the cluster.
+Let's configure our first Chart repository. Chart repositories are similar to
+APT or yum repositories that you might be familiar with on Linux, or Taps for
+Homebrew on macOS.
 
-To create a new service account manifest:
-```
-cat <<EoF > ~/environment/rbac.yaml
----
-apiVersion: v1
-kind: ServiceAccount
-metadata:
-  name: tiller
-  namespace: kube-system
----
-apiVersion: rbac.authorization.k8s.io/v1beta1
-kind: ClusterRoleBinding
-metadata:
-  name: tiller
-roleRef:
-  apiGroup: rbac.authorization.k8s.io
-  kind: ClusterRole
-  name: cluster-admin
-subjects:
-  - kind: ServiceAccount
-    name: tiller
-    namespace: kube-system
-EoF
-```
+Download the `stable` repository so we have something to start with:
 
-Next apply the config:
-```
-kubectl apply -f ~/environment/rbac.yaml
+```sh
+helm repo add stable https://kubernetes-charts.storage.googleapis.com/
+helm repo update
 ```
 
-Then we can install **tiller** using the **helm** tooling
+Once this is installed, we will be able to list the charts you can install:
 
-```
-helm init --service-account tiller
+```sh
+helm search repo stable
 ```
 
-This will install **tiller** into the cluster which gives it access to manage
-resources in your cluster.
-
+Finally, let's configure Bash completion for the `helm` command:
+```sh
+helm completion bash >> ~/.bash_completion
+. /etc/profile.d/bash_completion.sh
+. ~/.bash_completion
+source <(helm completion bash)
+```
diff --git a/content/using_ec2_spot_instances_with_eks/helm_root/install_kube_ops_view.md b/content/using_ec2_spot_instances_with_eks/helm_root/install_kube_ops_view.md
index e35f7cc6..0f8f5444 100644
--- a/content/using_ec2_spot_instances_with_eks/helm_root/install_kube_ops_view.md
+++ b/content/using_ec2_spot_instances_with_eks/helm_root/install_kube_ops_view.md
@@ -10,9 +10,8 @@ that will help with understanding our cluster setup in a visual way. The first o
 The following line updates the stable helm repository and then installs kube-ops-view using a LoadBalancer Service type and creating a RBAC (Resource Base Access Control) entry for the read-only service account to read nodes and pods information from the cluster.
 
 ```
-helm repo update
-helm install stable/kube-ops-view \
---name kube-ops-view \
+helm install kube-ops-view \
+stable/kube-ops-view \
 --set service.type=LoadBalancer \
 --set nodeSelector.intent=control-apps \
 --set rbac.create=True
@@ -60,6 +59,15 @@ Spend some time checking the state and properties of your EKS cluster.
 
 ![kube-ops-view](/images/using_ec2_spot_instances_with_eks/helm/kube-ops-view-legend.png)
 
+
\ No newline at end of file
diff --git a/content/using_ec2_spot_instances_with_eks/jenkins/autoscaling_nodes.md b/content/using_ec2_spot_instances_with_eks/jenkins/autoscaling_nodes.md
index 89b5c1b2..83e4436a 100644
--- a/content/using_ec2_spot_instances_with_eks/jenkins/autoscaling_nodes.md
+++ b/content/using_ec2_spot_instances_with_eks/jenkins/autoscaling_nodes.md
@@ -4,24 +4,13 @@ date: 2018-08-07T08:30:11-07:00
 weight: 80
 ---
 
-In a previous module in this workshop, we saw that we can use Kubernetes cluster-autoscaler to automatically increase the size of our nodegroups (EC2 Auto Scaling groups) when our Kubernetes deployment scaled out, and some of the pods remained in `pending` state due to lack of resources on the cluster. Let's implement the same concept for our Jenkins worker nodes and see this in action.
+In a previous module in this workshop, we saw that we can use Kubernetes cluster-autoscaler to automatically increase the size of our nodegroups (EC2 Auto Scaling groups) when our Kubernetes deployment scaled out, and some of the pods remained in `pending` state due to lack of resources on the cluster. Let's check that the same concept applies to our Jenkins worker nodes and see it in action.
+
+If you recall, Cluster Autoscaler was configured to auto-discover Auto Scaling groups created with the tags `k8s.io/cluster-autoscaler/enabled` and `k8s.io/cluster-autoscaler/eksworkshop-eksctl`. You can confirm in the AWS Console section for **EC2 -> Auto Scaling Groups** that the new Jenkins nodegroup does indeed have the right tags defined.
 
-#### Configuring cluster-autoscaler to use our new Jenkins dedicated nodegroup
-1\. Edit the cluster-autoscaler deployment configuration\
-```bash
-kubectl edit deployment cluster-autoscaler -n kube-system
-```
-2\. Under the two `--nodes=` lines where you configured your EC2 Auto Scaling group names in the previous module, add another line with the name of the new Jenkins dedicated nodegroup, so your file looks like this (but with different ASG names which you collected from the EC2 Management Console)\
-```
---nodes=0:5:eksctl-eksworkshop-eksctl10-nodegroup-dev-8vcpu-32gb-spot-NodeGroup-16XJ6GMZCT3XQ
---nodes=0:5:eksctl-eksworkshop-eksctl10-nodegroup-dev-4vcpu-16gb-spot-NodeGroup-1RBXH0I6585MX
---nodes=0:5:eksctl-eksworkshop-eksctl10-nodegroup-jenkins-agents-2vcpu-8gb-spot-2-NodeGroup-7GE4LS6B34DK
-```
-3\. Once you save/quit the file with `:x!`, the new configuration will apply\
 
 {{% notice tip %}}
-CI/CD workloads can benefit of Cluster Autoscaler ability to scale down to 0! Capacity will
-be provided just when needed, which increases further cost savings.
+CI/CD workloads can benefit from Cluster Autoscaler's ability to scale down to 0! Capacity will be provided just when needed, which further increases cost savings.
 {{% /notice %}}
 
 #### Running multiple Jenkins jobs to reach a Pending pods state
diff --git a/content/using_ec2_spot_instances_with_eks/jenkins/jenkins_cleanup.md b/content/using_ec2_spot_instances_with_eks/jenkins/jenkins_cleanup.md
index 11deff29..9d18c68e 100644
--- a/content/using_ec2_spot_instances_with_eks/jenkins/jenkins_cleanup.md
+++ b/content/using_ec2_spot_instances_with_eks/jenkins/jenkins_cleanup.md
@@ -14,12 +14,6 @@ If you're running in your own account, make sure you run through these steps to
 helm delete cicd
 ```
 
-### Removing the Jenkins nodegroup from cluster-autoscaler
-```
-kubectl edit deployment cluster-autoscaler -n kube-system
-```
-Delete the third **\-\-nodes=** line that contains the Jenkins nodegroup name.
-
 ### Removing the Jenkins nodegroup
 ```
 eksctl delete nodegroup -f spot_nodegroup_jenkins.yml --approve
diff --git a/content/using_ec2_spot_instances_with_eks/jenkins/setup_jenkins.md b/content/using_ec2_spot_instances_with_eks/jenkins/setup_jenkins.md
index e6b63b5f..00975d48 100644
--- a/content/using_ec2_spot_instances_with_eks/jenkins/setup_jenkins.md
+++ b/content/using_ec2_spot_instances_with_eks/jenkins/setup_jenkins.md
@@ -7,7 +7,7 @@ weight: 30
 #### Install Jenkins
 
 ```
-helm install stable/jenkins --set rbac.create=true,master.servicePort=80,master.serviceType=LoadBalancer --name cicd
+helm install cicd stable/jenkins --set rbac.create=true,master.servicePort=80,master.serviceType=LoadBalancer
 ```
 
 The output of this command will give you some additional information such as the
@@ -27,7 +27,7 @@ Once the pod status changes to `running`, we can get the load balancer address w
 
 ```
}}{{ end }}") -echo http://$SERVICE_IP/login +echo "Jenkins running at : http://$SERVICE_IP/login" ``` The expected result should be similar to: diff --git a/content/using_ec2_spot_instances_with_eks/prerequisites/at_an_aws_validaterole.md b/content/using_ec2_spot_instances_with_eks/prerequisites/at_an_aws_validaterole.md index b3d7afaa..e5ab856a 100644 --- a/content/using_ec2_spot_instances_with_eks/prerequisites/at_an_aws_validaterole.md +++ b/content/using_ec2_spot_instances_with_eks/prerequisites/at_an_aws_validaterole.md @@ -18,6 +18,6 @@ If the _Arn_ contains the role name from above and an Instance ID, you may proce { "Account": "123456789012", "UserId": "AROA1SAMPLEAWSIAMROLE:i-01234567890abcdef", - "Arn": "arn:aws:sts::123456789012:assumed-role/TeamRole/MasterRole" + "Arn": "arn:aws:sts::216876048363:assumed-role/TeamRole/i-0dd09eac19be01448" } ``` \ No newline at end of file diff --git a/content/using_ec2_spot_instances_with_eks/prerequisites/k8stools.md b/content/using_ec2_spot_instances_with_eks/prerequisites/k8stools.md index 88a182c7..1eb4f9f8 100644 --- a/content/using_ec2_spot_instances_with_eks/prerequisites/k8stools.md +++ b/content/using_ec2_spot_instances_with_eks/prerequisites/k8stools.md @@ -15,14 +15,14 @@ for the download links.](https://docs.aws.amazon.com/eks/latest/userguide/gettin #### Install kubectl ``` -export KUBECTL_VERSION=v1.15.10 +export KUBECTL_VERSION=v1.16.12 sudo curl --silent --location -o /usr/local/bin/kubectl https://storage.googleapis.com/kubernetes-release/release/${KUBECTL_VERSION}/bin/linux/amd64/kubectl sudo chmod +x /usr/local/bin/kubectl ``` #### Install JQ and envsubst ``` -sudo yum -y install jq gettext +sudo yum -y install jq gettext bash-completion ``` #### Verify the binaries are in the path and executable diff --git a/content/using_ec2_spot_instances_with_eks/prerequisites/sshkey.md b/content/using_ec2_spot_instances_with_eks/prerequisites/sshkey.md index ca3f8a6b..dedeb8c6 100644 --- a/content/using_ec2_spot_instances_with_eks/prerequisites/sshkey.md +++ b/content/using_ec2_spot_instances_with_eks/prerequisites/sshkey.md @@ -20,12 +20,6 @@ Press `enter` 3 times to take the default choices Upload the public key to your EC2 region: -```bash -aws ec2 import-key-pair --key-name "eksworkshop" --public-key-material file://~/.ssh/id_rsa.pub -``` - -If you got an error similar to `An error occurred (InvalidKey.Format) when calling the ImportKeyPair operation: Key is not in valid OpenSSH public key format` then you can try this command instead: - ```bash aws ec2 import-key-pair --key-name "eksworkshop" --public-key-material fileb://~/.ssh/id_rsa.pub -``` \ No newline at end of file +``` diff --git a/content/using_ec2_spot_instances_with_eks/scaling/deploy_ca.files/cluster_autoscaler.yml b/content/using_ec2_spot_instances_with_eks/scaling/deploy_ca.files/cluster_autoscaler.yml index 93442611..abe2c53b 100644 --- a/content/using_ec2_spot_instances_with_eks/scaling/deploy_ca.files/cluster_autoscaler.yml +++ b/content/using_ec2_spot_instances_with_eks/scaling/deploy_ca.files/cluster_autoscaler.yml @@ -47,6 +47,9 @@ rules: - apiGroups: ["storage.k8s.io"] resources: ["storageclasses"] verbs: ["watch","list","get"] +- apiGroups: ["storage.k8s.io"] + resources: ["csinodes"] + verbs: ["watch","list","get"] - apiGroups: ["batch"] resources: ["jobs"] verbs: ["watch","list","get"] @@ -126,7 +129,7 @@ spec: nodeSelector: intent: control-apps containers: - - image: k8s.gcr.io/cluster-autoscaler:v1.15.5 + - image: 
+        - image: us.gcr.io/k8s-artifacts-prod/autoscaling/cluster-autoscaler:v1.16.5
          name: cluster-autoscaler
          resources:
            limits:
@@ -141,8 +144,7 @@ spec:
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
-            - --nodes=0:5:
-            - --nodes=0:5:
+            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/eksworkshop-eksctl
            - --expander=random
            - --expendable-pods-priority-cutoff=-10
            - --scale-down-unneeded-time=2m0s
diff --git a/content/using_ec2_spot_instances_with_eks/scaling/deploy_ca.md b/content/using_ec2_spot_instances_with_eks/scaling/deploy_ca.md
index ea67cbeb..ea89714e 100644
--- a/content/using_ec2_spot_instances_with_eks/scaling/deploy_ca.md
+++ b/content/using_ec2_spot_instances_with_eks/scaling/deploy_ca.md
@@ -7,13 +7,21 @@ weight: 10
 We will start by deploying [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler). Cluster Autoscaler for AWS provides integration with Auto Scaling groups. It enables users to choose from four different options of deployment:
 
 * One Auto Scaling group
-* **Multiple Auto Scaling groups** - This is what we will use
-* Auto-Discovery
+* Multiple Auto Scaling groups
+* **Auto-Discovery** - This is what we will use
 * Master Node setup
 
-In this workshop we will configure Cluster Autoscaler to scale using the Autoscaling groups associated with the nodegroups that we created in the [Adding Spot Workers with eksctl]({{< ref "/using_ec2_spot_instances_with_eks/spotworkers/workers_eksctl.md" >}}) section.
+In this workshop we will configure Cluster Autoscaler to scale using **[Cluster Autoscaler Auto-Discovery functionality](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md)**. When configured in Auto-Discovery mode on AWS, Cluster Autoscaler will look for Auto Scaling groups that match a set of pre-set AWS tags. As a convention we use the tags `k8s.io/cluster-autoscaler/enabled` and `k8s.io/cluster-autoscaler/eksworkshop-eksctl`.
+
+This will select the two Auto Scaling groups that have been created for Spot instances.
+
+{{% notice note %}}
+The **[following link](https://console.aws.amazon.com/ec2/autoscaling/home?#AutoScalingGroups:filter=eksctl-eksworkshop-eksctl-nodegroup-dev;view=details)** should take you to the
+Auto Scaling Groups console, filtered to the two Spot nodegroups we previously created. You should check that
+the tags `k8s.io/cluster-autoscaler/enabled` and `k8s.io/cluster-autoscaler/eksworkshop-eksctl` are present
+in both groups. This has been done automatically by **eksctl** upon creation of the groups.
+{{% /notice %}}
 
-### Configure the Cluster Autoscaler (CA)
 We have provided a manifest file to deploy the CA. Copy the commands below into your Cloud9 Terminal.
 
 ```
@@ -22,41 +30,13 @@ curl -o ~/environment/cluster-autoscaler/cluster_autoscaler.yml https://raw.gith
 sed -i "s/--AWS_REGION--/${AWS_REGION}/g" ~/environment/cluster-autoscaler/cluster_autoscaler.yml
 ```
 
-### Configure the ASG
-We will need to provide the names of the Autoscaling Groups that we want CA to manipulate.
-
-Your next task is to collect the names of the Auto Scaling Groups (ASGs) containing your Spot worker nodes. Record the names somewhere. We will use this later in the manifest file.
-
-You can find the names in the console by **[following this link](https://console.aws.amazon.com/ec2/autoscaling/home?#AutoScalingGroups:filter=eksctl-eksworkshop-eksctl-nodegroup-dev;view=details)**.
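+As an optional check, you can also confirm from the command line that the Spot nodegroups carry the auto-discovery tags Cluster Autoscaler will look for. The following is only a sketch; it assumes the AWS CLI and the `AWS_REGION` variable configured earlier in this workshop:
+
+```bash
+# List the Auto Scaling groups that carry the Cluster Autoscaler auto-discovery tag
+aws autoscaling describe-auto-scaling-groups --region ${AWS_REGION} \
+  --query "AutoScalingGroups[?Tags[?Key=='k8s.io/cluster-autoscaler/enabled']].AutoScalingGroupName" \
+  --output table
+```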
-
-![ASG](/images/using_ec2_spot_instances_with_eks/scaling/scaling-asg-spot-groups.png)
-
-You will need to save both ASG names for the next section.
-
-### Configure the Cluster Autoscaler
-
-Using the file browser on the left, open **cluster-autoscaler/cluster_autoscaler.yml** and amend the file:
-
- * Search for the block in the file containing this two lines.
- ```
- - --nodes=0:5:
- - --nodes=0:5:
- ```
-
- * Replace the content **** with the actual names of the two nodegroups. The following shows an example configuration.
- ```
- - --nodes=0:5:eksctl-eksworkshop-eksctl-nodegroup-dev-4vcpu-16gb-spot-NodeGroup-1V6PX51MY0KGP
- - --nodes=0:5:eksctl-eksworkshop-eksctl-nodegroup-dev-8vcpu-32gb-spot-NodeGroup-S0A0UGWAH5N1
- ```
-
- * **Save** the file
-
-This command contains all of the configuration for the Cluster Autoscaler. Each `--nodes` entry defines a new Autoscaling Group mapping to a Cluster Autoscaler nodegroup. Cluster Autoscaler will consider the nodegroups selected when scaling the cluster. The syntax of the line is minimum nodes **(0)**, max nodes **(5)** and **ASG Name**.
-
-
 
 ### Deploy the Cluster Autoscaler
 
-Cluster Autoscaler gets deploy like any other pod. In this case we will use the **[kube-system namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/)**, similar to what we do with other management pods.
+{{% notice info %}}
+You are encouraged to look at the configuration that you downloaded for cluster autoscaler in the directory `cluster-autoscaler` and find out about some of the parameters we are passing to it. The full list of parameters can be found in the **[Cluster Autoscaler documentation](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-the-parameters-to-ca)**.
+{{% /notice %}}
+
+Cluster Autoscaler gets deployed like any other pod. In this case we will use the **[kube-system namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/)**, similar to what we do with other management pods.
 
 ```
 kubectl apply -f ~/environment/cluster-autoscaler/cluster_autoscaler.yml
diff --git a/content/using_ec2_spot_instances_with_eks/spotworkers/deployhandler.md b/content/using_ec2_spot_instances_with_eks/spotworkers/deployhandler.md
index 35089fa3..63116a58 100644
--- a/content/using_ec2_spot_instances_with_eks/spotworkers/deployhandler.md
+++ b/content/using_ec2_spot_instances_with_eks/spotworkers/deployhandler.md
@@ -21,25 +21,22 @@ Within the Node Termination Handler DaemonSet, the workflow can be summarized as
 * [**Drain**](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/) connections on the running pods.
 * Replace the pods on remaining nodes to maintain the desired capacity.
 
-By default, **[aws-node-termination-handler](https://github.com/aws/aws-node-termination-handler)** will run on all of your nodes (on-demand and spot). If your spot instances are labeled, you can configure `aws-node-termination-handler` to only run on your labeled spot nodes. If you're using the tag `lifecycle=Ec2Spot`, you can run the following to apply our spot-node-selector overlay.
+By default, **[aws-node-termination-handler](https://github.com/aws/aws-node-termination-handler)** will run on all of your nodes (on-demand and spot).
+This is also our recommendation. Remember that the termination handler also handles maintenance events that can impact On-Demand instances!
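+
+If you are curious about these events, the AWS CLI can list any maintenance events currently scheduled for your instances. This is just a sketch, assuming the AWS CLI and the `AWS_REGION` variable used earlier in this workshop; an empty result simply means nothing is scheduled right now:
+
+```bash
+# Show instances that currently have scheduled events (maintenance, reboot, retirement, ...)
+aws ec2 describe-instance-status --region ${AWS_REGION} --include-all-instances \
+  --query "InstanceStatuses[?Events].[InstanceId, Events[0].Code, Events[0].NotBefore]" \
+  --output table
+```
+
+Now add the eks-charts repository and install the chart: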
 
 ```
 helm repo add eks https://aws.github.io/eks-charts
-helm install --name aws-node-termination-handler \
+helm install aws-node-termination-handler \
   --namespace kube-system \
-  --set nodeSelector.lifecycle=Ec2Spot \
   eks/aws-node-termination-handler
 ```
 
-Verify that the pods are only running on node with label `lifecycle=Ec2Spot`
+Verify that the pods are running on all nodes:
 ```
 kubectl get daemonsets --all-namespaces
 ```
 
-Use **kube-ops-view** to confirm *AWS Node Termination Handler* DaemonSet has been deployed only to EC2 Spot nodes.
+Use **kube-ops-view** to confirm the *AWS Node Termination Handler* DaemonSet has been deployed to all nodes.
 
-{{% notice warning %}}
-Although in this workshop we deployed the *AWS Node Termination Handler* only to EC2 Spot nodes, our recommendation is to run the AWS Node Termination handler also on nodes where you would like to capture other termination events such as [maintenance events](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-instances-status-check_sched.html) or in the future Auto Scaling AZ Balancing events
-{{% /notice %}}
diff --git a/content/using_ec2_spot_instances_with_eks/spotworkers/selecting_instance_types.md b/content/using_ec2_spot_instances_with_eks/spotworkers/selecting_instance_types.md
index ce76692a..6b3f61c3 100644
--- a/content/using_ec2_spot_instances_with_eks/spotworkers/selecting_instance_types.md
+++ b/content/using_ec2_spot_instances_with_eks/spotworkers/selecting_instance_types.md
@@ -26,26 +26,85 @@ We can diversify Spot instance pools using two strategies:
 - By Implementing instance diversification within the nodegroups, by selecting a mix of instances types and families from different Spot instance pools that meet the same vCPU's and memory criteria.
 
-Our goal in this workshop, is to create at least 2 diversified groups of instances that adhere the 1vCPU:4GB RAM ratio. We can use **[Spot Instance Advisor](https://aws.amazon.com/ec2/spot/instance-advisor/)** page to find the relevant instances types and families with sufficient number of vCPUs and RAM, and use this to also select instance types with low interruption rates.
-
-{{% notice note %}}
- The frequency of Spot Instance interruptions reflected in *Spot Instance Advisor* may change over time. Savings compared to On-Demand are calculated over the last 30 days. The above just provides a real world example from a specific time and will probably be different when you are performing this workshop. Note also that not all the instances are available in all the regions.
-{{% /notice %}}
-
-![Selecting Instance Type with 4vCPU and 16GB](/images/using_ec2_spot_instances_with_eks/spotworkers/4cpu_16_ram_instances.png)
-
-In this case with Spot Instance Advisor we can create a 4vCPUs_16GB nodegroup with the following diversified instances: **m5.xlarge, m5d.xlarge, m4.xlarge, m5a.xlarge, t2.xlarge, t3.xlarge, t3a.xlarge**
+Our goal in this workshop is to create at least 2 diversified groups of instances that adhere to the 1vCPU:4GB RAM ratio.
+
+We will use **[amazon-ec2-instance-selector](https://github.com/aws/amazon-ec2-instance-selector)** to help us select the relevant instance
+types and families with sufficient number of vCPUs and RAM.
+
+There are over 270 different instance types available on EC2, which can make the process of selecting appropriate instance types difficult. **[amazon-ec2-instance-selector](https://github.com/aws/amazon-ec2-instance-selector)** helps you select compatible instance types for your application to run on. The command line interface can be passed resource criteria like vCPUs, memory, network performance, and much more, and then returns the available, matching instance types.
+
+Let's first install **amazon-ec2-instance-selector**:
+
+```
+curl -Lo ec2-instance-selector https://github.com/aws/amazon-ec2-instance-selector/releases/download/v1.3.0/ec2-instance-selector-`uname | tr '[:upper:]' '[:lower:]'`-amd64 && chmod +x ec2-instance-selector
+sudo mv ec2-instance-selector /usr/local/bin/
+ec2-instance-selector --version
+```
+
+Now that you have ec2-instance-selector installed, you can run
+`ec2-instance-selector --help` to understand how you could use it for selecting
+instances that match your workload requirements. For the purpose of this workshop
+we first need to get a group of instances that meets the 4 vCPUs and 16GB of RAM criteria.
+Run the following command to get the list of instances:
+
+```bash
+ec2-instance-selector --vcpus 4 --memory 16384 --gpus 0 --current-generation -a x86_64 --deny-list '.*n.*'
+```
+
+This should display a list like the one that follows (note that results might differ depending on the region). We will use these instances as part of one of our node groups.
+
+```
+m4.xlarge
+m5.xlarge
+m5a.xlarge
+m5d.xlarge
+t2.xlarge
+t3.xlarge
+t3a.xlarge
+```
+
+Internally, ec2-instance-selector makes calls to the [DescribeInstanceTypes](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeInstanceTypes.html) API for the specific region and filters
+the instances based on the criteria selected in the command line. In our case
+we filtered for instances that meet the following criteria:
+ * Instances with no GPUs
+ * Instances of x86_64 architecture (no ARM instances like A1 or m6g, for example)
+ * Instances that have 4 vCPUs and 16GB of RAM
+ * Instances of current generation (4th gen onwards)
+ * Instances whose names don't match the regular expression `.*n.*`, effectively excluding m5n and m5dn instances
 
 {{% notice warning %}}
 Your workload may have other constraints that you should consider when selecting instances types. For example. **t2** and **t3** instance types are [burstable instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-performance-instances.html) and might not be appropriate for CPU bound workloads that require CPU execution determinism. Instances such as m5**a** are [AMD Instances](https://aws.amazon.com/ec2/amd/), if your workload is sensitive to numerical differences (i.e: financial risk calculations, industrial simulations) mixing these instance types might not be appropriate.
 {{% /notice %}}
 
+{{% notice note %}}
+You are encouraged to explore the options that `ec2-instance-selector` provides and run a few commands with it to familiarize yourself with the tool.
+For example, try running the same commands as you did before with the extra parameter **`--output table-wide`**.
+{{% /notice %}}
+
 ### Challenge
 Find out another group that adheres to a 1vCPU:4GB ratio, this time using instances with 8vCPU's and 32GB of RAM.
 
 {{%expand "Expand this for an example on the list of instances" %}}
-That should be easy : **m5.2xlarge, m5d.2xlarge, m4.2xlarge, m5a.2xlarge, t2.2xlarge, t3.2xlarge, t3a.2xlarge**
+
+That should be easy. You can run the command:
+
+```bash
+ec2-instance-selector --vcpus 8 --memory 32768 --gpus 0 --current-generation -a x86_64 --deny-list '.*n.*|.*h.*'
+```
+
+which should yield a list as follows:
+
+```
+m4.2xlarge
+m5.2xlarge
+m5a.2xlarge
+m5d.2xlarge
+t2.2xlarge
+t3.2xlarge
+t3a.2xlarge
+```
+
 {{% /expand %}}
diff --git a/content/using_ec2_spot_instances_with_eks/spotworkers/workers_eksctl.md b/content/using_ec2_spot_instances_with_eks/spotworkers/workers_eksctl.md
index 96f15269..773a8c65 100644
--- a/content/using_ec2_spot_instances_with_eks/spotworkers/workers_eksctl.md
+++ b/content/using_ec2_spot_instances_with_eks/spotworkers/workers_eksctl.md
@@ -133,53 +133,3 @@ You can use the `kubectl describe nodes` with one of the spot nodes to see the t
 {{% notice note %}}
 Explore your cluster using kube-ops-view and find out the nodes that have just been created.
 {{% /notice %}}
-
-
-### On-Demand and Spot mixed worker groups
-
-When deploying nodegroups, [eksctl](https://eksctl.io/usage/managing-nodegroups/) creates a CloudFormation template that deploys a [LaunchTemplate](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-ec2-launchtemplate.html) and an [Autoscaling Group](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-as-group.html) with the settings we provided in the configuration. Autoscaling groups using LaunchTemplate support not only mixed instance types but also purchasing options within the group. You can
-mix **On-Demand, Reserved Instances, and Spot** within the same nodegroup.
-
-#### Label and Taint strategies on mixed workers
-
-The configuration we used creates two diversified instance groups with just Spot instances. We have attached to all nodes in both groups the same `lifecycle: Ec2Spot` Label and a `spotInstance: "true:PreferNoSchedule"` taint. When using a mix of On-Demand and Spot instances within the same nodegroup, we need to implement conditional logic on the back of the instance attribute **InstanceLifecycle** and set the labels and taints accordingly.
-
-{{% notice warning %}}
-Note that for [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-the-key-best-practices-for-running-cluster-autoscaler) all nodes within the same node group should have the same capacity and labels, for it to predict which nodegroup to increase the capacity on.
-{{% /notice %}}
-
-
-This can be achieved in multiple ways by extending the bootstrapping sequence.
-
- * **[eksctl_mixed_workers_bootstrap.yml](spotworkers.files/eksctl_mixed_workers_bootstrap.yml)** Provides an example file for [overriding eksctl boostrap process](https://github.com/weaveworks/eksctl/issues/929) in eksctl nodegroups. Note you may need to change the region details when using this example.
-
-
-* **[cloudformation_mixed_workers.yml](spotworkers.files/cloudformation_mixed_workers.yml)** Provides a cloudformation template
-to set up an autoscaling group with mixed on-demand and spot workers and insert bootstrap parameters to each depending on the node "InstanceLifecycle".
-
-
-### Optional Exercise
-
-{{% notice warning %}}
-It will take time to provision and decommission capacity. If you are running this
-workshop at a AWS event or with limited time, we recommend to come back to this section once you have
-completed the workshop, and before getting into the **cleanup** section.
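+
+If you prefer the terminal over kube-ops-view for this check, you can also list the Spot worker nodes directly. This is only a small sketch, relying on the `lifecycle` and `intent` labels used by the nodegroups in this workshop:
+
+```bash
+# List only the Spot worker nodes, showing their lifecycle and intent labels as columns
+kubectl get nodes --selector=lifecycle=Ec2Spot --label-columns=lifecycle,intent
+```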
-{{% /notice %}}
-
- * Delete the current configuration and instead create 2 nodegroups one with 4vCPU's and 16GB ram and another one with 8vCPU's and 32GB of ram. The nodegroups must implement a set of mixed instances balanced at 50% between on-demand and spot. On-Demand instances must have a label `lifecycle: OnDemand`. Spot instances must have a label `lifecycle: Ec2Spot` and a taint `spotInstance: "true:PreferNoSchedule"`
-
-{{%expand "Show me a hint for implementing this." %}}
-You can delete the previous nodegroup created using
-
-```bash
-eksctl delete nodegroup -f spot_nodegroups.yml
-```
-
-Download the example file [eksctl_mixed_workers_bootstrap.yml](spotworkers.files/eksctl_mixed_workers_bootstrap.yml), change the region to the current one where
-your cluster is running and create the nodegroups using the following command:
-
-```bash
-eksctl create nodegroup -f eksctl_mixed_workers_bootstrap.yml
-```
-{{% /expand %}}
-
diff --git a/static/images/using_ec2_spot_instances_with_eks/spotworkers/eks_spot_diagram.png b/static/images/using_ec2_spot_instances_with_eks/spotworkers/eks_spot_diagram.png
index 3040d23c..a6669a11 100644
Binary files a/static/images/using_ec2_spot_instances_with_eks/spotworkers/eks_spot_diagram.png and b/static/images/using_ec2_spot_instances_with_eks/spotworkers/eks_spot_diagram.png differ