Skip to content

Commit

Permalink
Merge branch 'awslabs:master' into master
Browse files Browse the repository at this point in the history
  • Loading branch information
bpguasch authored Aug 1, 2022
2 parents 34a6c54 + 80b1f3c commit a6239ee
Show file tree
Hide file tree
Showing 18 changed files with 181 additions and 118 deletions.
2 changes: 1 addition & 1 deletion content/karpenter/040_karpenter/advanced_provisioner.md
Original file line number Diff line number Diff line change
Expand Up @@ -181,7 +181,7 @@ spec:
...
```

Launch Teplates specifies instance configuration information. It includes the ID of the Amazon Machine Image (AMI), the instance type, a key pair, storage, user data and other parameters used to launch EC2 instances. Launch Template user data can be used to customize the node bootstrapping to the cluster. In the default configuration, Karpenter uses an EKS optimized version of AL2 and passes the hostname of the Kubernetes API server, and a certificate for the node to bootstrap the process with the default configuration. The EKS Optimized AMI includes a `bootstrap.sh` script which connects the instance to the cluster, based on the passed data. Alternatively, you may reference AWS's [`bootstrap.sh`
Launch Templates specifies instance configuration information. It includes the ID of the Amazon Machine Image (AMI), the instance type, a key pair, storage, user data and other parameters used to launch EC2 instances. Launch Template user data can be used to customize the node bootstrapping to the cluster. In the default configuration, Karpenter uses an EKS optimized version of AL2 and passes the hostname of the Kubernetes API server, and a certificate for the node to bootstrap the process with the default configuration. The EKS Optimized AMI includes a `bootstrap.sh` script which connects the instance to the cluster, based on the passed data. Alternatively, you may reference AWS's [`bootstrap.sh`
file](https://github.com/awslabs/amazon-eks-ami/blob/master/files/bootstrap.sh)
when building a custom base image.

Expand Down
2 changes: 1 addition & 1 deletion content/karpenter/040_karpenter/multiple_architectures.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
---
title: "Multi-Archtiecture deployments"
title: "Multi-Architecture deployments"
date: 2021-11-07T11:05:19-07:00
weight: 70
draft: false
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,26 +2,20 @@
title: "EMR Instance Fleets"
weight: 30
---
The **[instance fleet](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-instance-fleet.html)** configuration for Amazon EMR clusters lets you select a wide variety of provisioning options for Amazon EC2 instances, and helps you develop a flexible and elastic resourcing strategy for each node type in your cluster. You can have only one instance fleet per master, core, and task node type. In an instance fleet configuration, you specify a *target capacity* for **[On-Demand Instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-on-demand-instances.html)** and **[Spot Instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-instances.html)** within each fleet.

When adopting Spot Instances into your workload, it is recommended to be flexible around how to launch your workload in terms of Availability Zone and Instance Types. This is in order to be able to achieve the required scale from multiple Spot capacity pools (a combination of EC2 instance type in an availability zone) or one capacity pool which has sufficient capacity, as well as decrease the impact on your workload in case some of the Spot capacity is interrupted with a 2-minute notice when EC2 needs the capacity back, and allow EMR to replenish the capacity with a different instance type.
When adopting Spot Instances into your workload, it is recommended to be flexible with selection of instance types across family, generation, sizes, and Availability Zones. With higher instance type diversification, Amazon EMR has more capacity pools to allocate capacity from, and chooses the Spot Instances which are least likely to be interrupted.

With EMR instance fleets, you specify target capacities for On-Demand Instances and Spot Instances within each fleet (Master, Core, Task). When the cluster launches, Amazon EMR provisions instances until the targets are fulfilled. You can specify up to five EC2 instance types per fleet for Amazon EMR to use when fulfilling the targets. You can also select multiple subnets for different Availability Zones.

{{% notice info %}}
**[Click here](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-instance-fleet.html)** to learn more about EMR Instance Fleets in the official documentation.
{{% notice note %}}
EMR allows a maximum of five instance types per fleet, when you use use the default Amazon EMR cluster instance fleet configuration. However, you can specify a maximum of 15 instance types per fleet when you create a cluster using the EMR console and enable **[allocation strategy](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-instance-fleet.html#emr-instance-fleet-allocation-strategy)** option. This limit is further increased to 30 when you create a cluster using AWS CLI or Amazon EMR API and enable allocation strategy option.
{{% /notice %}}

While a cluster is running, if Amazon EC2 reclaims a Spot Instance or if an instance fails, Amazon EMR tries to replace the instance with any of the instance types that you specify in your fleet. This makes it easier to regain capacity in case some of the instances get interrupted by EC2 when it needs the Spot capacity back.
As an enhancement to the default EMR instance fleets cluster configuration, the allocation strategy feature is available in EMR version **5.12.1 and later**. With allocation strategy EMR instance fleet with:

These options do not exist within the default EMR configuration option "Uniform Instance Groups", hence we will be using EMR Instance Fleets only.
* On-Demand Instances uses a lowest-priced strategy, which launches the lowest-priced On-Demand Instances first.
* Spot Instances uses a **[capacity-optimized](https://aws.amazon.com/about-aws/whats-new/2020/06/amazon-emr-uses-real-time-capacity-insights-to-provision-spot-instances-to-lower-cost-and-interruption/)** allocation strategy, which allocates instances from most-available Spot Instance pools and lowers the chance of further interruptions. This allocation strategy is appropriate for workloads that have a higher cost of interruption such as persistent EMR clusters running Apache Spark, Apache Hive, and Presto.

As an enhancement to the default EMR instance fleets cluster configuration, the allocation strategy feature is available in EMR version **5.12.1 and later**. With allocation strategy:
* On-Demand instances use a lowest-price strategy, which launches the lowest-priced instances first.
* Spot instances use a **[capacity-optimized](https://aws.amazon.com/about-aws/whats-new/2020/06/amazon-emr-uses-real-time-capacity-insights-to-provision-spot-instances-to-lower-cost-and-interruption/)** allocation strategy, which allocates instances from most-available Spot Instance pools and lowers the chance of interruptions. This allocation strategy is appropriate for workloads that have a higher cost of interruption such as persistent EMR clusters running Apache Spark, Apache Hive, and Presto.

{{% notice note %}}
This allocation strategy option also lets you specify **up to 15 EC2 instance types on task instance fleet**. By default, Amazon EMR allows a maximum of 5 instance types for each type of instance fleet. By enabling allocation strategy, you can diversify your Spot request for task instance fleet across 15 instance pools. With more instance type diversification, Amazon EMR has more capacity pools to allocate capacity from, this allows you to get more compute capacity.
{{% /notice %}}
These options do not exist within the default EMR configuration option uniform instance groups, hence we recommend using instance fleets with Spot Instances.

{{% notice info %}}
**[Click here](https://aws.amazon.com/blogs/big-data/optimizing-amazon-emr-for-resilience-and-cost-with-capacity-optimized-spot-instances/)** for an in-depth blog post about capacity-optimized allocation strategy for Amazon EMR instance fleets.
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
---
title: "Examining the cluster"
weight: 95
weight: 90
---

In this section we will look at the utilization of our EC2 Spot Instances while the application is running, and examine how many Spark executors are running.
Expand Down Expand Up @@ -78,21 +78,30 @@ Now, let's look at **Spark History Server** application user interface:
### Using CloudWatch Metrics
EMR emits several useful metrics to CloudWatch metrics. You can use the AWS Management Console to look at the metrics in two ways:
1. In the EMR console, under the Monitoring tab in your Cluster's page
1. In the EMR console, under the **Monitoring** tab in your cluster's page
2. By browsing to the CloudWatch service, and under Metrics, searching for the name of your cluster (copy it from the EMR Management Console) and clicking **EMR > Job Flow Metrics**
{{% notice note %}}
The metrics will take a few minutes to populate.
{{% /notice %}}
Some notable metrics:
Some notable metrics:
* **AppsRunning** - you should see 1 since we only submitted one step to the cluster.
* **ContainerAllocated** - this represents the number of containers that are running on your cluster, on the Core and Task Instance Fleets. These would the be Spark executors and the Spark Driver.
* **MemoryAllocatedMB** & **MemoryAvailableMB** - you can graph them both to see how much memory the cluster is actually consuming for the wordcount Spark application out of the memory that the instances have.
* **ContainerAllocated** - this represents the number of containers that are running on core and task fleets. These would the be Spark executors and the Spark Driver.
* **Memory allocated MB** & **Memory available MB** - you can graph them both to see how much memory the cluster is actually consuming for the wordcount Spark application out of the memory that the instances have.
#### Managed Scaling in Action
You enabled managed cluster scaling and EMR scaled out to 64 Spot units in the task fleet. EMR could have launched either 16 * xlarge (running one executor per xlarge) or 8 * 2xlarge instances (running 2 executors per 2xlarge) or 4 * 4xlarge instances (running 4 executors pe r4xlarge), so the task fleet provides 16 executors / containers to the cluster. The core fleet launched one xlarge instance and it will run one executor / container, so in total 17 executors / containers will be running in the cluster.
1. In your EMR cluster page, in the AWS Management Console, go to the **Steps** tab.
1. Go to the **Events** tab to see the scaling events.
![scalingEvent](/images/running-emr-spark-apps-on-spot/emrsparkscalingevent.png)
EMR Managed cluster scaling constantly monitors [key metrics](https://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-metrics.html) and automatically increases or decreases the number of instances or units in your cluster based on workload.
#### Number of executors in the cluster
With 32 Spot Units in the Task Instance Fleet, EMR launched either 8 * xlarge (running one executor) or 4 * 2xlarge instances (running 2 executors) or 2 * 4xlarge instances (running 4 executors), so the Task Instance Fleet provides 8 executors / containers to the cluster.
The Core Instance Fleet launched one xlarge instance, able to run one executor.
{{%expand "Question: Did you see more than 9 containers in CloudWatch Metrics and in YARN ResourceManager? if so, do you know why? Click to expand the answer" %}}
{{%expand "Question: Did you see more than 17 containers in CloudWatch Metrics and in YARN ResourceManager? if so, do you know why? Click to expand the answer" %}}
Your Spark application was configured to run in Cluster mode, meaning that the **Spark driver is running on the Core node**. Since it is counted as a container, this adds a container to our count, but it is not an executor.
{{% /expand%}}
{{% /expand%}}
Original file line number Diff line number Diff line change
Expand Up @@ -3,16 +3,17 @@ title: "Fleet configuration options"
weight: 85
---

While our cluster is starting (7-8 minutes) and the step is running (4-10 minutes depending on the instance types that were selected) let's take the time to look at some of the EMR Instance Fleets configurations we didn't dive into when starting the cluster.
While our cluster is starting (7-8 minutes) and the step is running (4-10 minutes depending on the instance types that were selected) let's take the time to look at some of the EMR instance fleets configurations we didn't dive into when starting the cluster.

![fleetconfigs](/images/running-emr-spark-apps-on-spot/emrinstancefleets-core1.png)
![fleetconfigs](/images/running-emr-spark-apps-on-spot/emrinstancefleets-core.png)

#### Maximum Spot price
Since Nov 2017, Amazon EC2 Spot Instances [changed the pricing model and bidding was eliminated](https://aws.amazon.com/blogs/compute/new-amazon-ec2-spot-pricing/). We have an optional "Max-price" field for our Spot requests, which would limit how much we're willing to pay for the instance. It is recommended to leave this value at 100% of the On-Demand price, in order to avoid limiting our instance diversification. We are going to pay the Spot market price regardless of the Maximum price that we can specify, and setting a higher max price does not increase the chance of getting Spot capacity nor does it decrease the chance of getting your Spot Instances interrupted when EC2 needs the capacity back. You can see the current Spot price in the AWS Management Console under EC2 -> Spot Requests -> **Pricing History**.
Since Nov 2017, Amazon EC2 Spot Instances [changed the pricing model and bidding was eliminated](https://aws.amazon.com/blogs/compute/new-amazon-ec2-spot-pricing/). We have an optional "Max-price" field for our Spot requests, which would limit how much we're willing to pay for the instance. It is recommended to leave this value at 100% of the On-Demand price, in order to avoid limiting our instance diversification. We are going to pay the Spot market price regardless of the Maximum price that we can specify, and setting a higher max price does not increase the chance of getting Spot capacity nor does it decrease the chance of getting your Spot Instances interrupted when EC2 needs the capacity back. You can see the current Spot price in the AWS Management Console under **EC2 >> Spot Requests >> Pricing History**.

#### Each instance counts as X units
This configuration allows us to give each instance type in our diversified fleet a weight that will count towards our Total units. By default, this weight is configured as the number of YARN VCores that the instance type has by default (this would typically equate to the number of EC2 vCPUs) - this way it's easy to set the Total units to the number of vCPUs we want our cluster to run with, and EMR will select the best instances while taking into account the required number of instances to run. For example, if r4.xlarge is the instance type that EMR found to be the least likely to be interrupted, its weight is 4 and our total units (only Spot) is 32, then 8 * r4.xlarge instances will be launched by EMR in the fleet.
If my Spark application is memory driven, I can set the total units to the total amount of memory I want my cluster to run with, and change the "Each instance counts as" field to the total memory of the instance, leaving aside some memory for the operating system and other processes. For example, for the r4.xlarge I can set its weight to 25. If I then set up the Total units to 500 then EMR will bring up 20 * r4.xlarge instances in the fleet. Since our executor size is 18 GB, one executor will run on this instance type.
This configuration allows us to give each instance type in our diversified fleet a weight that will count towards our **Target capacity**. By default, this weight is configured as the number of YARN VCores that the instance type has by default, this would typically equate to the number of EC2 vCPUs. For example, r4.xlarge has a default weight set to 4 units and you have set the **Target capacity** for task fleet to 32. If EMR picks r4.xlarge as the most available Spot Instance, then 8 * r4.xlarge instances will be launched by EMR in the task fleet.

If your Spark application is memory driven, you can set the **Target capacity** to the total amount of memory you want the cluster to run with. You can change the **Each instance counts as** field to the total memory of the instance, leaving aside some memory for the operating system and other processes. For example, for the r4.xlarge you can set **Each instance counts as** 18 units and set **Target capacity** to 144. If EMR picks r4.xlarge as the most available Spot Instance, then 8 * r4.xlarge instances will be launched by EMR in the task fleet. Since your executor size is 18 GB, one executor will run on each r4.xlarge instance.

#### Provisioning timeout
You can determine that after a set amount of minutes, if EMR is unable to provision your selected Spot Instances due to lack of capacity, it will either start On-Demand instances instead, or terminate the cluster. This can be determined according to the business definition of the cluster or Spark application - if it is SLA bound and should complete even at On-Demand price, then the "Switch to On-Demand" option might be suitable. However, make sure you diversify the instance types in the fleet when looking to use Spot Instances, before you look into failing over to On-Demand.
Loading

0 comments on commit a6239ee

Please sign in to comment.