Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EMR Allocation Strategies #7

Merged
merged 3 commits into from
Jan 7, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view

This file was deleted.

Original file line number Diff line number Diff line change
Expand Up @@ -11,8 +11,18 @@ With EMR instance fleets, you specify target capacities for On-Demand Instances
[Click here] (https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-instance-fleet.html) to learn more about EMR Instance Fleets in the official documentation.
{{% /notice %}}

**When Amazon EMR launches the cluster, it looks across those subnets to find the instances and purchasing options you specify, and will select the Spot Instances with the lowest chance of getting interrupted, for the lowest cost.**


While a cluster is running, if Amazon EC2 reclaims a Spot Instance or if an instance fails, Amazon EMR tries to replace the instance with any of the instance types that you specify in your fleet. This makes it easier to regain capacity in case some of the instances get interrupted by EC2 when it needs the Spot capacity back.\
These options do not exist within the default EMR configuration option "Uniform Instance Groups", hence we will be using EMR Instance Fleets only.

As an enhancement to the default EMR instance fleets cluster configuration, the allocation strategy feature is available in EMR version **5.12.1 and later**. With allocation strategy -/
* On-Demand instances use a lowest-price strategy, which launches the lowest-priced instances first./
* Spot instances use a [capacity-optimized] (https://aws.amazon.com/about-aws/whats-new/2020/06/amazon-emr-uses-real-time-capacity-insights-to-provision-spot-instances-to-lower-cost-and-interruption/) allocation strategy, which allocates instances from most-available Spot Instance pools and lowers the chance of interruptions. This allocation strategy is appropriate for workloads that have a higher cost of interruption such as persistent EMR clusters running Apache Spark, Apache Hive, and Presto.

{{% notice note %}}
This allocation strategy option also lets you specify **up to 15 EC2 instance types on task instance fleet**. By default, Amazon EMR allows a maximum of 5 instance types for each type of instance fleet. By enabling allocation strategy, you can diversify your Spot request for task instance fleet across 15 instance pools. With more instance type diversification, Amazon EMR has more capacity pools to allocate capacity from, this allows you to get more compute capacity.
{{% /notice %}}


{{% notice info %}}
[Click here] (https://aws.amazon.com/blogs/big-data/optimizing-amazon-emr-for-resilience-and-cost-with-capacity-optimized-spot-instances/) for an in-depth blog post about capacity-optimized allocation strategy for Amazon EMR instance fleets.
{{% /notice %}}
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ Under the core node type, click **Add / remove instance types to fleet** and sel

#### **Task Instance Fleet**:
Our task nodes will only run Spark executors and no HDFS DataNodes, so this is a great fit for scaling out and increasing the parallelization of our application's execution, to achieve faster execution times.
Under the task node type, Click **Add / remove instance types to fleet** and select up to 15 instance types you noted before as suitable for our executor size.\
Under the task node type, click **Add / remove instance types to fleet** and select **up to 15 instance types** you noted before as suitable for your executor size.\
Since our executor size is 4 vCPUs, and each instance counts as the number of its vCPUs towards the total units, let's specify **32 Spot units** in order to run 8 executors, and allow EMR to select the best instance type in the Task Instance Fleet to run the executors on.

![FleetSelection3](/images/running-emr-spark-apps-on-spot/emrinstancefleets-task2.png)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ r5.xlarge: yarn.scheduler.maximum-allocation-mb 24576\
2. With the Spark on YARN configuration option which was [introduced in EMR version 5.22] (https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-whatsnew-history.html#emr-5220-whatsnew): spark.yarn.executor.memoryOverheadFactor and defaults to 0.1875 (18.75% of the spark.yarn.executor.memoryOverhead setting )


So we can conclude that if we decrease our executor size to ~18GB, we'll be able to use r4.xlarge and basically any of the R family instance types as vCPU and Memory grows linearly within family sizes. If EMR will select an r4.2xlarge instance type from the list of supported instance types that we'll provide to EMR Instance Fleets, then it will run more than 1 executor on each instance, due to Spark dynamic allocation being enabled by default.
So we can conclude that if we decrease our executor size to ~18GB, we'll be able to use r4.xlarge and basically any of the R family instance types (`1:8 vCPU:Mem ratio`) as vCPU and Memory grows linearly within family sizes. If EMR will select an r4.2xlarge instance type from the list of supported instance types that we'll provide to EMR Instance Fleets, then it will run more than 1 executor on each instance, due to Spark dynamic allocation being enabled by default.

![tags](/images/running-emr-spark-apps-on-spot/sparkmemory.png)

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -3,18 +3,18 @@ title: "Selecting instance types"
weight: 50
---

Let's use our newly acquired knowledge around Spark executor sizing in order to select the EC2 Instance Types that will be used in our EMR cluster.\
EMR clusters run Master, Core and Task node types. [Click here] (https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-master-core-task-nodes.html) to read more about the different node types.
Let's use our newly acquired knowledge around Spark executor sizing in order to select the EC2 Instance Types that will be used in our EMR cluster. We determined that in order to be flexible and allow running on multiple instance types, we will submit our Spark application with **"–executor-memory=18GB –executor-cores=4"**.

We determined that in order to be flexible and allow running on multiple instance types across R instance family, we will submit our Spark application with **"–executor-memory=18GB –executor-cores=4"**.

We will use **[amazon-ec2-instance-selector](https://github.com/aws/amazon-ec2-instance-selector)** to help us select the relevant instance
types and families with sufficient number of vCPUs and RAM.
For example: We identified R family instances, so EMR can run executors that will consume 4 vCPUs and 18 GB of RAM and still leave free RAM for the operating system and other processes. First, we can select different-sized instance types from current generation, such as r5.xlarge, r5.2xlarge and r5.4xlarge. Next, we can select different-sized instance types from previous generation, such as r4.xlarge, r4.2xlarge and r4.4xlarge. Last, we can select different-sized instances from R family local storage and processor variants, such as R5d instance types (local NVMe-based SSDs) and R5a instance types (powered by AMD processors).
To apply the instance diversification best practices while meeting the application constraints defined in the previous section, we can add different instances sizes from the current generation, such as R5 and R4. We can even include variants, such as R5d instance types (local NVMe-based SSDs) and R5a instance types (powered by AMD processors).

{{% notice info %}}
There are over 275 different instance types available on EC2 which can make the process of selecting appropriate instance types difficult. **[amazon-ec2-instance-selector](https://github.com/aws/amazon-ec2-instance-selector)** helps you select compatible instance types for your application to run on. The command line interface can be passed resource criteria like vCPUs, memory, network performance, and much more and then return the available, matching instance types.
{{% /notice %}}

We will use **amazon-ec2-instance-selector** to help us select the relevant instance
types with sufficient number of vCPUs and RAM.

Let's first install **amazon-ec2-instance-selector** on Cloud9 IDE:
Let's first install amazon-ec2-instance-selector on Cloud9 IDE:

```
curl -Lo ec2-instance-selector https://github.com/aws/amazon-ec2-instance-selector/releases/download/v1.3.0/ec2-instance-selector-`uname | tr '[:upper:]' '[:lower:]'`-amd64 && chmod +x ec2-instance-selector
Expand All @@ -23,16 +23,28 @@ ec2-instance-selector --version
```

Now that you have ec2-instance-selector installed, you can run
`ec2-instance-selector --help` to understand how you could use it for selecting
instances that match your workload requirements. For the purpose of this workshop
we need to first get a group of instances with sizes between 4vCPU to 16vCPUs and belong to R5, R4, R5D and R5A instance types.
Run the following command to get the list of instances.
`ec2-instance-selector --help`, to understand how you could use it for selecting
instances that match your workload requirements.

For the purpose of this workshop we will select instances based on below criteria -\
* Instances that have minimum 4 vCPUs and maximum 16 vCPUs\
* Instances which have vCPU to Memory ratio of 1:8, same as R Instance family\
* Instances with CPU Architecture x86_64 and no GPU Instances\
* Instances that belong to current generation\
* Instances types that are not supported by EMR such as R5N, R5ad and R5b. Enhanced z, I and D Instance families, which are priced higher than R family. So basically, adding a deny list with the regular expression `.*n.*|.*ad.*|.*b.*|^[zid].*`.

{{% notice info %}}
[Click here] (https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-supported-instance-types.html) to find out the instance types that Amazon EMR supports .
{{% /notice %}}

Run the following command with above mentioned criteria, to get the list of instances.

```bash
ec2-instance-selector --vcpus-min 4 --vcpus-max 16 --allow-list '.*r5.*|.*r4.*|.*r5d.*|.*r5a.*' --deny-list '.*n.*|.*ad.*|.*b.*'
ec2-instance-selector --vcpus-min 4 --vcpus-max 16 --vcpus-to-memory-ratio 1:8 --cpu-architecture x86_64 --current-generation --gpus 0 --deny-list '.*n.*|.*ad.*|.*b.*|^[zid].*'
```

This should display a list like the one that follows (note results might differ depending on the region). We will use this instances as part of our EMR Core and Task Instance Fleets.
Internally ec2-instance-selector is making calls to the [DescribeInstanceTypes](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeInstanceTypes.html) for the specific region and filtering
the instances based on the criteria selected in the command line. Above command should display a list like the one that follows (note results might differ depending on the region). We will use this instances as part of our EMR Core and Task Instance Fleets.

```
r4.2xlarge
Expand All @@ -46,17 +58,9 @@ r5a.4xlarge
r5a.xlarge
r5d.2xlarge
r5d.4xlarge
r5d.xlarge
r5d.xlarge
```

Internally ec2-instance-selector is making calls to the [DescribeInstanceTypes](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeInstanceTypes.html) for the specific region and filtering
the instances based on the criteria selected in the command line, in our case
we did filter for instances that meet the following criteria:\
* Instances that have minimum 4 vCPUs and maximum 16 vCPUs\
* Instances of R5, R4, R5D and R5A generation\
* Instances that don't meet the regular expression `.*n.*|.*ad.*|.*b.*`, so effectively r5n, r5dn, r5ad and r5b.


{{% notice note %}}
You are encouraged to test what are the options that `ec2-instance-selector` provides and run a few commands with it to familiarize yourself with the tool.
For example, try running the same commands as you did before with the extra parameter **`--output table-wide`**.
Expand Down