forked from awslabs/ec2-spot-workshops
Merge pull request #6 from jagpk/EMR-AllocationStrategies
EMR Allocation Strategies with Instance Selector
Showing 17 changed files with 148 additions and 49 deletions.
28 changes: 28 additions & 0 deletions
content/running_spark_apps_with_emr_on_spot_instances/cloud9-awscli.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
--- | ||
title: "Update to the latest AWS CLI" | ||
chapter: false | ||
weight: 20 | ||
comment: default install now includes aws-cli/1.15.83 | ||
--- | ||
|
||
{{% notice tip %}} | ||
For this workshop, please ignore warnings about the version of pip being used. | ||
{{% /notice %}} | ||
|
||
1. Run the following command to view the current version of aws-cli: | ||
``` | ||
aws --version | ||
``` | ||
|
||
1. Update to the latest version: | ||
``` | ||
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" | ||
unzip awscliv2.zip | ||
sudo ./aws/install | ||
. ~/.bash_profile | ||
``` | ||
|
||
1. Confirm you have a newer version: | ||
``` | ||
aws --version | ||
``` |
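
The before-and-after comparison in the steps above can also be done in a script. The following is a minimal sketch, not part of the workshop: `aws_cli_major` is a hypothetical helper name, and it assumes the version string starts with `aws-cli/<major>.<minor>...`, as in the `aws-cli/1.15.83` string mentioned in the front matter.

```shell
# Hypothetical helper: print the major version from an `aws --version`
# string such as "aws-cli/1.15.83 ...".
aws_cli_major() {
  v="${1#aws-cli/}"         # drop the "aws-cli/" prefix
  printf '%s\n' "${v%%.*}"  # keep everything before the first dot
}

# Usage, once the AWS CLI is installed (2>&1 because some older
# versions print the version string to stderr):
#   [ "$(aws_cli_major "$(aws --version 2>&1)")" -ge 2 ] && echo "AWS CLI v2 or later"
```

If the usage line prints nothing after the update, re-run `. ~/.bash_profile` or open a new terminal so that the shell picks up the newly installed binary.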
35 changes: 35 additions & 0 deletions
content/running_spark_apps_with_emr_on_spot_instances/cloud9-workspace.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
--- | ||
title: "Create a Workspace" | ||
chapter: false | ||
weight: 15 | ||
--- | ||
|
||
{{% notice warning %}} | ||
If you are running the workshop on your own, build the Cloud9 workspace as an IAM user with Administrator privileges, not as the root account user. Before you continue, confirm that you are logged in as an IAM user. | ||
{{% /notice %}} | ||
|
||
{{% notice info %}} | ||
If you are at an AWS hosted event, follow the event instructions for which region to use when launching resources. | ||
{{% /notice %}} | ||
|
||
{{% notice tip %}} | ||
Ad blockers, JavaScript disablers, and tracking blockers should be disabled for | ||
the Cloud9 domain; otherwise, connecting to the workspace might be impacted. | ||
Cloud9 requires third-party cookies. You can whitelist the [specific domains](https://docs.aws.amazon.com/cloud9/latest/user-guide/troubleshooting.html#troubleshooting-env-loading). | ||
{{% /notice %}} | ||
|
||
### Launch Cloud9: | ||
|
||
- Go to [Cloud9 Console](https://console.aws.amazon.com/cloud9/home) | ||
- Select **Create environment** | ||
- Name it **emrworkshop**, and take all other defaults | ||
- When it comes up, customize the environment by closing the **welcome tab** | ||
and **lower work area**, and opening a new **terminal** tab in the main work area: | ||
![c9before](/images/running-emr-spark-apps-on-spot/c9before.png) | ||
|
||
- Your workspace should now look like this: | ||
![c9after](/images/running-emr-spark-apps-on-spot/c9after.png) | ||
|
||
- If you like this theme, you can choose it yourself by selecting **View / Themes / Solarized / Solarized Dark** | ||
in the Cloud9 workspace menu. |
19 changes: 19 additions & 0 deletions
content/running_spark_apps_with_emr_on_spot_instances/emr_allocation_strategies.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
--- | ||
title: "EMR Allocation Strategies" | ||
weight: 35 | ||
--- | ||
|
||
As an enhancement to the default EMR instance fleets cluster configuration, the allocation strategy feature is available in EMR version 5.12.1 and later. It optimizes the allocation of instance fleet capacity and lets you choose a target strategy for each cluster node. | ||
|
||
* On-Demand instances use a lowest-price strategy, which launches the lowest-priced instances first. | ||
* Spot instances use a capacity-optimized strategy, which launches Spot instances from Spot instance pools that have optimal capacity for the number of instances that are launching. | ||
|
||
{{% notice note %}} | ||
The allocation strategy option also lets you specify up to 15 EC2 instance types per task instance fleet when creating your cluster, as opposed to the maximum of 5 allowed by the default EMR cluster instance fleet configuration. | ||
{{% /notice %}} | ||
|
||
The capacity-optimized allocation strategy for Spot instances uses real-time capacity data to allocate instances from the Spot instance pools with the optimal capacity for the number of instances that are launching. It is appropriate for workloads that have a higher cost of interruption, such as long-running jobs and multi-tenant persistent clusters running Apache Spark, Apache Hive, and Presto. With this strategy you can specify up to 15 EC2 instance types on task instance fleets, where instance fleets previously allowed a maximum of five, to diversify your Spot requests and get steep discounts. You can diversify your Spot requests across these 15 pools within each Availability Zone and prioritize deploying into a deeper capacity pool to lower the chance of interruptions. With more instance type diversification, Amazon EMR has more capacity pools to allocate capacity from, and chooses the Spot Instances that are least likely to be interrupted. | ||
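
To make the two strategies concrete, here is a minimal sketch of a task instance fleet definition in the JSON shape accepted by `aws emr create-cluster --instance-fleets`. It is an illustration under stated assumptions, not the configuration used later in the workshop: the file name `task-fleet.json`, the fleet name, the target capacities, the timeout, and the five instance types with their weights are all hypothetical values.

```shell
# Write a hypothetical task fleet definition (all values are illustrative).
cat > task-fleet.json <<'EOF'
[
  {
    "Name": "TaskFleet",
    "InstanceFleetType": "TASK",
    "TargetOnDemandCapacity": 4,
    "TargetSpotCapacity": 28,
    "LaunchSpecifications": {
      "OnDemandSpecification": {
        "AllocationStrategy": "lowest-price"
      },
      "SpotSpecification": {
        "TimeoutDurationMinutes": 60,
        "TimeoutAction": "SWITCH_TO_ON_DEMAND",
        "AllocationStrategy": "capacity-optimized"
      }
    },
    "InstanceTypeConfigs": [
      { "InstanceType": "r5.xlarge",  "WeightedCapacity": 4 },
      { "InstanceType": "r5.2xlarge", "WeightedCapacity": 8 },
      { "InstanceType": "r4.xlarge",  "WeightedCapacity": 4 },
      { "InstanceType": "r4.2xlarge", "WeightedCapacity": 8 },
      { "InstanceType": "m5.2xlarge", "WeightedCapacity": 8 }
    ]
  }
]
EOF

# Sanity check: the text above allows at most 15 instance types per task fleet.
# grep -c counts matching lines, so this relies on one instance type per line.
count=$(grep -c '"InstanceType"' task-fleet.json)
if [ "$count" -le 15 ]; then
  echo "task fleet lists $count instance types (limit 15)"
else
  echo "too many instance types: $count" >&2
fi
```

A complete cluster also needs master and core fleets in the same JSON list before it can be passed as `--instance-fleets file://task-fleet.json`. Listing more instance types, up to the limit, gives the capacity-optimized strategy more Spot pools to choose from.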
|
||
{{% notice info %}} | ||
[Click here](https://aws.amazon.com/blogs/big-data/optimizing-amazon-emr-for-resilience-and-cost-with-capacity-optimized-spot-instances/) for an in-depth blog post about the capacity-optimized allocation strategy for Amazon EMR instance fleets. | ||
{{% /notice %}} |