diff --git a/content/running_spark_apps_with_emr_on_spot_instances/cleanup_ownaccount.md b/content/running_spark_apps_with_emr_on_spot_instances/cleanup_ownaccount.md
index fc2e5be5..cd9e56a1 100644
--- a/content/running_spark_apps_with_emr_on_spot_instances/cleanup_ownaccount.md
+++ b/content/running_spark_apps_with_emr_on_spot_instances/cleanup_ownaccount.md
@@ -6,8 +6,7 @@ hidden: true
 ---
 
 1. In the EMR Management Console, check that the cluster is in the **Terminated** state. If it isn't, then you can terminate it from the console.
-2. Delete the VPC you deployed via CloudFormation, by going to the CloudFormation service in the AWS Management Console, selecting the VPC stack (default name is Quick-Start-VPC) and click the Delete option. Make sure that the deletion has completed successfully (this should take around 1 minute), the status of the stack will be DELETE_COMPLETE (the stack will move to the Deleted list of stacks).
-3. Delete your S3 bucket from the AWS Management Console - choose the bucket from the list of buckets and hit the Delete button. This approach will also empty the bucket and delete all existing objects in the bucket.
-4. Delete the Athena table by going to the Athena service in the AWS Management Console, find the **emrworkshopresults** Athena table, click the three dots icon next to the table and select **Delete table**.
-
-
+2. Go to the [Cloud9 Dashboard](https://console.aws.amazon.com/cloud9/home) and delete your environment.
+3. Delete the VPC you deployed via CloudFormation by going to the CloudFormation service in the AWS Management Console, selecting the VPC stack (the default name is Quick-Start-VPC) and clicking the Delete option. Make sure that the deletion completes successfully (this should take around 1 minute); the status of the stack will be DELETE_COMPLETE and the stack will move to the Deleted list of stacks.
+4. Delete your S3 bucket from the AWS Management Console - choose the bucket from the list of buckets and hit the Delete button. This will also empty the bucket and delete all existing objects in it.
+5. Delete the Athena table by going to the Athena service in the AWS Management Console, finding the **emrworkshopresults** Athena table, clicking the three dots icon next to the table and selecting **Delete table**.
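+
+If you prefer to clean up from the terminal, a rough AWS CLI equivalent of the steps above looks like this (the cluster ID and bucket name are placeholders you need to replace; the stack and table names are the workshop defaults):
+```
+# Terminate the cluster if it is still running (replace the cluster ID)
+aws emr terminate-clusters --cluster-ids j-XXXXXXXXXXXXX
+# Drop the Athena table before deleting the bucket that holds its results
+aws athena start-query-execution --query-string "DROP TABLE emrworkshopresults" \
+    --query-execution-context Database=default \
+    --result-configuration OutputLocation=s3://<your-bucket-name>/athena-results/
+# Empty and delete the S3 bucket
+aws s3 rb s3://<your-bucket-name> --force
+# Delete the VPC stack and wait until its status is DELETE_COMPLETE
+aws cloudformation delete-stack --stack-name Quick-Start-VPC
+aws cloudformation wait stack-delete-complete --stack-name Quick-Start-VPC
+```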
\ No newline at end of file
diff --git a/content/running_spark_apps_with_emr_on_spot_instances/cloud9-awscli.md b/content/running_spark_apps_with_emr_on_spot_instances/cloud9-awscli.md
new file mode 100644
index 00000000..95d0eafb
--- /dev/null
+++ b/content/running_spark_apps_with_emr_on_spot_instances/cloud9-awscli.md
@@ -0,0 +1,28 @@
+---
+title: "Update to the latest AWS CLI"
+chapter: false
+weight: 20
+comment: default install now includes aws-cli/1.15.83
+---
+
+{{% notice tip %}}
+For this workshop, please ignore warnings about the version of pip being used.
+{{% /notice %}}
+
+1. Run the following command to view the current version of aws-cli:
+```
+aws --version
+```
+
+1. Update to the latest version:
+```
+curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
+unzip awscliv2.zip
+sudo ./aws/install
+. ~/.bash_profile
+```
+
+1. Confirm you have a newer version:
+```
+aws --version
+```
diff --git a/content/running_spark_apps_with_emr_on_spot_instances/cloud9-workspace.md b/content/running_spark_apps_with_emr_on_spot_instances/cloud9-workspace.md
new file mode 100644
index 00000000..6150fd29
--- /dev/null
+++ b/content/running_spark_apps_with_emr_on_spot_instances/cloud9-workspace.md
@@ -0,0 +1,35 @@
+---
+title: "Create a Workspace"
+chapter: false
+weight: 15
+---
+
+{{% notice warning %}}
+If you are running the workshop on your own, the Cloud9 workspace should be built by an IAM user with Administrator privileges, not the root
+account user. Please ensure you are logged in as an IAM user, not the root account user.
+{{% /notice %}}
+
+{{% notice info %}}
+If you are at an AWS hosted event, follow the instructions about which region should be used to launch resources.
+{{% /notice %}}
+
+{{% notice tip %}}
+Ad blockers, javascript disablers, and tracking blockers should be disabled for
+the Cloud9 domain, or connecting to the workspace might be impacted.
+Cloud9 requires third-party cookies. You can whitelist the [specific domains](https://docs.aws.amazon.com/cloud9/latest/user-guide/troubleshooting.html#troubleshooting-env-loading).
+{{% /notice %}}
+
+### Launch Cloud9:
+
+- Go to the [Cloud9 Console](https://console.aws.amazon.com/cloud9/home)
+- Select **Create environment**
+- Name it **emrworkshop**, and take all other defaults
+- When it comes up, customize the environment by closing the **welcome tab**
+and **lower work area**, and opening a new **terminal** tab in the main work area:
+![c9before](/images/running-emr-spark-apps-on-spot/c9before.png)
+
+- Your workspace should now look like this:
+![c9after](/images/running-emr-spark-apps-on-spot/c9after.png)
+
+- If you like this theme, you can choose it yourself by selecting **View / Themes / Solarized / Solarized Dark**
+in the Cloud9 workspace menu.
diff --git a/content/running_spark_apps_with_emr_on_spot_instances/emr_allocation_strategies.md b/content/running_spark_apps_with_emr_on_spot_instances/emr_allocation_strategies.md
new file mode 100644
index 00000000..183fd688
--- /dev/null
+++ b/content/running_spark_apps_with_emr_on_spot_instances/emr_allocation_strategies.md
@@ -0,0 +1,19 @@
+---
+title: "EMR Allocation Strategies"
+weight: 35
+---
+
+As an enhancement to the default EMR instance fleets cluster configuration, the allocation strategy feature is available in EMR version 5.12.1 and later. It optimizes the allocation of instance fleet capacity and lets you choose a target strategy for each cluster node type.
+
+* On-Demand instances use a lowest-price strategy, which launches the lowest-priced instances first.
+* Spot instances use a capacity-optimized strategy, which launches Spot instances from the Spot instance pools that have optimal capacity for the number of instances that are launching.
+
+{{% notice note %}}
+The allocation strategy option also lets you specify up to 15 EC2 instance types per task instance fleet when creating your cluster, as opposed to the maximum of 5 allowed by the default EMR cluster instance fleet configuration.
+{{% /notice %}}
+
+The capacity-optimized allocation strategy for Spot instances uses real-time capacity data to allocate instances from the Spot instance pools with the optimal capacity for the number of instances that are launching. This allocation strategy is appropriate for workloads that have a higher cost of interruption.
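+
+As a sketch of how this looks in practice, an `aws emr create-cluster` call can define a task instance fleet with the capacity-optimized strategy in its `--instance-fleets` JSON. The instance types, weights and capacities below are illustrative, not prescriptive:
+```
+[{
+  "Name": "TaskFleet",
+  "InstanceFleetType": "TASK",
+  "TargetSpotCapacity": 32,
+  "LaunchSpecifications": {
+    "SpotSpecification": {
+      "TimeoutDurationMinutes": 10,
+      "TimeoutAction": "SWITCH_TO_ON_DEMAND",
+      "AllocationStrategy": "capacity-optimized"
+    }
+  },
+  "InstanceTypeConfigs": [
+    {"InstanceType": "r5.xlarge", "WeightedCapacity": 4},
+    {"InstanceType": "r5.2xlarge", "WeightedCapacity": 8}
+  ]
+}]
+```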
 Examples include long-running jobs and multi-tenant persistent clusters running Apache Spark, Apache Hive, and Presto. This allocation strategy lets you specify up to 15 EC2 instance types on task instance fleets to diversify your Spot requests and get steep discounts. Previously, instance fleets allowed a maximum of five instance types. You can now diversify your Spot requests across these 15 pools within each Availability Zone and prioritize deploying into a deeper capacity pool to lower the chance of interruptions. With more instance type diversification, Amazon EMR has more capacity pools to allocate capacity from, and chooses the Spot Instances that are least likely to be interrupted.

Wait - the sentence above belongs to the paragraph started in the previous hunk; the reconstructed new-file body continues:
+
+{{% notice info %}}
+[Click here](https://aws.amazon.com/blogs/big-data/optimizing-amazon-emr-for-resilience-and-cost-with-capacity-optimized-spot-instances/) for an in-depth blog post about the capacity-optimized allocation strategy for Amazon EMR instance fleets.
+{{% /notice %}}
\ No newline at end of file
diff --git a/content/running_spark_apps_with_emr_on_spot_instances/examining_cluster.md b/content/running_spark_apps_with_emr_on_spot_instances/examining_cluster.md
index e0b1c541..7a6ea4eb 100644
--- a/content/running_spark_apps_with_emr_on_spot_instances/examining_cluster.md
+++ b/content/running_spark_apps_with_emr_on_spot_instances/examining_cluster.md
@@ -64,8 +64,8 @@ Some notable metrics:\
 When you are done examining the cluster and using the different UIs, terminate the EMR cluster from the EMR management console. This is not the end of the workshop though - we still have some interesting steps to go.
 
 #### Number of executors in the cluster
-With 40 Spot Units in the Task Instance Fleet, EMR launched either 10 * xlarge (running one executor) or 5 * 2xlarge instances (running 2 executors), so the Task Instance Fleet provides 10 executors / containers to the cluster.\
+With 32 Spot Units in the Task Instance Fleet, EMR launched either 8 * xlarge (running one executor), 4 * 2xlarge instances (running 2 executors) or 2 * 4xlarge instances (running 4 executors), so the Task Instance Fleet provides 8 executors / containers to the cluster.\
 The Core Instance Fleet launched one xlarge instance, able to run one executor.
-{{%expand "Question: Did you see more than 11 containers in CloudWatch Metrics and in YARN ResourceManager? if so, do you know why? Click to expand the answer" %}}
+{{%expand "Question: Did you see more than 9 containers in CloudWatch Metrics and in YARN ResourceManager? If so, do you know why? Click to expand the answer" %}}
 Your Spark application was configured to run in Cluster mode, meaning that the **Spark driver is running on the Core node**. Since it is counted as a container, this adds a container to our count, but it is not an executor.
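+
+As a rough sanity check, this is where the expected 9 containers come from (which instance mix EMR actually launches depends on available capacity):
+```
+# Task fleet: 32 Spot units, weight = vCPUs, one executor per 4 vCPUs
+#   8 x xlarge (weight 4), 4 x 2xlarge (weight 8) or 2 x 4xlarge (weight 16)
+#   -> 8 executors either way
+# Core fleet: 1 x xlarge -> 1 executor
+# Total: 9 executors; the Spark driver on the Core node adds a 10th container
+```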
 {{% /expand%}}
diff --git a/content/running_spark_apps_with_emr_on_spot_instances/fleet_config_options.md b/content/running_spark_apps_with_emr_on_spot_instances/fleet_config_options.md
index c758f7ee..d3d50f12 100644
--- a/content/running_spark_apps_with_emr_on_spot_instances/fleet_config_options.md
+++ b/content/running_spark_apps_with_emr_on_spot_instances/fleet_config_options.md
@@ -11,11 +11,11 @@ While our cluster is starting (7-8 minutes) and the step is running (4-10 minute
 Since Amazon EC2 Spot Instances [changed the pricing model and bidding is no longer required] (https://aws.amazon.com/blogs/compute/new-amazon-ec2-spot-pricing/), we have an optional "Max-price" field for our Spot requests, which would limit how much we're willing to pay for the instance. It is recommended to leave this value at 100% of the On-Demand price, in order to avoid limiting our instance diversification. We are going to pay the Spot market price regardless of the Maximum price that we can specify, and setting a higher max price does not increase the chance of getting Spot capacity nor does it decrease the chance of getting your Spot Instances interrupted when EC2 needs the capacity back. You can see the current Spot price in the AWS Management Console under EC2 -> Spot Requests -> **Pricing History**.
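+
+You can also check recent Spot prices from the terminal. A quick sketch with the AWS CLI (the instance type and the fields projected by the query are just examples):
+```
+aws ec2 describe-spot-price-history \
+    --instance-types r4.xlarge \
+    --product-descriptions "Linux/UNIX" \
+    --query "SpotPriceHistory[*].[AvailabilityZone, SpotPrice, Timestamp]" \
+    --output table
+```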
 
 #### Each instance counts as X units
-This configuration allows us to give each instance type in our diversified fleet a weight that will count towards our Total units. By default, this weight is configured as the number of YARN VCores that the instance type has by default (this would typically equate to the number of EC2 vCPUs) - this way it's easy to set the Total units to the number of vCPUs we want our cluster to run with, and EMR will select the best instances while taking into account the required number of instances to run. For example, if r4.xlarge is the instance type that EMR found to be the least likely to be interrupted and has the lowest price out of our selection, its weight is 4 and our total units (only Spot) is 40, then 10 * r4.xlarge instances will be launched by EMR in the fleet.
-If my Spark application is memory driven, I can set the total units to the total amount of memory I want my cluster to run with, and change the "Each instance counts as" field to the total memory of the instance, leaving aside some memory for the operating system and other processes. For example, for the r4.xlarge I can set its weight to 25. If I then set up the Total units to 500 then EMR will bring up 20 * r4.xlrage instances in the fleet. Since our executor size is 18 GB, one executor will run on this instance type.
+This configuration allows us to give each instance type in our diversified fleet a weight that will count towards our Total units. By default, this weight is configured as the number of YARN vCores of the instance type (which typically equates to the number of EC2 vCPUs) - this way it's easy to set the Total units to the number of vCPUs we want our cluster to run with, and EMR will select the best instances while taking into account the required number of instances to run. For example, if r4.xlarge is the instance type that EMR found to be the least likely to be interrupted, its weight is 4 and our total units (only Spot) is 32, then 8 * r4.xlarge instances will be launched by EMR in the fleet.
+If my Spark application is memory driven, I can set the total units to the total amount of memory I want my cluster to run with, and change the "Each instance counts as" field to the total memory of the instance, leaving aside some memory for the operating system and other processes. For example, for the r4.xlarge I can set its weight to 25. If I then set the Total units to 500, EMR will bring up 20 * r4.xlarge instances in the fleet. Since our executor size is 18 GB, one executor will run on this instance type.
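+
+A quick sanity check of both weighting schemes, using the r4.xlarge (4 vCPUs, ~30.5 GB RAM) figures from the examples above:
+```
+# vCPU-based weights: weight 4, Spot target capacity 32 units
+#   32 / 4 = 8 x r4.xlarge, each running one 4-vCPU executor
+# Memory-based weights: weight 25 (usable GB), target capacity 500 units
+#   500 / 25 = 20 x r4.xlarge, each running one 18 GB executor
+```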
 
 #### Defined duration
 This option will allow you run your EMR Instance Fleet on Spot Blocks, which are uninterrupted Spot Instances, available for 1-6 hours, at a lower discount compared to Spot Instances.
 
 #### Provisioning timeout
-You can determine that after a set amount of minutes, if EMR is unable to provision your selected Spot Instances due to lack of capacity, it will either start On-Demand instances instead, or terminate the cluster. This can be determined according to the business definition of the cluster or Spark application - if it is SLA bound and should complete even at On-Demand price, then the "Switch to On-Demand" option might be suitable. However, make sure you diversify the instance types in the fleet when looking to use Spot Instances, before you look into failing over to On-Demand. Also, try to select instance types with lower interruption rates according to the [Spot Instance Advisor] (https://aws.amazon.com/ec2/spot/instance-advisor/)
\ No newline at end of file
+You can specify that after a set number of minutes, if EMR is unable to provision your selected Spot Instances due to lack of capacity, it will either start On-Demand instances instead or terminate the cluster. The right choice depends on the business definition of the cluster or Spark application - if it is SLA-bound and should complete even at On-Demand price, then the "Switch to On-Demand" option might be suitable. However, make sure you diversify the instance types in the fleet when looking to use Spot Instances, before you look into failing over to On-Demand.
\ No newline at end of file
diff --git a/content/running_spark_apps_with_emr_on_spot_instances/launching_emr_cluster-2.md b/content/running_spark_apps_with_emr_on_spot_instances/launching_emr_cluster-2.md
index 8b50d105..7a42fbdb 100644
--- a/content/running_spark_apps_with_emr_on_spot_instances/launching_emr_cluster-2.md
+++ b/content/running_spark_apps_with_emr_on_spot_instances/launching_emr_cluster-2.md
@@ -15,23 +15,20 @@ The workshop focuses on running Spot Instances across all the cluster node types
 #### **Master node**:
 Unless your cluster is very short-lived and the runs are cost-driven, avoid running your Master node on a Spot Instance. We suggest this because a Spot interruption on the Master node terminates the entire cluster. \
 For the purpose of this workshop, we will run the Master node on a Spot Instance as we simulate a relatively short lived job running on a transient cluster. There will not be business impact if the job fails due to a Spot interruption and later re-started.\
-Click **Add / remove instance types to fleet** and select two relatively small and cheap instance types - i.e c4.large and m4.large and check Spot under target capacity. EMR will only provision one instance, but will select the best instance type for the Master node based on price and available capacity.
+Click **Add / remove instance types to fleet** and select two relatively cheap instance types - e.g. c5.xlarge and m5.xlarge - and check Spot under target capacity. EMR will only provision one instance, but will select the best instance type for the Master node from the Spot instance pools with the optimal capacity.
 
 ![FleetSelection1](/images/running-emr-spark-apps-on-spot/emrinstancefleets-master.png)
 
 #### **Core Instance Fleet**:
 Avoid using Spot Instances for Core nodes if your Spark applications use HDFS. That prevents a situation where Spot interruptions cause data loss for data that was written to the HDFS volumes on the instances. For short-lived applications on transient clusters, as is the case in this workshop, we are going to run our Core nodes on Spot Instances.\
 When using EMR Instance Fleets, one Core node is mandatory. Since we want to scale out and run our Spark application on our Task nodes, let's stick to the one mandatory Core node. We will specify **4 Spot units**, and select instance types that count as 4 units and will allow to run one executor.\
-Under the core node type, Click **Add / remove instance types to fleet** and select instance types that have 4 vCPUs and enough memory to run an executor (given the 18G executor size), for example:
+Under the core node type, click **Add / remove instance types to fleet** and select instance types that you noted before as suitable to run an executor (given the 18G executor size), for example:
 
 ![FleetSelection2](/images/running-emr-spark-apps-on-spot/emrinstancefleets-core1.png)
 
 #### **Task Instance Fleet**:
 Our task nodes will only run Spark executors and no HDFS DataNodes, so this is a great fit for scaling out and increasing the parallelization of our application's execution, to achieve faster execution times.
-Under the task node type, Click **Add / remove instance types to fleet** and select the 5 instance types you noted before as suitable for our executor size and that had suitable interruption rates in the Spot Instance Advisor.\
-Since our executor size is 4 vCPUs, and each instance counts as the number of its vCPUs towards the total units, let's specify **40 Spot units** in order to run 10 executors, and allow EMR to select the best instance type in the Task Instance Fleet to run the executors on. In this example, it will either start 10 * r4.xlarge / r5.xlarge / i3.xlarge **or** 5 * r5.2xlarge / r4.2xlarge in EMR Task Instance Fleet.
-{{% notice warning %}}
-If you are using your own AWS account (not an account that was created for you in an AWS event), Keep reading: if your account is new, or you've never launched Spot Instances in the account, your ability to launch Spot Instances could be limited. To overcome this, please make sure you launch no more than 3 instances in the Task Instance Fleet. You can do this, for example, by only providing instances that count as 8 units, and specify 24 for Spot units.\
-{{% /notice %}}
+Under the task node type, click **Add / remove instance types to fleet** and select up to 15 instance types that you noted before as suitable for our executor size.\
+Since our executor size is 4 vCPUs, and each instance counts as the number of its vCPUs towards the total units, let's specify **32 Spot units** in order to run 8 executors, and allow EMR to select the best instance type in the Task Instance Fleet to run the executors on.
 
 ![FleetSelection3](/images/running-emr-spark-apps-on-spot/emrinstancefleets-task2.png)
diff --git a/content/running_spark_apps_with_emr_on_spot_instances/right_sizing_executors.md b/content/running_spark_apps_with_emr_on_spot_instances/right_sizing_executors.md
index 95d6cd38..4eca770a 100644
--- a/content/running_spark_apps_with_emr_on_spot_instances/right_sizing_executors.md
+++ b/content/running_spark_apps_with_emr_on_spot_instances/right_sizing_executors.md
@@ -25,7 +25,7 @@ r5.xlarge: yarn.scheduler.maximum-allocation-mb 24576\
 
 2. With the Spark on YARN configuration option which was [introduced in EMR version 5.22] (https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-whatsnew-history.html#emr-5220-whatsnew): spark.yarn.executor.memoryOverheadFactor and defaults to 0.1875 (18.75% of the spark.yarn.executor.memoryOverhead setting )
 
-So we can conclude that if we decrease our executor size to ~18GB, we'll be able to use r4.xlarge and basically any of the R family instance types (or i3 that have the same vCPU:Mem ratio) as vCPU and Memory grows linearly within family sizes. If EMR will select an r4.2xlarge instance type from the list of supported instance types that we'll provide to EMR Instance Fleets, then it will run more than 1 executor on each instance, due to Spark dynamic allocation being enabled by default.
+So we can conclude that if we decrease our executor size to ~18GB, we'll be able to use r4.xlarge and basically any of the R family instance types, as vCPU and memory grow linearly within family sizes. If EMR selects an r4.2xlarge instance type from the list of supported instance types that we provide to EMR Instance Fleets, then it will run more than one executor on each instance, due to Spark dynamic allocation being enabled by default.
 
 ![tags](/images/running-emr-spark-apps-on-spot/sparkmemory.png)
 
diff --git a/content/running_spark_apps_with_emr_on_spot_instances/selecting_instance_types.md b/content/running_spark_apps_with_emr_on_spot_instances/selecting_instance_types.md
index 3d8497c0..9bf03ff8 100644
--- a/content/running_spark_apps_with_emr_on_spot_instances/selecting_instance_types.md
+++ b/content/running_spark_apps_with_emr_on_spot_instances/selecting_instance_types.md
@@ -6,37 +6,58 @@ weight: 50
 
 Let's use our newly acquired knowledge around Spark executor sizing in order to select the EC2 Instance Types that will be used in our EMR cluster.\
 EMR clusters run Master, Core and Task node types. [Click here] (https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-master-core-task-nodes.html) to read more about the different node types.
 
-We determined that in order to be flexible and allow running on multiple instance types, we will submit our Spark application with **"–executor-memory=18GB –executor-cores=4"**,
+We determined that in order to be flexible and allow running on multiple instance types across the R instance family, we will submit our Spark application with **"--executor-memory=18GB --executor-cores=4"**.
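+
+For reference, a sketch of what such a submission could look like - the application script and S3 paths here are placeholders, not the workshop's actual step definition:
+```
+spark-submit --deploy-mode cluster \
+    --executor-memory 18G --executor-cores 4 \
+    s3://<your-bucket>/scripts/your-spark-app.py \
+    s3://<your-bucket>/results/
+```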
+
+We will use **[amazon-ec2-instance-selector](https://github.com/aws/amazon-ec2-instance-selector)** to help us select the relevant instance
+types and families with a sufficient number of vCPUs and RAM.
+For example: we identified the R instance family as a good fit, since EMR can run executors that consume 4 vCPUs and 18 GB of RAM and still leave free RAM for the operating system and other processes. First, we can select different-sized instance types from the current generation, such as r5.xlarge, r5.2xlarge and r5.4xlarge. Next, we can select different-sized instance types from the previous generation, such as r4.xlarge, r4.2xlarge and r4.4xlarge. Last, we can select different-sized instances from the R family's local storage and processor variants, such as R5d instance types (local NVMe-based SSDs) and R5a instance types (powered by AMD processors).
+
+There are over 275 different instance types available on EC2, which can make the process of selecting appropriate instance types difficult. **[amazon-ec2-instance-selector](https://github.com/aws/amazon-ec2-instance-selector)** helps you select compatible instance types for your application to run on. The command line interface can be passed resource criteria like vCPUs, memory, network performance, and much more, and then returns the available, matching instance types.
+
+Let's first install **amazon-ec2-instance-selector** on the Cloud9 IDE:
+
+```
+curl -Lo ec2-instance-selector https://github.com/aws/amazon-ec2-instance-selector/releases/download/v1.3.0/ec2-instance-selector-`uname | tr '[:upper:]' '[:lower:]'`-amd64 && chmod +x ec2-instance-selector
+sudo mv ec2-instance-selector /usr/local/bin/
+ec2-instance-selector --version
+```
+
+Now that you have ec2-instance-selector installed, you can run
+`ec2-instance-selector --help` to understand how you could use it for selecting
+instances that match your workload requirements. For the purpose of this workshop,
+we first need to get a group of instances with between 4 and 16 vCPUs that belong to the R5, R4, R5d and R5a instance families.
+Run the following command to get the list of instance types.
+
+```bash
+ec2-instance-selector --vcpus-min 4 --vcpus-max 16 --allow-list '.*r5.*|.*r4.*|.*r5d.*|.*r5a.*' --deny-list '.*n.*|.*ad.*|.*b.*'
+```
+
+This should display a list like the one that follows (note that results might differ depending on the region). We will use these instance types as part of our EMR Core and Task Instance Fleets.
+
+```
+r4.2xlarge
+r4.4xlarge
+r4.xlarge
+r5.2xlarge
+r5.4xlarge
+r5.xlarge
+r5a.2xlarge
+r5a.4xlarge
+r5a.xlarge
+r5d.2xlarge
+r5d.4xlarge
+r5d.xlarge
+```
+
+Internally, ec2-instance-selector calls the [DescribeInstanceTypes](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeInstanceTypes.html) API for the specific region and filters
+the instances based on the criteria selected on the command line. In our case
+we filtered for instances that meet the following criteria:\
+ * Instances that have a minimum of 4 vCPUs and a maximum of 16 vCPUs\
+ * Instances of the R5, R4, R5d and R5a families\
+ * Instances that don't match the regular expression `.*n.*|.*ad.*|.*b.*`, which effectively excludes the r5n, r5dn, r5ad and r5b variants
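+
+If you want to cross-check the tool's output without installing it, a roughly equivalent raw call is the following sketch (it only filters on vCPUs; the query projects the name, vCPU and memory columns):
+```
+aws ec2 describe-instance-types \
+    --filters "Name=vcpu-info.default-vcpus,Values=4,8,16" \
+    --query "InstanceTypes[].[InstanceType, VCpuInfo.DefaultVCpus, MemoryInfo.SizeInMiB]" \
+    --output table
+```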
-
-We can use the [Spot Instance Advisor] (https://aws.amazon.com/ec2/spot/instance-advisor/) page to find the relevant instance types with sufficient number of vCPUs and RAM, and use this opportunity to also select instance types with low interruption rates. \
-For example: r5.2xlarge has 8 vCPUs and 64 GB of RAM, so EMR will automatically run 2 executors that will consume 36 GB of RAM and still leave free RAM for the operating system and other processes.\
-However, at the time of writing, when looking at the EU (Ireland) region in the Spot Instance advisor, the r5.2xlarge instance type is showing an interruption rate of >20%.\
-Instead, we'll focus on instance types with lower interruption rates and suitable vCPU/Memory ratio. As an example, at the time of writing, in the EU (Ireland) region, these could be: r4.xlarge, r4.2xlarge, i3.xlarge, i3.2xlarge, r5d.xlarge
-
-![Spot Instance Advisor](/images/running-emr-spark-apps-on-spot/spotinstanceadvisor1.png)
 {{% notice note %}}
-Spot Instance interruption rates are dynamic, the above just provides a real world example from a specific time and would probably be different when you are performing this workshop.
-{{% /notice %}}
-
-To keep our flexibility in place and be able to provide multiple instance types for our EMR cluster, we need to make sure that our executor size will be under the EMR YARN limitation that we saw in the previous step,
-
-**Your first task**: Find and take note of 5 instance types in the region where you have created your VPC to run your EMR cluster, which will allow running executors with at least 4 vCPUs and 30+ GB of RAM, and also have low Spot interruption rates (maximum 10-15%).
-
-{{%expand "Click here to see a hint for the task" %}}
-Instance types with sufficient Memory and vCPUs for our executor size, as well as suitable for our desired vCPU:Mem ratio, and are also under the default memory EMR limitations:\
-
-**Recommended for the workshop:**\
-- r4.xlarge and larger\
-- r5.xlarge and larger\
-- r5a.xlarge and larger\
-- r5d.xlarge and larger\
-- i3.xlarge and larger\
-
-**Previous generation instance types:**\
-- r3.xlarge and larger\
-- i2.xlarge and larger\
-you will notice that these instance types have double the vCores as they do vCPU, as reflected in the EMR instance selection window - this is an EMR optimization method. Feel free to use these as well, but note that the executor calculations that we're referring to in the workshop will differ. Also, these previous generation instance types will perform slower and the application will take longer to complete.\
-Also note that not all instance types exist in all regions.
-{{% /expand%}}
-
+You are encouraged to explore the options that `ec2-instance-selector` provides and run a few commands with it to familiarize yourself with the tool.
+For example, try running the same commands as you did before with the extra parameter **`--output table-wide`**.
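+For reference, that command with the wide table output looks like this:
+```
+ec2-instance-selector --vcpus-min 4 --vcpus-max 16 --allow-list '.*r5.*|.*r4.*|.*r5d.*|.*r5a.*' --deny-list '.*n.*|.*ad.*|.*b.*' --output table-wide
+```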
 {{% /notice %}}
\ No newline at end of file
diff --git a/static/images/running-emr-spark-apps-on-spot/c9after.png b/static/images/running-emr-spark-apps-on-spot/c9after.png
new file mode 100644
index 00000000..d1298ba6
Binary files /dev/null and b/static/images/running-emr-spark-apps-on-spot/c9after.png differ
diff --git a/static/images/running-emr-spark-apps-on-spot/c9before.png b/static/images/running-emr-spark-apps-on-spot/c9before.png
new file mode 100644
index 00000000..aea8466c
Binary files /dev/null and b/static/images/running-emr-spark-apps-on-spot/c9before.png differ
diff --git a/static/images/running-emr-spark-apps-on-spot/cliexport.png b/static/images/running-emr-spark-apps-on-spot/cliexport.png
index 13394af8..f0570fd5 100644
Binary files a/static/images/running-emr-spark-apps-on-spot/cliexport.png and b/static/images/running-emr-spark-apps-on-spot/cliexport.png differ
diff --git a/static/images/running-emr-spark-apps-on-spot/emrinstancefleets-master.png b/static/images/running-emr-spark-apps-on-spot/emrinstancefleets-master.png
index a74c2331..b76554ec 100644
Binary files a/static/images/running-emr-spark-apps-on-spot/emrinstancefleets-master.png and b/static/images/running-emr-spark-apps-on-spot/emrinstancefleets-master.png differ
diff --git a/static/images/running-emr-spark-apps-on-spot/emrinstancefleets-task2.png b/static/images/running-emr-spark-apps-on-spot/emrinstancefleets-task2.png
index eb7c99fd..645e6af3 100644
Binary files a/static/images/running-emr-spark-apps-on-spot/emrinstancefleets-task2.png and b/static/images/running-emr-spark-apps-on-spot/emrinstancefleets-task2.png differ
diff --git a/static/images/running-emr-spark-apps-on-spot/emrinstancefleetsnetwork.png b/static/images/running-emr-spark-apps-on-spot/emrinstancefleetsnetwork.png
index 307deb99..36df46f2 100644
Binary files a/static/images/running-emr-spark-apps-on-spot/emrinstancefleetsnetwork.png and b/static/images/running-emr-spark-apps-on-spot/emrinstancefleetsnetwork.png differ
diff --git a/static/images/running-emr-spark-apps-on-spot/savingssummary.png b/static/images/running-emr-spark-apps-on-spot/savingssummary.png
index a0253598..b4f98494 100644
Binary files a/static/images/running-emr-spark-apps-on-spot/savingssummary.png and b/static/images/running-emr-spark-apps-on-spot/savingssummary.png differ
diff --git a/static/images/running-emr-spark-apps-on-spot/spotinstanceadvisor1.png b/static/images/running-emr-spark-apps-on-spot/spotinstanceadvisor1.png
deleted file mode 100644
index 8dce7a91..00000000
Binary files a/static/images/running-emr-spark-apps-on-spot/spotinstanceadvisor1.png and /dev/null differ