diff --git a/docs/config.md b/docs/config.md
index 268a2ff3..e526edc6 100644
--- a/docs/config.md
+++ b/docs/config.md
@@ -93,6 +93,7 @@ This project creates a ParallelCluster configuration file that is documented in
Imds:
Secured: bool
InstanceConfig:
+ UseOnDemand: bool
UseSpot: bool
Exclude:
InstanceFamilies:
@@ -614,9 +615,154 @@ List of Amazon Resource Names (ARNs) of IAM policies for Amazon EC2 that will be
### InstanceConfig
-Configure the instances used by the cluster.
+Configure the instance types used for the cluster's compute nodes.
-A partition will be created for each combination of Base OS, Architecture, and Spot.
+ParallelCluster is limited to a total of 50 compute resources, and this project puts
+only 1 instance type in each compute resource.
+This limits you to a total of 50 instance types per cluster.
+If you need more instance types than that, then you will need to create multiple clusters.
+If you configure both on-demand and spot instances, then the limit is effectively 25 instance types because 2 compute resources are created for each instance type.
+
+If you configure more than 50 instance types, then the installer will fail with an error.
+You will then need to modify your configuration to include fewer instance types or to
+exclude more of them.
+
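+For example, a hypothetical configuration that trims the list by excluding an entire
+family and the largest sizes of every family (the patterns are illustrative, not
+recommendations) might look like this:
+
+```
+slurm:
+  InstanceConfig:
+    Exclude:
+      InstanceFamilies:
+        - 'c7a'                   # exclude a whole family
+      InstanceTypes:
+        - '.*\.(24|32|48)xlarge'  # exclude the largest sizes of every family
+```
+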
+If no Include and Exclude parameters are specified, then default EDA instance types
+are configured.
+The defaults include the latest generation of the c, m, r, x, and u instance families.
+Older instance families and metal instance types are excluded.
+Specific instance types are also excluded to keep the total number of instance types under 50.
+If multiple instance types have the same amount of memory, then the instance types with the
+highest core counts are excluded, because EDA workloads are typically memory limited, not
+core limited.
+
+If any Include or Exclude parameters are specified, then minimal defaults are used for the
+parameters that aren't specified.
+By default, all instance families are included and no specific instance types are included.
+Regardless of the configuration, all instance types with less than 2 GiB of memory are excluded because they don't have enough memory for a Slurm compute node.
+
+If no includes or excludes are provided, the defaults are:
+
+```
+slurm:
+ InstanceConfig:
+ Exclude:
+ InstanceFamilies:
+ - 'a1' # Graviton 1
+ - 'c4' # Replaced by c5
+ - 'd2' # SSD optimized
+ - 'g3' # Replaced by g4
+ - 'g3s' # Replaced by g4
+ - 'h1' # SSD optimized
+ - 'i3' # SSD optimized
+ - 'i3en' # SSD optimized
+ - 'm4' # Replaced by m5
+ - 'p2' # Replaced by p3
+ - 'p3'
+ - 'p3dn'
+ - 'r4' # Replaced by r5
+ - 't2' # Replaced by t3
+ - 'x1'
+ - 'x1e'
+ InstanceTypes:
+ - '.*\.metal.*'
+ # Reduce the number of selected instance types to 25.
+ # Exclude larger core counts for each memory size
+ # 2 GB:
+ - 'c7a.medium'
+ - 'c7g.medium'
+ # 4 GB: m7a.medium, m7g.medium
+ - 'c7a.large'
+ - 'c7g.large'
+ # 8 GB: r7a.medium, r7g.medium
+ - 'm5zn.large'
+ - 'm7a.large'
+ - 'm7g.large'
+ - 'c7a.xlarge'
+ - 'c7g.xlarge'
+ # 16 GB: r7a.large, x2gd.medium, r7g.large
+ - 'r7iz.large'
+ - 'm5zn.xlarge'
+ - 'm7a.xlarge'
+ - 'm7g.xlarge'
+ - 'c7a.2xlarge'
+ - 'c7g.2xlarge'
+ # 32 GB: r7a.xlarge, x2gd.large, r7g.xlarge
+ - 'r7iz.xlarge'
+ - 'm5zn.2xlarge'
+ - 'm7a.2xlarge'
+ - 'm7g.2xlarge'
+ - 'c7a.4xlarge'
+ - 'c7g.4xlarge'
+ # 64 GB: r7a.2xlarge, x2gd.xlarge, r7g.2xlarge
+ - 'r7iz.2xlarge'
+ - 'm7a.4xlarge'
+ - 'm7g.4xlarge'
+ - 'c7a.8xlarge'
+ - 'c7g.8xlarge'
+ # 96 GB:
+ - 'm5zn.6xlarge'
+ - 'c7a.12xlarge'
+ - 'c7g.12xlarge'
+ # 128 GB: x2iedn.xlarge, r7iz.4xlarge, x2gd.2xlarge, r7g.4xlarge
+ - 'r7a.4xlarge'
+ - 'm7a.8xlarge'
+ - 'm7g.8xlarge'
+ - 'c7a.16xlarge'
+ - 'c7g.16xlarge'
+ # 192 GB: m5zn.12xlarge, m7a.12xlarge, m7g.12xlarge
+ - 'c7a.24xlarge'
+ # 256 GB: x2iedn.2xlarge, x2iezn.2xlarge, x2gd.4xlarge, r7g.8xlarge
+ - 'r7iz.8xlarge'
+ - 'r7a.8xlarge'
+ - 'm7a.16xlarge'
+ - 'm7g.16xlarge'
+ - 'c7a.32xlarge'
+ # 384 GB: r7iz.12xlarge, r7g.12xlarge
+ - 'r7a.12xlarge'
+ - 'm7a.24xlarge'
+ - 'c7a.48xlarge'
+ # 512 GB: x2iedn.4xlarge, x2iezn.4xlarge, x2gd.8xlarge, r7g.16xlarge
+ - 'r7iz.16xlarge'
+ - 'r7a.16xlarge'
+ - 'm7a.32xlarge'
+ # 768 GB: r7a.24xlarge, x2gd.12xlarge
+ - 'x2iezn.6xlarge'
+ - 'm7a.48xlarge'
+ # 1024 GB: x2iedn.8xlarge, x2iezn.8xlarge, x2gd.16xlarge
+ - 'r7iz.32xlarge'
+ - 'r7a.32xlarge'
+ # 1536 GB: x2iezn.12xlarge, x2idn.24xlarge
+ - 'r7a.48xlarge'
+ # 2048 GB: x2iedn.16xlarge
+ - 'x2idn.32xlarge'
+ # 3072 GB: x2iedn.24xlarge
+ # 4096 GB: x2iedn.32xlarge
+ Include:
+ InstanceFamilies:
+ - 'c7a' # AMD EPYC 9R14 Processor 3.7 GHz
+ - 'c7g' # AWS Graviton3 Processor 2.6 GHz
+ - 'm5zn' # Intel Xeon Platinum 8252 4.5 GHz
+ - 'm7a' # AMD EPYC 9R14 Processor 3.7 GHz
+ - 'm7g' # AWS Graviton3 Processor 2.6 GHz
+ - 'r7a' # AMD EPYC 9R14 Processor 3.7 GHz
+ - 'r7g' # AWS Graviton3 Processor 2.6 GHz
+ - 'r7iz' # Intel Xeon Scalable (Sapphire Rapids) 3.2 GHz
+ - 'x2gd' # AWS Graviton2 Processor 2.5 GHz 1TB
+ - 'x2idn' # Intel Xeon Scalable (Icelake) 3.5 GHz 2 TB
+ - 'x2iedn' # Intel Xeon Scalable (Icelake) 3.5 GHz 4 TB
+ - 'x2iezn' # Intel Xeon Platinum 8252 4.5 GHz 1.5 TB
+ - 'u.*'
+ InstanceTypes: []
+```
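+
+To override the defaults, specify your own Include and Exclude parameters.
+For example, a hypothetical Graviton-only configuration (the family names are
+illustrative) would be:
+
+```
+slurm:
+  InstanceConfig:
+    Include:
+      InstanceFamilies:
+        - 'c7g'
+        - 'm7g'
+        - 'r7g'
+```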
+
+#### UseOnDemand
+
+Configure on-demand instances. If true, an on-demand queue and compute resource are created for each configured instance type.
+
+type: bool
+
+default: True
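+
+For example, a hypothetical spot-only cluster would disable on-demand instances:
+
+```
+slurm:
+  InstanceConfig:
+    UseOnDemand: false
+    UseSpot: true
+```
+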
#### UseSpot
@@ -638,45 +784,13 @@ Instance families and types are regular expressions with implicit '^' and '$' at
Regular expressions with implicit '^' and '$' at the beginning and end.
-An empty list is the same as '.*'.
-
-Default:
-
-```
-default_excluded_instance_families = [
- 'a1', # Graviton 1
- 'c4', # Replaced by c5
- 'd2', # SSD optimized
- 'g3', # Replaced by g4
- 'g3s', # Replaced by g4
- 'h1', # SSD optimized
- 'i3', # SSD optimized
- 'i3en', # SSD optimized
- 'm4', # Replaced by m5
- 'p2', # Replaced by p3
- 'p3',
- 'p3dn',
- 'r4', # Replaced by r5
- 't2', # Replaced by t3
- 'x1',
- 'x1e',
-]
-```
+Default: []
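+
+Because of the implicit anchors, a pattern like 'c5' matches only the c5 family.
+A hypothetical pattern such as 'c5.*' also matches c5a, c5ad, c5d, and c5n:
+
+```
+slurm:
+  InstanceConfig:
+    Exclude:
+      InstanceFamilies:
+        - 'c5.*'   # excludes c5, c5a, c5ad, c5d, c5n, ...
+```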
##### Exclude InstanceTypes
Regular expressions with implicit '^' and '$' at the beginning and end.
-An empty list is the same as '.*'.
-
-Default:
-
-```
-default_excluded_instance_types = [
- '.+\.(micro|nano)', # Not enough memory
- '.*\.metal.*'
-]
-```
+Default: []
#### Include
@@ -698,70 +812,13 @@ If MaxSizeOnly is True then only the largest instance type in a family will be i
Regular expressions with implicit '^' and '$' at the beginning and end.
-An empty list is the same as '.*'.
-
-Default:
-
-```
-default_eda_instance_families = [
- 'c7a', # AMD EPYC 9R14 Processor 3.7 GHz
-
- 'c7g', # AWS Graviton3 Processor 2.6 GHz
- # 'c7gd', # AWS Graviton3 Processor 2.6 GHz
- # 'c7gn', # AWS Graviton3 Processor 2.6 GHz
-
- # 'c7i', # Intel Xeon Scalable (Sapphire Rapids) 3.2 GHz
-
- #'f1', # Intel Xeon E5-2686 v4 (Broadwell) 2.3 GHz
-
- 'm5zn', # Intel Xeon Platinum 8252 4.5 GHz
-
- 'm7a', # AMD EPYC 9R14 Processor 3.7 GHz
-
- # 'm7i', # Intel Xeon Scalable (Sapphire Rapids) 3.2 GHz
-
- 'm7g', # AWS Graviton3 Processor 2.6 GHz
- # 'm7gd', # AWS Graviton3 Processor 2.6 GHz
-
- 'r7a', # AMD EPYC 9R14 Processor 3.7 GHz
-
- 'r7g', # AWS Graviton3 Processor 2.6 GHz
- # 'r7gd', # AWS Graviton3 Processor 2.6 GHz
-
- # 'r7i', # Intel Xeon Scalable (Sapphire Rapids) 3.2 GHz
-
- 'r7iz', # Intel Xeon Scalable (Sapphire Rapids) 3.2 GHz
-
- 'x2gd', # AWS Graviton2 Processor 2.5 GHz 1TB
-
- 'x2idn', # Intel Xeon Scalable (Icelake) 3.5 GHz 2 TB
-
- 'x2iedn', # Intel Xeon Scalable (Icelake) 3.5 GHz 4 TB
-
- 'x2iezn', # Intel Xeon Platinum 8252 4.5 GHz 1.5 TB
-
- #'u-6tb1', # Intel Xeon Scalable (Skylake) 6 TB
- #'u-9tb1', # Intel Xeon Scalable (Skylake) 9 TB
- #'u-12tb1', # Intel Xeon Scalable (Skylake) 12 TB
-]
-```
+Default: []
##### Include InstanceTypes
Regular expressions with implicit '^' and '$' at the beginning and end.
-An empty list is the same as '.*'.
-
-Default:
-
-```
-default_eda_instance_types = [
- #'c5\.(l|x|2|4|9|18).*', # Intel Xeon Platinum 8124M 3.4 GHz
- #'c5\.(12|24).*', # Intel Xeon Platinum 8275L 3.6 GHz
- #'c5d\.(l|x|2|4|9|18).*', # Intel Xeon Platinum 8124M 3.4 GHz
- #'c5d\.(12|24).*', # Intel Xeon Platinum 8275L 3.6 GHz
-]
-```
+Default: []
#### NodeCounts
diff --git a/source/cdk/cdk_slurm_stack.py b/source/cdk/cdk_slurm_stack.py
index 08ca3dad..635b13e8 100644
--- a/source/cdk/cdk_slurm_stack.py
+++ b/source/cdk/cdk_slurm_stack.py
@@ -78,7 +78,7 @@
sys.path.append(f"{dirname(__file__)}/../resources/playbooks/roles/SlurmCtl/files/opt/slurm/cluster/bin")
from EC2InstanceTypeInfoPkg.EC2InstanceTypeInfo import EC2InstanceTypeInfo
-from SlurmPlugin import SlurmPlugin
+from SlurmPlugin import logger as SlurmPlugin_logger, SlurmPlugin
pp = PrettyPrinter()
@@ -460,6 +460,22 @@ def check_config(self):
logger.error(f"ParallelCluster requires VolumeId for {mount_dir} in slurm/storage/ExtraMounts")
config_errors += 1
+ # If no instance config has been set then choose EDA defaults
+ instance_config = self.config['slurm']['InstanceConfig']
+ if ('InstanceFamilies' not in instance_config['Include'] and 'InstanceTypes' not in instance_config['Include']
+         and 'InstanceFamilies' not in instance_config['Exclude'] and 'InstanceTypes' not in instance_config['Exclude']):
+     instance_config['Include']['InstanceFamilies'] = config_schema.default_included_eda_instance_families
+     instance_config['Include']['InstanceTypes'] = config_schema.default_included_eda_instance_types
+     instance_config['Exclude']['InstanceFamilies'] = config_schema.default_excluded_eda_instance_families
+     instance_config['Exclude']['InstanceTypes'] = config_schema.default_excluded_eda_instance_types
+ # Set non-EDA defaults for any parameters that are still unspecified
+ if 'InstanceFamilies' not in instance_config['Include']:
+     instance_config['Include']['InstanceFamilies'] = config_schema.default_included_instance_families
+ if 'InstanceTypes' not in instance_config['Include']:
+     instance_config['Include']['InstanceTypes'] = config_schema.default_included_instance_types
+ if 'InstanceFamilies' not in instance_config['Exclude']:
+     instance_config['Exclude']['InstanceFamilies'] = config_schema.default_excluded_instance_families
+ if 'InstanceTypes' not in instance_config['Exclude']:
+     instance_config['Exclude']['InstanceTypes'] = config_schema.default_excluded_instance_types
+
# Check to make sure controller instance type has at least 4 GB of memory.
slurmctl_instance_type = self.config['slurm']['SlurmCtl']['instance_type']
slurmctl_memory_in_gb = int(self.get_instance_type_info(slurmctl_instance_type)['MemoryInMiB'] / 1024)
@@ -733,7 +749,6 @@ def update_config_for_res(self):
else:
# RES takes the shared file system for /home as a parameter; it is not created by RES.
# parameter SharedHomeFileSystemId
- logger.setLevel(logging.DEBUG)
logger.debug(f"Searching for RES /home file system")
res_shared_storage_stack_name = res_stack_name
if res_shared_storage_stack_name not in stack_statuses:
@@ -1146,6 +1161,7 @@ def check_regions_config(self):
self.plugin = SlurmPlugin(slurm_config_file=None, region=self.cluster_region)
self.plugin.instance_type_and_family_info = self.eC2InstanceTypeInfo.instance_type_and_family_info
+ logger.debug(f"Getting instance types from config")
self.region_instance_types = self.plugin.get_instance_types_from_instance_config(self.config['slurm']['InstanceConfig'], [self.cluster_region], self.eC2InstanceTypeInfo)
self.instance_types = []
region_instance_types = self.region_instance_types[self.cluster_region]
@@ -1158,6 +1174,7 @@ def check_regions_config(self):
self.instance_types = sorted(self.instance_types)
# Filter the instance types by architecture due to PC limitation to 1 architecture
+ # Also require at least 2 GiB of memory.
cluster_architecture = self.config['slurm']['ParallelClusterConfig']['Architecture']
logger.info(f"ParallelCluster Architecture: {cluster_architecture}")
filtered_instance_types = []
@@ -1166,6 +1183,10 @@ def check_regions_config(self):
if instance_architecture != cluster_architecture:
logger.warning(f"Excluding {instance_type} because architecture ({instance_architecture}) != {cluster_architecture}")
continue
+ mem_gb = int(self.plugin.get_MemoryInMiB(self.cluster_region, instance_type) / 1024)
+ if mem_gb < 2:
+ logger.warning(f"Excluding {instance_type} because it has less than 2 GiB of memory.")
+ continue
filtered_instance_types.append(instance_type)
self.instance_types = filtered_instance_types
logger.info(f"ParallelCluster configured to use {len(self.instance_types)} instance types :\n{pp.pformat(self.instance_types)}")
@@ -2402,9 +2423,9 @@ def create_parallel_cluster_config(self):
# We are limited to MAX_NUMBER_OF_QUEUES queues and MAX_NUMBER_OF_COMPUTE_RESOURCES compute resources.
# First analyze the selected instance types to make sure that these limits aren't exceeded.
# The fundamental limit is the limit on the number of compute resources.
- # Each compute resource maps to a NodeName and I want instance type to be selected using a constraint.
+ # Each compute resource maps to a NodeName and I want the instance type to be selectable using a constraint.
# This means that each compute resource can only contain a single instance type.
- # This limits the number of instance type to MAX_NUMBER_OF_COMPUTE_RESOURCES or MAX_NUMBER_OF_COMPUTE_RESOURCES/2 if you configure spot instances.
+ # This limits the number of instance types to MAX_NUMBER_OF_COMPUTE_RESOURCES, or MAX_NUMBER_OF_COMPUTE_RESOURCES/2 if you configure both on-demand and spot instances.
#
# We could possibly support more instance types by putting instance types with the same number of cores and amount of memory into the same compute resource.
# The problem with doing this is that you can wind up with very different instance types in the same compute resource.
@@ -2415,14 +2436,17 @@ def create_parallel_cluster_config(self):
# If the user configures too many instance types, then flag an error and print out the configured instance
# types and suggest instance types to exclude.
- purchase_options = ['ONDEMAND']
+ purchase_options = []
+ if self.config['slurm']['InstanceConfig']['UseOnDemand']:
+ purchase_options.append('ONDEMAND')
if self.config['slurm']['InstanceConfig']['UseSpot']:
purchase_options.append('SPOT')
- MAX_NUMBER_OF_INSTANCE_TYPES = int(MAX_NUMBER_OF_COMPUTE_RESOURCES / 2)
- else:
- MAX_NUMBER_OF_INSTANCE_TYPES = MAX_NUMBER_OF_COMPUTE_RESOURCES
+ if not len(purchase_options):
+ logger.error(f"Must enable at least one of slurm/InstanceConfig/UseOnDemand or slurm/InstanceConfig/UseSpot.")
+ exit(1)
+ MAX_NUMBER_OF_INSTANCE_TYPES = int(MAX_NUMBER_OF_COMPUTE_RESOURCES / len(purchase_options))
- # Create list of instance types by number of cores and amount of memory
+ # Create list of instance types by number of cores and amount of memory
instance_types_by_core_memory = {}
# Create list of instance types by amount of memory and number of cores
instance_types_by_memory_core = {}
@@ -2468,34 +2492,46 @@ def create_parallel_cluster_config(self):
exit(1)
- nodesets = {}
+ # partition_nodesets is a dictionary indexed by partition name; each value is a list of nodesets.
+ partition_nodesets = {}
number_of_queues = 0
number_of_compute_resources = 0
- for purchase_option in purchase_options:
- nodesets[purchase_option] = []
# Create 1 queue and compute resource for each instance type and purchase option.
for purchase_option in purchase_options:
for instance_type in self.instance_types:
+ logger.debug(f"Creating queue for {purchase_option} {instance_type}")
efa_supported = self.plugin.get_EfaSupported(self.cluster_region, instance_type) and self.config['slurm']['ParallelClusterConfig']['EnableEfa']
+ mem_gb = int(self.plugin.get_MemoryInMiB(self.cluster_region, instance_type) / 1024)
if purchase_option == 'ONDEMAND':
queue_name_prefix = "od"
allocation_strategy = 'lowest-price'
price = self.plugin.instance_type_and_family_info[self.cluster_region]['instance_types'][instance_type]['pricing']['OnDemand']
+ purchase_option_partition = "on-demand"
else:
queue_name_prefix = "sp"
allocation_strategy = 'capacity-optimized'
- price = self.plugin.instance_type_and_family_info[self.cluster_region]['instance_types'][instance_type]['pricing']['spot']['max']
+ price = self.plugin.instance_type_and_family_info[self.cluster_region]['instance_types'][instance_type]['pricing']['spot'].get('max', None)
+ purchase_option_partition = "spot"
queue_name = f"{queue_name_prefix}-{instance_type}"
queue_name = queue_name.replace('.', '-')
queue_name = queue_name.replace('large', 'l')
queue_name = queue_name.replace('medium', 'm')
+ if not price:
+ logger.warning(f"Skipping {queue_name} because {instance_type} doesn't have spot pricing")
+ continue
logger.info(f"Configuring {queue_name} queue:")
if number_of_queues >= MAX_NUMBER_OF_QUEUES:
logger.error(f"Can't create {queue_name} queue because MAX_NUMBER_OF_QUEUES=={MAX_NUMBER_OF_QUEUES} and have {number_of_queues} queues.")
exit(1)
nodeset = f"{queue_name}_nodes"
- nodesets[purchase_option].append(nodeset)
+ if purchase_option_partition not in partition_nodesets:
+ partition_nodesets[purchase_option_partition] = []
+ partition_nodesets[purchase_option_partition].append(nodeset)
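+ # Also add the nodeset to a partition that groups all instance types with the same memory size for this purchase option (e.g. od-16-gb).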
+ mem_partition = f"{queue_name_prefix}-{mem_gb}-gb"
+ if mem_partition not in partition_nodesets:
+ partition_nodesets[mem_partition] = []
+ partition_nodesets[mem_partition].append(nodeset)
parallel_cluster_queue = self.create_queue_config(queue_name, allocation_strategy, purchase_option)
number_of_queues += 1
@@ -2534,6 +2570,7 @@ def create_parallel_cluster_config(self):
compute_resource['StaticNodePriority'] = int(price * 1000)
compute_resource['DynamicNodePriority'] = int(price * 10000)
parallel_cluster_queue['ComputeResources'].append(compute_resource)
+ number_of_compute_resources += 1
self.parallel_cluster_config['Scheduling']['SlurmQueues'].append(parallel_cluster_queue)
logger.info(f"Created {number_of_queues} queues with {number_of_compute_resources} compute resources")
@@ -2556,25 +2593,14 @@ def create_parallel_cluster_config(self):
self.parallel_cluster_config['Scheduling']['SlurmSettings']['CustomSlurmSettings'].append(slurm_settings_dict)
# Create custom partitions based on those created by ParallelCluster
- if 'ONDEMAND' in nodesets:
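+ # Create a partition for each purchase option (on-demand, spot) and for each purchase option and memory size group (e.g. od-16-gb).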
+ for partition in partition_nodesets:
self.parallel_cluster_config['Scheduling']['SlurmSettings']['CustomSlurmSettings'].extend(
[
{
- 'PartitionName': 'on-demand',
+ 'PartitionName': partition,
'Default': 'NO',
'PriorityTier': '1',
- 'Nodes': ','.join(nodesets['ONDEMAND']),
- }
- ]
- )
- if 'SPOT' in nodesets:
- self.parallel_cluster_config['Scheduling']['SlurmSettings']['CustomSlurmSettings'].extend(
- [
- {
- 'PartitionName': 'spot',
- 'Default': 'NO',
- 'PriorityTier': '10',
- 'Nodes': ','.join(nodesets['SPOT']),
+ 'Nodes': ','.join(partition_nodesets[partition]),
}
]
)
diff --git a/source/cdk/config_schema.py b/source/cdk/config_schema.py
index e8be3892..a170f9a0 100644
--- a/source/cdk/config_schema.py
+++ b/source/cdk/config_schema.py
@@ -359,7 +359,7 @@ def DEFAULT_OS(config):
]
# By default I've chosen to exclude *7i instance types because they have 50% of the cores of *7z instances with the same memory.
-default_eda_instance_families = [
+default_included_eda_instance_families = [
'c7a', # AMD EPYC 9R14 Processor 3.7 GHz
'c7g', # AWS Graviton3 Processor 2.6 GHz
@@ -396,61 +396,26 @@ def DEFAULT_OS(config):
'x2iezn', # Intel Xeon Platinum 8252 4.5 GHz 1.5 TB
- 'u',
+ 'u.*',
#'u-6tb1', # Intel Xeon Scalable (Skylake) 6 TB
#'u-9tb1', # Intel Xeon Scalable (Skylake) 9 TB
#'u-12tb1', # Intel Xeon Scalable (Skylake) 12 TB
]
-old_eda_instance_families = [
- 'c5', # Mixed depending on size
- 'c5a', # AMD EPYC 7R32 3.3 GHz
- 'c5ad', # AMD EPYC 7R32 3.3 GHz
- 'c6a',
- 'c6ad',
- 'c6i', # Intel Xeon 8375C (Ice Lake) 3.5 GHz
- 'c6id',
- 'c6g', # AWS Graviton2 Processor 2.5 GHz
- 'c6gd', # AWS Graviton2 Processor 2.5 GHz
- 'f1', # Intel Xeon E5-2686 v4 (Broadwell) 2.3 GHz
- 'm5', # Intel Xeon Platinum 8175 (Skylake) 3.1 GHz
- 'm5d', # Intel Xeon Platinum 8175 (Skylake) 3.1 GHz
- 'm5a', # AMD EPYC 7571 2.5 GHz
- 'm5ad', # AMD EPYC 7571 2.5 GHz
- 'm5zn', # Intel Xeon Platinum 8252 4.5 GHz
- 'm6a', # AMD EPYC 7R13 Processor 3.6 GHz
- 'm6ad',
- 'm6i', # Intel Xeon 8375C (Ice Lake) 3.5 GHz
- 'm6id',
- 'm6g', # AWS Graviton2 Processor 2.5 GHz
- 'm6gd', # AWS Graviton2 Processor 2.5 GHz
- 'r5', # Intel Xeon Platinum 8175 (Skylake) 3.1 GHz
- 'r5d', # Intel Xeon Platinum 8175 (Skylake) 3.1 GHz
- 'r5b', # Intel Xeon Platinum 8259 (Cascade Lake) 3.1 GHz
- 'r5a', # AMD EPYC 7571 2.5 GHz
- 'r5ad', # AMD EPYC 7571 2.5 GHz
- 'r6a',
- 'r6i', # Intel Xeon 8375C (Ice Lake) 3.5 GHz 1TB
- 'r6id',
- 'r6g', # AWS Graviton2 Processor 2.5 GHz
- 'r6gd', # AWS Graviton2 Processor 2.5 GHz
- 'x1', # High Frequency Intel Xeon E7-8880 v3 (Haswell) 2.3 GHz 2TB
- 'x1e', # High Frequency Intel Xeon E7-8880 v3 (Haswell) 2.3 GHz 4TB
- 'x2gd', # AWS Graviton2 Processor 2.5 GHz 1TB
- 'x2idn', # Intel Xeon Scalable (Icelake) 3.5 GHz 2 TB
- 'x2iedn', # Intel Xeon Scalable (Icelake) 3.5 GHz 4 TB
- 'x2iezn', # Intel Xeon Platinum 8252 4.5 GHz 1.5 TB
- 'z1d', # Intel Xeon Platinum 8151 4.0 GHz
-]
+default_included_instance_families = []
-default_eda_instance_types = [
+default_included_eda_instance_types = [
#'c5\.(l|x|2|4|9|18).*', # Intel Xeon Platinum 8124M 3.4 GHz
#'c5\.(12|24).*', # Intel Xeon Platinum 8275L 3.6 GHz
#'c5d\.(l|x|2|4|9|18).*', # Intel Xeon Platinum 8124M 3.4 GHz
#'c5d\.(12|24).*', # Intel Xeon Platinum 8275L 3.6 GHz
]
-default_excluded_instance_families = [
+default_included_instance_types = []
+
+default_excluded_instance_families = []
+
+default_excluded_eda_instance_families = [
'a1', # Graviton 1
'c4', # Replaced by c5
'd2', # SSD optimized
@@ -469,8 +434,9 @@ def DEFAULT_OS(config):
'x1e',
]
-default_excluded_instance_types = [
- '.+\.(micro|nano)', # Not enough memory
+default_excluded_instance_types = []
+
+default_excluded_eda_instance_types = [
'.*\.metal.*',
# Reduce the number of selected instance types to 25.
@@ -727,25 +693,27 @@ def get_config_schema(config):
# Configure the instances used by the cluster
# Partitions will be created for each purchase option and memory size
'InstanceConfig': {
+ # UseOnDemand:
+ # Configure on-demand instances
+ Optional('UseOnDemand', default=True): bool,
# UseSpot:
# Configure spot instances
Optional('UseSpot', default=True): bool,
# Include*/Exclude*:
# Instance families and types are regular expressions with implicit '^' and '$' at the beginning and end.
- # Exclude patterns are processed first and take precesdence over any includes.
- # An empty list is the same as '.*'.
- Optional('Exclude', default={'InstanceFamilies': default_excluded_instance_families, 'InstanceTypes': default_excluded_instance_types}): {
- Optional('InstanceFamilies', default=default_excluded_instance_families): [str],
- Optional('InstanceTypes', default=default_excluded_instance_types): [str]
+ # Exclude patterns are processed first and take precedence over any includes.
+ Optional('Exclude', default={}): {
+ Optional('InstanceFamilies'): [str],
+ Optional('InstanceTypes'): [str]
},
- Optional('Include', default={'MaxSizeOnly': False, 'InstanceFamilies': default_eda_instance_families, 'InstanceTypes': default_eda_instance_types}): {
+ Optional('Include', default={'MaxSizeOnly': False}): {
# MaxSizeOnly:
# If MaxSizeOnly is True then only the largest instance type in
# a family will be included unless specific instance types are included.
# Default: false
Optional('MaxSizeOnly', default=False): bool,
- Optional('InstanceFamilies', default=default_eda_instance_families): [str],
- Optional('InstanceTypes', default=default_eda_instance_types): [str]
+ Optional('InstanceFamilies'): [str],
+ Optional('InstanceTypes'): [str]
},
'NodeCounts': {
Optional('DefaultMinCount', default=0): And(int, lambda s: s >= 0),
diff --git a/source/requirements.txt b/source/requirements.txt
index 60b01311..6692894a 100644
--- a/source/requirements.txt
+++ b/source/requirements.txt
@@ -10,5 +10,5 @@ pytest
python-hostlist
pip
requests
-PyYAML>=5.4.1
+PyYAML>5.4.1
schema