Commit
Update config files and fix errors found in testing new configs
Add --RESEnvironmentName to the installer

Ease initial integration with Research and Engineering Studio (RES).
Automatically add the correct submitter security groups and configure the /home directory.

Resolves #207

==============================

Update template config files

Added more comments to clarify that these are examples that should be copied and customized by users.
Added comments for typical configuration options.
Deleted obsolete configs that were from v1.

Resolves #203

==============================

Set default head node instance type based on architecture.

Resolves #206

==============================

Clean up ansible-lint errors and warnings.

Arm architecture cluster was failing because of an incorrect condition in the ansible playbook that is flagged by lint.

==============================

Use vdi controller instead of cluster manager for users and groups info

Cluster manager stopped being domain joined for some reason.

==============================

Paginate describe_instances when creating head node a record.

Otherwise, may not find the cluster head node instance.

==============================

Add default MungeKeySecret.

This should be the default or you can't access multiple clusters from the same server.

==============================

Increase timeout for ssm command that configures submitters

Need the time to compile slurm.

==============================

Force slurm to be rebuilt for submitters of all os distributions even if they match the os of the cluster.

Otherwise get errors because can't find PluginDir in the same location as when it was compiled.

==============================

Paginate describe_instances in UpdateHeadNode lambda

==============================

Add check for min memory of 4 GB for slurm controller
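The two pagination items above refer to the fact that a single `describe_instances` call returns only one page of results, so a head node that falls on a later page is missed. The following is a minimal sketch of that pattern, not the code from this commit. The helper names `iter_instances` and `find_head_node` are hypothetical, and the tag keys used to identify the head node are assumptions that should be adjusted to match your cluster.

```python
from typing import Any, Dict, Iterator, List, Optional


def iter_instances(ec2_client: Any, filters: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every instance matching the filters, following all result pages."""
    # get_paginator/paginate follow NextToken automatically, page by page.
    paginator = ec2_client.get_paginator("describe_instances")
    for page in paginator.paginate(Filters=filters):
        for reservation in page.get("Reservations", []):
            yield from reservation.get("Instances", [])


def find_head_node(ec2_client: Any, cluster_name: str) -> Optional[Dict[str, Any]]:
    """Return the first running head node instance for the cluster, or None."""
    # ASSUMPTION: these tag keys and values are illustrative; use whatever
    # tags actually identify the head node in your deployment.
    filters = [
        {"Name": "tag:parallelcluster:cluster-name", "Values": [cluster_name]},
        {"Name": "tag:parallelcluster:node-type", "Values": ["HeadNode"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
    return next(iter_instances(ec2_client, filters), None)
```

With boto3 installed and credentials configured, this could be called as `find_head_node(boto3.client("ec2"), "my-cluster")`, where the cluster name is a placeholder.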
Showing 52 changed files with 1,442 additions and 1,151 deletions.
@@ -0,0 +1,84 @@
---
#====================================================================
# Minimal cluster with all arm64 instance types
#
# NOTE: This is just an example.
# Please create your own revision controlled config file.
#
# No SlurmDbd in this configuration.
# Configure up to 5 each of all arm64 instance types.
#
# Defaults and valid configuration options are in source/config_schema.py.
# Command line values override values in the config file.
#====================================================================

StackName: slurm-all-arm-config

# @TODO: Add Region
# Region: {{Region}}

# @TODO: Add your SshKeyPair
# SshKeyPair: {{SshKeyPair}}

# @TODO: Update with your VPC
# VpcId: vpc-xxxxxxxxxxxxxxxxx

# @TODO: Update with your private subnet in your VPC
# SubnetId: subnet-xxxxxxxxxxxxxxxxx

# @TODO: Update with your SNS Topic. Make sure to subscribe your email address to the topic and confirm the subscription
# ErrorSnsTopicArn: arn:aws:sns:{{Region}}:{{AccountId}}:{{TopicName}}

# @TODO: Add your preferred timezone so times aren't in UTC
# TimeZone: America/Chicago # America/Los_Angeles or America/Denver or America/New_York

# @TODO: If using Research and Engineering Studio, update with environment name
# RESEnvironmentName: {{ResEnvironmentName}}

slurm:
  ParallelClusterConfig:
    Version: 3.8.0
    Architecture: arm64
    # @TODO: Update DatabaseStackName with stack name you deployed ParallelCluster database into. See: https://docs.aws.amazon.com/parallelcluster/latest/ug/tutorials_07_slurm-accounting-v3.html#slurm-accounting-db-stack-v3
    # Database:
    #   DatabaseStackName: {{DatabaseStackName}}

  MungeKeySecret: SlurmMungeKey

  SlurmCtl: {}

  InstanceConfig:
    UseSpot: true
    Include:
      InstanceFamilies: ['.*']
      InstanceTypes: []
    NodeCounts:
      # @TODO: Update the max number of each instance type to configure
      DefaultMaxCount: 5
      # @TODO: You can update the max instance count for each compute resource
      ComputeResourceCounts:
        od-1024gb-64-cores: # x2gd.16xlarge
          MaxCount: 1
        sp-1024gb-64-cores: # x2gd.16xlarge
          MaxCount: 2

  # @TODO: Configure storage mounts
  # storage:
  #   ExtraMounts:
  #     - dest: /home
  #       StorageType: Efs
  #       FileSystemId: 'fs-xxxxxxxxxxxxxxxxx'
  #       src: fs-xxxxxxxxxxxxxxxxx.efs.{{Region}}.amazonaws.com:/
  #       type: nfs4
  #       options: nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport
  #   ExtraMountSecurityGroups:
  #     nfs:
  #       DCV-Host: sg-xxxxxxxxxxxxxxxxx

# @TODO: Configure license counts
Licenses:
  vcs:
    Count: 10
    Server: synopsys_licenses
    Port: '24680'
    ServerType: flexlm
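
The `NodeCounts` section of the config above pairs a `DefaultMaxCount` with optional per-resource entries under `ComputeResourceCounts`. The sketch below shows one plausible reading of how those two settings combine. It assumes that a per-resource `MaxCount` takes precedence over the default, which the comments in the config suggest but this diff does not confirm. The helper `max_count_for` and the resource name `od-16gb-4-cores` are hypothetical and are not part of the repository.

```python
from typing import Any, Dict

# Values copied from the NodeCounts section of the example config.
node_counts: Dict[str, Any] = {
    "DefaultMaxCount": 5,
    "ComputeResourceCounts": {
        "od-1024gb-64-cores": {"MaxCount": 1},
        "sp-1024gb-64-cores": {"MaxCount": 2},
    },
}


def max_count_for(compute_resource: str, counts: Dict[str, Any]) -> int:
    """Return the max node count for a compute resource.

    ASSUMPTION: a per-resource MaxCount overrides DefaultMaxCount.
    """
    overrides = counts.get("ComputeResourceCounts", {})
    if compute_resource in overrides:
        return overrides[compute_resource]["MaxCount"]
    return counts["DefaultMaxCount"]


print(max_count_for("od-1024gb-64-cores", node_counts))  # per-resource override
print(max_count_for("od-16gb-4-cores", node_counts))     # falls back to the default
```

Under this reading, the two x2gd.16xlarge compute resources are capped at 1 and 2 nodes, while every other compute resource is capped at 5.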