[BUG] Deployment fails if we do not set the MungeKeySsmParameter #21

Closed
deeppat opened this issue May 12, 2022 · 2 comments · Fixed by #22
deeppat commented May 12, 2022

Describe the bug
If you deploy the EDA Slurm cluster using the default config file, in which MungeKeySsmParameter is commented out, CDK still tries to look up that parameter in your account, and the deployment fails when the parameter is not found.

For example, my config file has it commented out:

  #MungeKeySsmParameter: "/slurm/munge_key"
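
One plausible reading of this behavior, consistent with the maintainer's note later in this thread that the parameter name has a default, is that a missing key falls back to the default name rather than disabling the lookup. The sketch below only illustrates that fallback; `resolve_munge_key_parameter` is a hypothetical helper, not the project's code, and the config is modeled as a plain dict.

```python
# Hypothetical illustration: a commented-out key is simply absent from the
# parsed config, so a default name is used instead of skipping the lookup.
DEFAULT_MUNGE_KEY_PARAMETER = "/slurm/munge_key"  # default name seen in this issue


def resolve_munge_key_parameter(config: dict) -> str:
    """Return the configured SSM parameter name, or the default if unset."""
    return config.get("MungeKeySsmParameter") or DEFAULT_MUNGE_KEY_PARAMETER


# A config with the key commented out parses to a dict without that key.
commented_out_config = {}
explicit_config = {"MungeKeySsmParameter": "/my/cluster/munge_key"}
```

With the key absent, the resolved name is still `/slurm/munge_key`, which matches the parameter named in the error below.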

To Reproduce
Steps to reproduce the behavior:

  1. Clone the repo.
  2. Run ./install.sh --prompt --cdk-cmd create
  3. Bootstrap succeeds, but the Slurm deployment fails with the following error:
slurmminimal: creating CloudFormation changeset...

 ❌  slurmminimal failed: Error [ValidationError]: Unable to fetch parameters [/slurm/munge_key] from parameter store for this account.
    at Request.extractError (/home/ec2-user/.nvm/versions/node/v16.15.0/lib/node_modules/aws-sdk/lib/protocol/query.js:50:29)
    at Request.callListeners (/home/ec2-user/.nvm/versions/node/v16.15.0/lib/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
    at Request.emit (/home/ec2-user/.nvm/versions/node/v16.15.0/lib/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
    at Request.emit (/home/ec2-user/.nvm/versions/node/v16.15.0/lib/node_modules/aws-sdk/lib/request.js:686:14)
    at Request.transition (/home/ec2-user/.nvm/versions/node/v16.15.0/lib/node_modules/aws-sdk/lib/request.js:22:10)
    at AcceptorStateMachine.runTo (/home/ec2-user/.nvm/versions/node/v16.15.0/lib/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /home/ec2-user/.nvm/versions/node/v16.15.0/lib/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/home/ec2-user/.nvm/versions/node/v16.15.0/lib/node_modules/aws-sdk/lib/request.js:38:9)
    at Request.<anonymous> (/home/ec2-user/.nvm/versions/node/v16.15.0/lib/node_modules/aws-sdk/lib/request.js:688:12)
    at Request.callListeners (/home/ec2-user/.nvm/versions/node/v16.15.0/lib/node_modules/aws-sdk/lib/sequential_executor.js:116:18) {
  code: 'ValidationError',
  time: 2022-05-12T16:48:12.685Z,
  requestId: 'xxx',
  statusCode: 400,
  retryable: false,
  retryDelay: 542.6911270021684

Expected behavior
The Slurm app should deploy successfully even when MungeKeySsmParameter is not set in the config file.

Repository version:
14fe152

@deeppat deeppat added the bug Something isn't working label May 12, 2022
cartalla (Contributor) commented

Confirmed. The SSM parameter has a default name, so the parameter needs to be created and set if it doesn't already exist.
Otherwise, IAM permissions cannot be granted on the parameter so that the Slurm controller can read and write it.
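
As a rough sketch of the "create it if it doesn't already exist" step, the check could look like the following. This is not the code from the fix. `ensure_munge_key_parameter` is a hypothetical helper, and the key format (base64 of 1024 random bytes) is an assumption. The `ssm_client` argument is expected to behave like a boto3 SSM client, of which only `get_parameter`, `put_parameter`, and `exceptions.ParameterNotFound` are used.

```python
import base64
import secrets

DEFAULT_MUNGE_KEY_PARAMETER = "/slurm/munge_key"  # default name seen in this issue


def ensure_munge_key_parameter(ssm_client, name=DEFAULT_MUNGE_KEY_PARAMETER):
    """Create the SSM parameter if it is missing; return True if it was created."""
    try:
        ssm_client.get_parameter(Name=name)
        return False  # already exists, so leave the existing key untouched
    except ssm_client.exceptions.ParameterNotFound:
        pass
    # Assumption: a base64-encoded 1024-byte random key. The project's actual
    # key format and size may differ.
    key = base64.b64encode(secrets.token_bytes(1024)).decode("ascii")
    ssm_client.put_parameter(Name=name, Value=key, Type="SecureString")
    return True
```

With boto3 installed and credentials configured, passing `boto3.client("ssm")` as `ssm_client` would run this against the account before the CDK deployment. Taking the client as an argument keeps the helper testable without AWS access.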

cartalla added a commit that referenced this issue May 12, 2022
Required by slurm cluster instances to communicate with each other securely.

Resolves [bug #21](#21)
cartalla (Contributor) commented

I was able to deploy a cluster using the bug's branch:

21-bug-deployment-fails-if-we-do-not-set-the-mungekeyssmparameter

deeppat pushed a commit that referenced this issue May 13, 2022
Required by slurm cluster instances to communicate with each other securely.

Resolves [bug #21](#21)