Merge pull request #5091 from ministryofjustice/update-node-upgrades
fix: 🐛 links
jaskaransarkaria authored Dec 13, 2023
2 parents 11285d6 + a735a41 commit 63488b5
Showing 1 changed file with 9 additions and 9 deletions.
18 changes: 9 additions & 9 deletions runbooks/source/node-group-changes.html.md.erb
@@ -9,25 +9,25 @@ review_in: 6 months

## Why?

-You may need to make a change to an EKS [cluster node group] or [instance type config]. We can't just let terraform apply these changes because terraform doesn't gracefully rollout the old and new nodes. Terraform will bring down all of the old nodes immediately, which will cause outages to users.
+You may need to make a change to an EKS [cluster-node-group] or [instance-type-config]. We can't just let terraform apply these changes because terraform doesn't gracefully rollout the old and new nodes. Terraform will bring down all of the old nodes immediately, which will cause outages to users.

## How?

The method to avoid bringing down all the nodes at once is to follow these steps:

-1. add a new node group with your [updated changes]
+1. add a new node group with your [updated-changes]
1. lookup the old node group name (you can find this in the aws gui)
1. once merged in you can drain the old node group using the following command:
> cloud-platform pipeline cordon-and-drain --cluster-name <cluster_name> --node-group <old_node_group_name>
-1. raise a new [pr deleting] the old node group
+1. raise a new [pr-deleting] the old node group

notes:

- When making changes to the default node group in live, it's handy to pause the pipelines for each of our environments for the duration of the change.
-- the `cloud-platform pipeline` command [cordons and drains nodes] in a given node group waiting 2mins between each drained node. This command runs remotely in concourse.
+- the `cloud-platform pipeline` command [cordons-and-drains-nodes] in a given node group waiting 2mins between each drained node. This command runs remotely in concourse.

-[cluster node group]: https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/97768bfd8b4e25df6f415035acac60cf531d88c1/terraform/aws-accounts/cloud-platform-aws/vpc/eks/cluster.tf#L60
-[instance type config]: https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/97768bfd8b4e25df6f415035acac60cf531d88c1/terraform/aws-accounts/cloud-platform-aws/vpc/eks/cluster.tf#L43
-[pr deleting]: https://github.com/ministryofjustice/cloud-platform-infrastructure/pull/2663
-[updated changes]: https://github.com/ministryofjustice/cloud-platform-infrastructure/pull/2657
-[cordons and drains nodes]: https://github.com/ministryofjustice/cloud-platform-terraform-concourse/blob/main/pipelines/manager/main/cordon-and-drain-nodes.yaml
+[cluster-node-group]: https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/97768bfd8b4e25df6f415035acac60cf531d88c1/terraform/aws-accounts/cloud-platform-aws/vpc/eks/cluster.tf#L60
+[instance-type-config]: https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/97768bfd8b4e25df6f415035acac60cf531d88c1/terraform/aws-accounts/cloud-platform-aws/vpc/eks/cluster.tf#L43
+[pr-deleting]: https://github.com/ministryofjustice/cloud-platform-infrastructure/pull/2663
+[updated-changes]: https://github.com/ministryofjustice/cloud-platform-infrastructure/pull/2657
+[cordons-and-drains-nodes]: https://github.com/ministryofjustice/cloud-platform-terraform-concourse/blob/main/pipelines/manager/main/cordon-and-drain-nodes.yaml
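
Reviewer note: the drain step in the runbook above could be wrapped in a small dry-run helper, roughly as sketched below. This is only an illustration, not part of the commit. The function names `drain_command` and `remaining_nodes_command` and the example cluster and node group names are hypothetical. The `kubectl` check assumes the nodes belong to EKS managed node groups, which carry the `eks.amazonaws.com/nodegroup` label; if the cluster uses self-managed groups, that selector would not apply. The script only prints the commands so they can be reviewed before anyone runs them.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the commands for the drain step, does not execute them.
set -eu

# Hypothetical helper: builds the cordon-and-drain command quoted in the runbook.
drain_command() {
  cluster_name="$1"
  old_node_group="$2"
  printf 'cloud-platform pipeline cordon-and-drain --cluster-name %s --node-group %s\n' \
    "$cluster_name" "$old_node_group"
}

# Hypothetical helper: builds a kubectl command that lists nodes still in the
# old node group. Assumes EKS managed node groups (eks.amazonaws.com/nodegroup label).
remaining_nodes_command() {
  old_node_group="$1"
  printf 'kubectl get nodes -l eks.amazonaws.com/nodegroup=%s\n' "$old_node_group"
}

# Example values are placeholders; look up the real old node group name in the AWS console.
drain_command "example-cluster" "example-old-node-group"
remaining_nodes_command "example-old-node-group"
```

Once the second command returns no nodes for the old group, the PR deleting that node group should be safe to raise.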
