diff --git a/runbooks/source/delete-cluster.html.md.erb b/runbooks/source/delete-cluster.html.md.erb
index a7dcb0b9..722cc609 100644
--- a/runbooks/source/delete-cluster.html.md.erb
+++ b/runbooks/source/delete-cluster.html.md.erb
@@ -1,7 +1,7 @@
 ---
 title: Delete a cluster
 weight: 55
-last_reviewed_on: 2023-11-20
+last_reviewed_on: 2024-01-16
 review_in: 6 months
 ---

@@ -13,8 +13,8 @@ In most cases, it is recommended to pass responsibility for deleting a test clus

 ## Delete the cluster with Concourse `delete-cluster` pipeline

-We have a [dedicated pipeline](https://concourse.cloud-platform.service.justice.gov.uk/teams/main/pipelines/delete-cluster) for deleting test clusters. You can configure and trigger this pipeline
-against your test cluster for removal by utilising the associated cloud-platform cli `pipeline delete-cluster` [command](https://github.com/ministryofjustice/cloud-platform-cli/blob/19d33d6618013f0f4047a545b5f0d184d3d2fdfb/pkg/commands/pipeline.go).
+We have a [dedicated pipeline](https://concourse.cloud-platform.service.justice.gov.uk/teams/main/pipelines/delete-cluster) for deleting test clusters.
+You can configure and trigger this pipeline against your test cluster for removal by utilising the associated cloud-platform cli `pipeline delete-cluster` [command](https://github.com/ministryofjustice/cloud-platform-cli/blob/19d33d6618013f0f4047a545b5f0d184d3d2fdfb/pkg/commands/pipeline.go).

 In order to use this command, ensure you have the following installed:

@@ -49,46 +49,65 @@ configuration updated
 started delete-cluster/delete #123
 ```

-## Delete the cluster locally, using the cli `cluster delete` command
+## Delete an EKS cluster manually
+
+Follow these steps, to delete the EKS cluster.

-To delete a cluster:
+First, set the kubectl context for the EKS cluster you are deleting. The easiest way to do this is with aws command:

 ```
-$ export AWS_PROFILE=moj-cp
+$ export KUBECONFIG=~/.kube/config
+$ export cluster=
+$ aws eks --region eu-west-2 update-kubeconfig --name ${cluster}
 ```

-Start from the root directory of a working copy of the [infrastructure repo].
-
-There is a [delete-cluster command] which will handle deleting your cluster.
-
-The command is entirely non-interactive, and will not prompt you to confirm anything. It just destroys things.
+You should see this output:

-### First, run `make tools-shell`
+```
+Added new context arn:aws:eks:eu-west-2:754256621582:cluster/ to .kube/config

-> The delete cluster command must *always* be run in a container. This ensures that the environment of the command is fully controlled, and you don't run into problems such as the kubernetes context being changed in another window, or extra environment variables causing unwanted effects.
+```

-Then invoke the command like this:
+Then, from the root of a checkout of the `cloud-platform-infrastructure` repository, run
+these commands to destroy all cluster components, and delete the terraform workspace:

 ```
-cloud-platform cluster delete --name --dry-run=false
+$ cd terraform/aws-accounts/cloud-platform-aws/vpc/eks/components
+$ terraform init
+$ terraform workspace select ${cluster}
+$ terraform destroy
 ```
-
-Run with `--dry-run=true` to do a dry run (if you don't pass a flag it will default to true), and see what commands would be executed.

-You can get more information using:
+> The destroy process often gets stuck on prometheus operator. If that happens, running this in a separate window usually works:
+> ```
+> kubectl -n monitoring delete job prometheus-operator-operator-cleanup
+> ```

 ```
-cloud-platform cluster delete --help
+$ terraform workspace select default
+$ terraform workspace delete ${cluster}
 ```

-If any steps fail:
+Change directories and perform the following to destroy the EKS cluster, and delete the terraform workspace.

-* Fix the underlying problem
-* Re-run the command
+```
+$ cd .. # working dir is now `eks`
+$ terraform init
+$ terraform workspace select ${cluster}
+$ terraform destroy
+$ terraform workspace select default
+$ terraform workspace delete ${cluster}
+```

-## Delete an EKS cluster manually
+Change directories and perform the following to destroy the cluster VPC, and delete the terraform workspace.

-The steps can be found here - [Delete an EKS Cluster]
+```
+$ cd .. # working dir is now `vpc`
+$ terraform init
+$ terraform workspace select ${cluster}
+$ terraform destroy
+$ terraform workspace select default
+$ terraform workspace delete ${cluster}
+```

 [infrastructure repo]: https://github.com/ministryofjustice/cloud-platform-infrastructure
 [delete-cluster command]: https://github.com/ministryofjustice/cloud-platform-cli/blob/19d33d6618013f0f4047a545b5f0d184d3d2fdfb/pkg/cluster/delete.go
diff --git a/runbooks/source/eks-cluster.html.md.erb b/runbooks/source/eks-cluster.html.md.erb
index e1f79a8a..11ded9e4 100644
--- a/runbooks/source/eks-cluster.html.md.erb
+++ b/runbooks/source/eks-cluster.html.md.erb
@@ -1,7 +1,7 @@
 ---
 title: EKS Cluster
 weight: 350
-last_reviewed_on: 2024-01-09
+last_reviewed_on: 2024-01-16
 review_in: 3 months
 ---

@@ -11,7 +11,7 @@ review_in: 3 months

 You can create a new EKS test cluster using the [cluster build pipeline].

-Alternatively, using the `create-cluster` script.
+Alternatively, if you want to create a cluster manually, follow the steps below.

 ## Pre-requisites

@@ -42,16 +42,10 @@ export AUTH0_CLIENT_ID=
 export AUTH0_CLIENT_SECRET=
 ```

-Execute the script inside the [cloud-platform-tool] container from the root of [cloud-platform-infrastructure] repo, run:
-
-```
-make tools-shell
-```
-
-This will launch the tool container, from there you can run the execute script by providing the desired name of your new cluster. e.g.:
+Execute the cloud-platform command to create a new cluster:

 ```bash
-./create-cluster.rb --name mogaal-eks
+cloud-platform cluster create --name
 ```

 Check the pre-requisites and environment variables section of this document before running this script.
@@ -60,12 +54,12 @@ NB: Your cluster name must be **no more than 12 characters**. Any longer, and so

 See our [cluster naming policy](https://github.com/ministryofjustice/cloud-platform/blob/main/architecture-decision-record/009-Naming-convention-for-clusters.md) for information on how to choose a suitable name for your cluster.

-By default, the script will create a `small` cluster. This means the master and worker EC2 instances will be less powerful machine types than in our production cluster.
+By default, the script will create a `small` cluster. This means the worker EC2 instances will be less powerful machine types than in our production cluster.

 You can see more options to use when creating the cluster by running:

 ```bash
-./create-cluster.rb --help
+cloud-platform cluster create --help
 ```

 The script takes around 30 minutes to execute. At the end, you should see output like this:
@@ -161,138 +155,26 @@ terraform workspace new
 terraform apply
 ```

-### 4. Delete the EKS cluster
-
-#### Delete the EKS cluster using the script
-
-There is a [destroy-cluster.rb] script which you can use to delete your cluster.
-
-Read the script before using it. Deleting a cluster is something you should be very cautious about, and ensure you know exactly what you're doing.
-
-The script is entirely non-interactive, and will not prompt you to confirm anything. It just destroys things.
-
-First, run `make tools-shell`
-
-> The delete cluster script must *always* be run in a container. This ensures that the environment of the script is fully controlled, and you don't run into problems such as the kubernetes context being changed in another window, or extra environment variables causing unwanted effects.
-
-Then invoke the script like this:
-
-```
-./destroy-cluster.rb --name [short cluster name] --yes
-```
-
-Run without `--yes` to do a dry run, and see what commands would be executed.
-
-You can get more information using:
-
-```
-./destroy-cluster.rb --help
-```
-
-If any steps fail:
-
-* Fix the underlying problem
-* Edit the script to comment out any sections of the `ClusterDeleter.run` function which you no longer need to run
-* Re-run the script
-
-#### Delete the cluster using concourse fly commands
-
-In case you prefer concourse pipeline to destroy the cluster, these are the steps to follow, to delete the cluster using "concourse fly commands"
-
-First, `cd`` to the working copy of the concourse [pipelines repo][pipelines repo]. Make below two changes to the [eks-create-test-destroy.yaml][create-test-destroy] file.
-
-In the eks-create-test-destroy pipeline definition, comment out the below line in destroy-cluster job.
-
- ```
- args:
- # export $(cat keyval/keyval.properties | grep CLUSTER_NAME )
- ```
-
-Commenting out this will not set the `CLUSTER_NAME` provided by the create-cluster-run-tests job.
-
-```
-./destroy-cluster.rb --name $CLUSTER_NAME --yes
-```
-
-Run the below commands updating the ``.
-
-The first fly command will apply the changes made for the [eks-create-test-destroy.yaml][create-test-destroy] file with the hardcoded `CLUSTER_NAME` in the destroy-cluster job
-
-The second command will trigger the destroy-cluster job for the CLUSTER_NAME updated in the destroy-cluster job.
-
-```
-fly -t manager sp -p create-test-destroy -c create-test-destroy.yaml
-fly -t manager trigger-job -j create-test-destroy/destroy-cluster
-```
-Note: After the destroy-cluster job completed sucessfully, run the [bootstrap pipleine][bootstrap pipleine] to discard the changes made to [eks-create-test-destroy.yaml][create-test-destroy] file.
-
-```
-fly -t manager trigger-job -j bootstrap/bootstrap-pipelines
-```
-
-#### Delete the EKS cluster manually
-
-Follow these steps, to delete the EKS cluster.
-
-First, set the kubectl context for the EKS cluster you are deleting. The easiest way to do this is with aws command:
-
-```
-$ export KUBECONFIG=~/.kube/config
-$ export cluster=
-$ aws eks --region eu-west-2 update-kubeconfig --name ${cluster}
-```
-
-You should see this output:
-
-```
-Added new context arn:aws:eks:eu-west-2:754256621582:cluster/ to .kube/config
+## Creating a live like test cluster

-```
+When testing cluster upgrades, it is useful to test the procedure which is as close to the live cluster as possible. The following steps will update an existing test cluster
+to the configuration similar to the live cluster.

-Then, from the root of a checkout of the `cloud-platform-infrastructure` repository, run
-these commands to destroy all cluster components, and delete the terraform workspace:
+**Pre-requisites:**

-```
-$ cd terraform/aws-accounts/cloud-platform-aws/vpc/eks/components
-$ terraform init
-$ terraform workspace select ${cluster}
-$ terraform destroy
-```
-> The destroy process often gets stuck on prometheus operator. If that happens, running this in a separate window usually works:
-> ```
-> kubectl -n monitoring delete job prometheus-operator-operator-cleanup
-> ```
+- a test cluster created using the [cluster build pipeline] or manually
+- The environment variables and pre-requisites as described [above](#pre-requisites)

-```
-$ terraform workspace select default
-$ terraform workspace delete ${cluster}
-```
+**Steps:**

-Change directories and perform the following to destroy the EKS cluster, and delete the terraform workspace.
-
-```
-$ cd .. # working dir is now `eks`
-$ terraform init
-$ terraform workspace select ${cluster}
-$ terraform destroy
-$ terraform workspace select default
-$ terraform workspace delete ${cluster}
-```
-
-Change directories and perform the following to destroy the cluster VPC, and delete the terraform workspace.
-
-```
-$ cd .. # working dir is now `vpc`
-$ terraform init
-$ terraform workspace select ${cluster}
-$ terraform destroy
-$ terraform workspace select default
-$ terraform workspace delete ${cluster}
-```
+- Update the node group desired count to same as live cluster (say 50) in the console. The terraform way of applying doesn't work for desired count
+- Set the node_groups_count to same as live cluster (say 64) and default_ng_min_count to 50 in [terraform/aws-accounts/cloud-platform-aws/vpc/eks/cluster.tf]
+- Apply the terraform code changes to the test cluster
+- cd to [terraform/aws-accounts/cloud-platform-aws/vpc/eks/components] and enable ecr-exporter, cloudwatch_exporter, velero, overprovisioner and other components that are installed specific to live cluster
+- Apply the terraform code changes to the test cluster
+- Update the starter pack count to 40 and apply the terraform code changes to the test cluster
+- Setup pingdom alerts for starter-pack helloworld app

-[create a cluster]: https://runbooks.cloud-platform.service.justice.gov.uk/eks-cluster.html#provisioning-eks-clusters
 [cluster build pipeline]: https://concourse.cloud-platform.service.justice.gov.uk/teams/main/pipelines/create-cluster
-[destroy-cluster.rb]: https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/main/destroy-cluster.rb
-[create-test-destroy]: https://github.com/ministryofjustice/cloud-platform-terraform-concourse/blob/main/pipelines/manager/main/eks-create-test-destroy.yaml
-[cloud-platform-tool]: https://github.com/ministryofjustice/cloud-platform-tools-image
-[cloud-platform-infrastructure]: https://github.com/ministryofjustice/cloud-platform-infrastructure
+[terraform/aws-accounts/cloud-platform-aws/vpc/eks/cluster.tf]: https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/cluster.tf
+[terraform/aws-accounts/cloud-platform-aws/vpc/eks/components]: https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/components