Cleaning up a botched service instance

You might end up in a situation where the broker fails to clean up resources it has provisioned or bound. When that happens, follow this procedure:

Preparation

  1. Log into the AWS Console as a power user

    • For the data.gov deployment, use the SSBDev role.
  2. Take note of the cluster name and the domain name used for the instance you need to tear down. The VPC name may also be handy.

    • If you have credentials for the instance, check there.

    • You may see them listed in the Terraform output from your most recent `plan`, `apply`, or `destroy`

    • If you still have Terraform state handy, you can find them in the output of this command:

      terraform state show 'module.eks.aws_eks_cluster.this[0]' | grep 'domain\|arn\|vpc'
    • If you still don't have this information, look at the tags on the various clusters under Amazon Container Services > Amazon EKS > Clusters to figure it out.
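If you prefer the CLI, the same details can be recovered with a sketch like this (assuming AWS credentials for the appropriate role; the cluster name below is a hypothetical placeholder):

```shell
# List candidate clusters in the deployment region.
aws eks list-clusters --region us-west-2

# Pull the VPC ID, ARN, and tags for a candidate.
# "ssb-example" is hypothetical; substitute a name from the listing above.
aws eks describe-cluster --name ssb-example --region us-west-2 \
  --query '{vpc: cluster.resourcesVpcConfig.vpcId, arn: cluster.arn, tags: cluster.tags}'
```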

Region-specific steps

  1. Make sure you’re looking at the correct region in the console (and double-check it again whenever a step below sends you to a "global" service)

    • In the data.gov deployment, that's us-west-2.
  2. Amazon Container Services > Amazon EKS > Clusters

    • [cluster name] > Configuration tab > Compute
      • Delete the Fargate Profile(s), if any exist (takes a few minutes)
      • Delete the Managed Node Group(s), if any exist (takes a few minutes)
    • Delete the cluster (takes a few minutes but you can go do other things)
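Step 2 can also be scripted. A sketch (the cluster name is a hypothetical placeholder; deletions are asynchronous, so wait for the profiles and node groups to finish before deleting the cluster):

```shell
CLUSTER=ssb-example   # hypothetical; use the cluster name you noted
REGION=us-west-2

# Fargate profiles and managed node groups must go before the cluster itself.
for p in $(aws eks list-fargate-profiles --cluster-name "$CLUSTER" --region "$REGION" \
             --query 'fargateProfileNames[]' --output text); do
  aws eks delete-fargate-profile --cluster-name "$CLUSTER" \
    --fargate-profile-name "$p" --region "$REGION"
done

for n in $(aws eks list-nodegroups --cluster-name "$CLUSTER" --region "$REGION" \
             --query 'nodegroups[]' --output text); do
  aws eks delete-nodegroup --cluster-name "$CLUSTER" \
    --nodegroup-name "$n" --region "$REGION"
done

# Once the compute resources are gone:
aws eks delete-cluster --name "$CLUSTER" --region "$REGION"
```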
  3. EC2 > Load Balancers

    • Look for one tagged with the name of the k8s cluster and delete it if present
  4. EC2 > Target Groups

    • Look for one tagged with the name of the k8s cluster and delete it if present
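Steps 3 and 4 amount to a tag search over ELBv2 resources. A sketch, assuming the in-cluster controllers applied the conventional `kubernetes.io/cluster/<name>` tag (adjust the tag key if your deployment uses a different one):

```shell
CLUSTER=ssb-example   # hypothetical cluster name

# Delete load balancers tagged for the cluster.
for arn in $(aws elbv2 describe-load-balancers \
               --query 'LoadBalancers[].LoadBalancerArn' --output text); do
  if aws elbv2 describe-tags --resource-arns "$arn" \
       --query "TagDescriptions[].Tags[?Key=='kubernetes.io/cluster/$CLUSTER']" \
       --output text | grep -q .; then
    aws elbv2 delete-load-balancer --load-balancer-arn "$arn"
  fi
done

# Same search over target groups.
for tg in $(aws elbv2 describe-target-groups \
              --query 'TargetGroups[].TargetGroupArn' --output text); do
  if aws elbv2 describe-tags --resource-arns "$tg" \
       --query "TagDescriptions[].Tags[?Key=='kubernetes.io/cluster/$CLUSTER']" \
       --output text | grep -q .; then
    aws elbv2 delete-target-group --target-group-arn "$tg"
  fi
done
```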
  5. Certificate Manager > Certificates

    • Delete corresponding certificate (it should not be in use if you already deleted the Load Balancer)
  6. EFS > Filesystems

    • Delete corresponding EFS file system
  7. EBS > Volumes

    • Delete the volumes related to any k8s pods created on the corresponding k8s cluster
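Steps 5–7 sketched with the CLI (`$CERT_ARN` and `$FS_ID` are hypothetical placeholders; find them with `aws acm list-certificates` and `aws efs describe-file-systems`):

```shell
# ACM: delete the cluster's certificate (fails while it is still in use).
aws acm delete-certificate --certificate-arn "$CERT_ARN"

# EFS: mount targets must be removed before the file system itself.
for mt in $(aws efs describe-mount-targets --file-system-id "$FS_ID" \
              --query 'MountTargets[].MountTargetId' --output text); do
  aws efs delete-mount-target --mount-target-id "$mt"
done
aws efs delete-file-system --file-system-id "$FS_ID"

# EBS: volumes created for the cluster's PVCs normally carry a
# kubernetes.io/cluster/<name> tag (an assumption; verify against your tags).
for vol in $(aws ec2 describe-volumes \
               --filters "Name=tag-key,Values=kubernetes.io/cluster/$CLUSTER" \
               --query 'Volumes[].VolumeId' --output text); do
  aws ec2 delete-volume --volume-id "$vol"
done
```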
  8. VPC > NAT Gateways

    • Delete the one corresponding to your cluster
      • If you don't know which one it is, look for the one tagged with the k8s cluster name
  9. VPC > Your VPCs

    • Delete the one corresponding to your cluster
      • If you don't know which one it is, look for the one tagged with the k8s cluster name
      • If there’s anything red in the confirmation dialog, you missed something
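Steps 8 and 9 as a sketch (`$NAT_ID` and `$VPC_ID` are hypothetical placeholders taken from the describe output; the tag key is the same assumption as above):

```shell
# Find the NAT gateway tagged with the cluster name.
aws ec2 describe-nat-gateways \
  --filter "Name=tag-key,Values=kubernetes.io/cluster/$CLUSTER" \
  --query 'NatGateways[].NatGatewayId' --output text
aws ec2 delete-nat-gateway --nat-gateway-id "$NAT_ID"

# delete-vpc fails while dependencies remain, mirroring the red
# warnings in the console's confirmation dialog.
aws ec2 delete-vpc --vpc-id "$VPC_ID"
```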
  10. CloudWatch > Logs > Log Groups

    • Delete (up to) two matching log groups
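The EKS control plane logs to `/aws/eks/<cluster>/cluster`, and Fargate logging may add a second group, so a prefix search should catch both (the prefix is an assumption; adjust if your groups are named differently):

```shell
for lg in $(aws logs describe-log-groups --log-group-name-prefix "/aws/eks/$CLUSTER" \
              --query 'logGroups[].logGroupName' --output text); do
  aws logs delete-log-group --log-group-name "$lg"
done
```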
  11. Route 53 > Hosted zones

    • In the top-level zone (e.g. ssb-dev.data.gov)
      • Delete the NS record for the cluster domain
      • Delete the DS record for the cluster domain
    • In the zone for the cluster (look for whatever domain was set)
      • Disable DNSSEC Signing (use the "parent zone" option)
      • Disable the Key Signing Key (KSK) via Advanced view > Edit Key > Inactive
      • Delete the Key Signing Key (KSK) via Advanced view > Delete Key
      • Delete all records except for the NS and SOA records
      • Delete the zone
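The DNSSEC teardown in step 11 has CLI equivalents. A sketch for the cluster's own zone (`$ZONE_ID` and `$KSK_NAME` are hypothetical placeholders; find them with `aws route53 list-hosted-zones-by-name` and `aws route53 get-dnssec`):

```shell
# Disable signing, then deactivate and delete the KSK.
aws route53 disable-hosted-zone-dnssec --hosted-zone-id "$ZONE_ID"
aws route53 deactivate-key-signing-key --hosted-zone-id "$ZONE_ID" --name "$KSK_NAME"
aws route53 delete-key-signing-key --hosted-zone-id "$ZONE_ID" --name "$KSK_NAME"

# Non-NS/SOA records (and the NS/DS records in the parent zone) must be
# removed via change-resource-record-sets with DELETE actions before this:
aws route53 delete-hosted-zone --id "$ZONE_ID"
```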
  12. VPC > Elastic IPs

    • Look for one tagged with the name of the k8s cluster and release it if present
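A sketch for step 12, again assuming the cluster-name tag convention (adjust the filter to whatever tag your deployment actually applies to its Elastic IPs):

```shell
for alloc in $(aws ec2 describe-addresses \
                 --filters "Name=tag-key,Values=kubernetes.io/cluster/$CLUSTER" \
                 --query 'Addresses[].AllocationId' --output text); do
  aws ec2 release-address --allocation-id "$alloc"
done
```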

Out-of-region steps

  1. KMS > Customer Managed Keys
    • Note: you MUST look in the us-east-1 region for this step, even if everything else has been in a different region. (Following the link above should do this for you.)
    • Schedule the ECC_NIST_P256 key aliased "DNSSEC-[clusterdomain]" for deletion (waiting period 7 days)
    • Delete the alias from the key (to avoid collisions if someone provisions with the same INSTANCE_NAME)
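The KMS step sketched with the CLI (`$CLUSTER_DOMAIN` is a hypothetical placeholder for the cluster domain you noted; note the hard-coded `us-east-1`):

```shell
# Resolve the key behind the "DNSSEC-<clusterdomain>" alias.
KEY_ID=$(aws kms list-aliases --region us-east-1 \
           --query "Aliases[?AliasName=='alias/DNSSEC-$CLUSTER_DOMAIN'].TargetKeyId" \
           --output text)

# Schedule deletion with the 7-day waiting period, then free the alias
# so a future provision with the same INSTANCE_NAME won't collide.
aws kms schedule-key-deletion --key-id "$KEY_ID" \
  --pending-window-in-days 7 --region us-east-1
aws kms delete-alias --alias-name "alias/DNSSEC-$CLUSTER_DOMAIN" --region us-east-1
```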
  2. AWS Firewall Manager > AWS WAF > Web ACLs
    • This is a global service; select the appropriate region in the form on the page.
    • Delete the corresponding WAF rule
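If the web ACL is regional (an assumption; use `--scope CLOUDFRONT` and us-east-1 if it fronts CloudFront), the CLI equivalent looks like this. WAFv2 deletions require the current lock token, which `list-web-acls` returns alongside the name and ID:

```shell
aws wafv2 list-web-acls --scope REGIONAL --region us-west-2

# $ACL_NAME, $ACL_ID, and $LOCK_TOKEN are hypothetical placeholders
# copied from the listing above.
aws wafv2 delete-web-acl --name "$ACL_NAME" --id "$ACL_ID" \
  --lock-token "$LOCK_TOKEN" --scope REGIONAL --region us-west-2
```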
  3. IAM > Access Management > Roles
    • This is a global service
    • For the data.gov deployment, use the SSBAdmin role.
    • Search for the cluster name
    • Delete all the related roles
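A sketch of the IAM cleanup. Roles can't be deleted while managed policies are attached, so detach first (inline policies or instance profiles, if any, must also be removed; they are omitted here for brevity):

```shell
for role in $(aws iam list-roles \
                --query "Roles[?contains(RoleName, '$CLUSTER')].RoleName" --output text); do
  # Detach all managed policies from the role.
  for pol in $(aws iam list-attached-role-policies --role-name "$role" \
                 --query 'AttachedPolicies[].PolicyArn' --output text); do
    aws iam detach-role-policy --role-name "$role" --policy-arn "$pol"
  done
  aws iam delete-role --role-name "$role"
done
```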