Best way to remove asg without rebuilding all of them. #476

Closed
sethpollack opened this issue Aug 23, 2019 · 26 comments

@sethpollack
Contributor

When I remove group-01, it changes the indexes and forces a rebuild of group-02 and group-03. Is there a recommended way to avoid that?

  worker_groups = [
    {
      name                     = "group-01"
      instance_type            = "r5.2xlarge"
    },
    {
      name                     = "group-02"
      instance_type            = "m5.4xlarge"
    },
    {
      name                     = "group-03"
      instance_type            = "m5.8xlarge"
    }
  ]
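For context, a sketch of why the rebuild happens, assuming the module creates the groups with count over this list (the resource address matches the module.eks.aws_autoscaling_group.workers output shown later in this thread):

# With count = length(var.worker_groups), each group is addressed by position:
#   module.eks.aws_autoscaling_group.workers[0] -> "group-01"
#   module.eks.aws_autoscaling_group.workers[1] -> "group-02"
#   module.eks.aws_autoscaling_group.workers[2] -> "group-03"
# Removing "group-01" shifts the later entries down one index, so Terraform
# plans to replace workers[0] and workers[1] (now "group-02" and "group-03")
# and destroy workers[2], i.e. it rebuilds the two surviving groups.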
@barryib
Member

barryib commented Aug 23, 2019

I don't think you can do that at the moment. But since Terraform 0.12.6, we can use for_each in resources. With for_each the resource reference is no longer an index, which lets us remove a specific resource within a list or a map.
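For illustration, a minimal sketch of that for_each approach, assuming groups are keyed by name in a map; the worker_ami_id and subnets variables are hypothetical, not part of this module:

variable "worker_groups" {
  type = map(object({
    instance_type = string
  }))
}

variable "worker_ami_id" { type = string }       # hypothetical
variable "subnets" { type = list(string) }       # hypothetical

resource "aws_launch_configuration" "workers" {
  for_each = var.worker_groups

  name_prefix   = "${each.key}-"
  image_id      = var.worker_ami_id
  instance_type = each.value.instance_type
}

resource "aws_autoscaling_group" "workers" {
  # Each group is addressed as aws_autoscaling_group.workers["group-02"] etc.,
  # so removing the "group-01" key from the map destroys only that group.
  for_each = var.worker_groups

  name                 = each.key
  min_size             = 1
  max_size             = 3
  vpc_zone_identifier  = var.subnets
  launch_configuration = aws_launch_configuration.workers[each.key].name
}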

More info here:

@barryib
Member

barryib commented Aug 23, 2019

Maybe this will help you as a workaround for now: hashicorp/terraform#14275 (comment)

@sethpollack
Contributor Author

Cool, thanks! Are there any plans to move to for_each?

@aeugenio

aeugenio commented Sep 11, 2019

We have an EKS cluster with five worker groups (ASGs). I wanted to upgrade them from the 1.12.7 AMI to the 1.12.10 AMI. Based on https://docs.aws.amazon.com/eks/latest/userguide/migrate-stack.html, I tried these steps:

  1. helm delete --purge cluster-autoscaler
  2. In our eks/main.tf, which uses this module, duplicate the ASGs inside the worker_groups list and update the AMIs on the new worker groups (this means adding 5 new ASGs to the end of the list; see the sketch at the end of this comment).
  3. tf apply; confirm ten ASGs on 2 different launch configs.
  4. Run ./drain-asgs.sh, which is:
#!/bin/bash

set -e

# Drain every node still running the old kubelet version so its pods are rescheduled onto the new ASGs.
K8S_VERSION=1.12.7
nodes=$(kubectl get nodes -o jsonpath="{.items[?(@.status.nodeInfo.kubeletVersion==\"v$K8S_VERSION\")].metadata.name}")
for node in ${nodes[@]}
do
    echo "Draining $node"
    kubectl drain $node --ignore-daemonsets --delete-local-data
done
  5. Delete the original ASGs from the worker_groups list in eks/main.tf (this means the list is back to its original size of 5, all using the new 1.12.10 AMI).
  6. tf apply
    Plan: 5 to add, 5 to change, 20 to destroy.

Because the worker_groups logic is index-based, a bunch of chaos ensues and the new ASGs actually end up deleted.

It sounds like using for_each would let worker_groups management move away from index-based logic and would support removing groups from the list.
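To illustrate step 2, a hedged sketch of the duplicated worker_groups list, shortened to one group and using placeholder AMI IDs (the per-group ami_id key is assumed to work as in the launch-template example later in this thread):

worker_groups = [
  # The existing group stays in place so earlier indexes do not shift ...
  {
    name          = "group-01"
    instance_type = "r5.2xlarge"
    ami_id        = "ami-00000000000000000"   # placeholder: old 1.12.7 AMI
  },
  # ... and the replacement group is appended at the end of the list.
  {
    name          = "group-01-v2"
    instance_type = "r5.2xlarge"
    ami_id        = "ami-11111111111111111"   # placeholder: new 1.12.10 AMI
  },
]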

@barryib
Member

barryib commented Oct 9, 2019

I would like to work on this, but it sounds like it will force everyone to move their resources in the state.

I don't know how painful that would be for users.

Any thoughts? @max-rocket-internet @dpiddockcmp

@aeugenio

aeugenio commented Oct 9, 2019

I had the same concerns about existing Terraform state, but fortunately our team is in a spot where we can rebuild all of our environments. So I forked somewhere in between 6.0.1 and 6.0.2 and implemented worker_group_launch_templates as a map of maps instead of a list. Our team also had to split the EKS control plane from the worker groups, as flipping bits for the EKS API endpoint access vars was causing all the ASGs to roll (I believe due to naming and kubeconfig/auth-map.rendered dependencies).

Here's an example of how our eks-worker-groups module looks:

worker_groups_launch_template = {
   "${var.env}-${var.cluster_id}-infra-${data.aws_availability_zones.available.names[0]}-20191003" = {
     instance_type                 = "m5.large"
     asg_min_size                  = 1
     asg_max_size                  = 2
     key_name                      = var.ssh_key_name
     subnets                       = [ data.terraform_remote_state.vpc.outputs.private_subnets[0] ]
     kubelet_extra_args            = "--node-labels=k8s-env=${local.full-cluster-name},nodegroup=infra --register-with-taints=infra=true:NoSchedule"
     autoscaling_enabled           = true
     root_volume_size              = 1024
     termination_policies          = [ "OldestInstance" ]
     ami_id                        = "ami-05d586e6f773f6abf"
     tags = [{
       key = "worker-type"
       value = "infra"
       propagate_at_launch = true
     }]
   },
   # ... additional worker groups elided ...
}

I am planning on getting a public repo up that can at least provide a starting point if someone wants to see how I handled all the for_each's. We also made a couple of other adjustments that would make submitting a PR not really possible, but I really want to at least contribute the concept back.

@aeugenio

aeugenio commented Oct 9, 2019

Oh yeah, forgot to mention that upgrading the cluster via the steps I outlined above works really nicely.

@dpiddockcmp
Contributor

There's no way around it: moving to a map of maps and for_each will be painful for everyone upgrading.

I've done it to a few unrelated plans as part of our 0.12 upgrade efforts. You can't simply use terraform state mv resource[0] resource["key"] for the first item, as TF gets confused. You have to either drop and import, manually edit the state file, or let TF delete and recreate the resources. Not sure if TF > 0.12.7 has improved that; need to test what has changed in 0.12.10.

A lot of short-term pain for all, but it would make life much easier for people who change their worker_groups.

@max-rocket-internet
Contributor

You have to either drop and import, manually edit the state file or let TF delete and recreate the resources

Hmmm, sounds painful.

When I remove group-01 it changes the indexes and forces a rebuild of group-02 and group-03.

Is it not possible to use terraform state mv resource[1] resource[0] for this?

@galindro

galindro commented Jan 8, 2020

Unfortunately no, @max-rocket-internet. Take a look:

First, I tried to just move it:

$ terraform state mv module.eks.aws_autoscaling_group.workers[1] module.eks.aws_autoscaling_group.workers[0]
Acquiring state lock. This may take a few moments...

Error: Invalid target address

Cannot move to module.eks.aws_autoscaling_group.workers[0]:
there is already a resource instance at that address in the current state.

Then, I tried to remove the resource from index 0 and try to move again:

$ terraform state rm module.sc-eks.module.eks.aws_autoscaling_group.workers[0]
Removed module.sc-eks.module.eks.aws_autoscaling_group.workers[0]
Successfully removed 1 resource instance(s).

$ terraform state mv module.eks.aws_autoscaling_group.workers[1] module.eks.aws_autoscaling_group.workers[0]
Move "module.eks.aws_autoscaling_group.workers[1]" to "module.eks.aws_autoscaling_group.workers[0]"

Error: Invalid target address

Cannot move to module.eks.aws_autoscaling_group.workers[0]:
module.eks.aws_autoscaling_group.workers does not exist in the
current state.

@galindro

galindro commented Jan 9, 2020

I was able to work around it by moving index 0 to index 2 and then moving index 1 to 0. After that, I just ran the apply command again to destroy the old workers.
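For reference, a sketch of that shuffle using the addresses from the output above; any related per-group resources (launch configurations, for example) would likely need the same moves:

# Park the group being removed at an unused index, which frees index 0 ...
terraform state mv 'module.eks.aws_autoscaling_group.workers[0]' 'module.eks.aws_autoscaling_group.workers[2]'
# ... then move the surviving group into index 0.
terraform state mv 'module.eks.aws_autoscaling_group.workers[1]' 'module.eks.aws_autoscaling_group.workers[0]'
# The next apply then only destroys the parked workers[2] (the old group).
terraform apply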

@ArchiFleKs
Contributor

I wanted to try and work on this. I started with the worker_launch_template and will iterate from there.

If someone has the time, I'd like some feedback/help here.

I'm having issues with coalescelist and getting this error:

                                                                                                               
  on aws_auth.tf line 13, in data "template_file" "launch_template_worker_role_arns":
  13:         lookup(data.aws_iam_instance_profile.custom_worker_group_launch_template_iam_instance_profile[each.key], "role_name", ""
    |----------------
    | data.aws_iam_instance_profile.custom_worker_group_launch_template_iam_instance_profile is object with no attributes
    | each.key is "default-eu-west-3c"

The given key does not identify an element in this collection value.

Despite using a lookup here, it still tries to access the attribute on an empty map. I'm stuck here; I don't know how to get the same behavior as coalescelist but with two maps.

@ArchiFleKs
Contributor

Looking into what's been done for the node_groups, the best way would be to merge everything beforehand, from what I understand here: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/modules/node_groups/locals.tf
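For example, a hedged sketch of that merge-beforehand pattern; the variable and local names are illustrative, not the module's actual code:

variable "workers_group_defaults" {
  type    = any
  default = {}
}

variable "worker_groups_launch_template" {
  type    = any
  default = {}
}

locals {
  # merge() acts as the map analogue of coalescelist(): later arguments win,
  # so per-group settings override the module-wide defaults.
  worker_groups_launch_template = {
    for name, group in var.worker_groups_launch_template :
    name => merge(var.workers_group_defaults, group)
  }
}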

@jaimehrubiks
Contributor

What if we support both in parallel?
Add worker_groups_launch_template_map so that new users can start using the _map variable, while old users can remain on the old one and can try to transition in the future
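A rough sketch of how the two variables could coexist inside the module, purely illustrative (the variable names and normalization key are assumptions):

variable "worker_groups_launch_template" {
  type    = any
  default = []
}

variable "worker_groups_launch_template_map" {
  type    = any
  default = {}
}

locals {
  # Legacy list entries are keyed by their "name" attribute (index as a
  # fallback), then merged with the new map variable so both can coexist.
  worker_groups_launch_template = merge(
    { for i, g in var.worker_groups_launch_template : lookup(g, "name", tostring(i)) => g },
    var.worker_groups_launch_template_map
  )
}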

@stale

stale bot commented Jun 15, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jun 15, 2020
@stale

stale bot commented Jul 15, 2020

This issue has been automatically closed because it has not had recent activity since being marked as stale.

@lexsys27

lexsys27 commented Sep 3, 2020

Any updates to this issue?

I would like to have worker_groups as a map too. That way I can delete and add groups without rebuilding all the others whose indexes are higher. I could also address them by name in worker_asg_names rather than by index.

Several days ago I wanted to delete groups 0 and 1. After looking at the plan Terraform gave me, I just scaled them to 0 instead, so as not to rebuild all the other groups :)
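For reference, a sketch of that scale-to-zero workaround, which keeps the group's entry in the list so later indexes stay put; asg_desired_capacity is assumed to be supported alongside the asg_min_size/asg_max_size keys shown earlier in this thread:

worker_groups = [
  {
    name                 = "group-01"   # kept in the list to preserve the indexes of later groups ...
    instance_type        = "r5.2xlarge"
    asg_min_size         = 0            # ... but scaled down to zero instead of being removed
    asg_max_size         = 0
    asg_desired_capacity = 0
  },
  {
    name          = "group-02"
    instance_type = "m5.4xlarge"
  },
]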

@dpiddockcmp
Contributor

To treat the worker_groups as a map would need extensive changes to the module. It will also require every user of the module to recreate all worker groups when upgrading.

@stale

stale bot commented Dec 3, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Dec 3, 2020
@stale

stale bot commented Jan 3, 2021

This issue has been automatically closed because it has not had recent activity since being marked as stale.

@stale stale bot closed this as completed Jan 3, 2021
@varkey

varkey commented Jul 2, 2021

This is still a problem; can we please keep this issue open?

@sherifabdlnaby

Stale Bump :)

@bryantbiggs bryantbiggs reopened this Sep 7, 2021
@stale stale bot removed the stale label Sep 7, 2021
@daroga0002
Contributor

daroga0002 commented Sep 7, 2021

#858 or #1366 will solve this issue, but this is potentially a breaking change for existing users and will require moving Terraform state resources.

@stale

stale bot commented Oct 9, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Oct 9, 2021
@stale

stale bot commented Oct 16, 2021

This issue has been automatically closed because it has not had recent activity since being marked as stale.

@stale stale bot closed this as completed Oct 16, 2021
@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 17, 2022