azurerm_kubernetes_cluster: Adding node pools causes AKS cluster replacement #3971

Closed
lewalt000 opened this issue Jul 30, 2019 · 8 comments · Fixed by #4898

Comments

@lewalt000

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureRM Provider) Version

$ terraform -v
Terraform v0.12.5
+ provider.azurerm v1.32.0

Affected Resource(s)

azurerm_kubernetes_cluster

Terraform Configuration Files

resource "azurerm_kubernetes_cluster" "test" {
  name                = "test-k8s"
  resource_group_name = "test-rg"
  location            = "westus"
  dns_prefix          = "k8s"

  agent_pool_profile {
    name            = "poolone"
    count           = 1
    vm_size         = "Standard_B2ms"
    os_type         = "Linux"
    os_disk_size_gb = 30
  }

  /*
  agent_pool_profile {
    name            = "pooltwo"
    count           = 1
    vm_size         = "Standard_B2ms"
    os_type         = "Linux"
    os_disk_size_gb = 30
  }
  */

  network_profile {
    network_plugin     = "azure"
    network_policy     = "azure"
    service_cidr       = "172.0.0.0/24"
    dns_service_ip     = "172.0.0.10"
    docker_bridge_cidr = "172.17.0.1/16"
  }

  service_principal {
    client_id     = $MY_SP_ID
    client_secret = $MY_SP_SECRET
  }
}

Expected Behavior

A node pool should have been added to the existing AKS cluster without needing to destroy it first.

Actual Behavior

The entire AKS cluster is destroyed and recreated with the additional node pool

Steps to Reproduce

  1. Setup Terraform config for an AKS cluster with 1 node pool
  2. terraform apply
  3. Add an additional agent_pool_profile nested resource
  4. terraform apply
  5. Observe that the Terraform plan wants to destroy and re-create the entire AKS cluster:
  # azurerm_kubernetes_cluster.test must be replaced
-/+ resource "azurerm_kubernetes_cluster" "test" {
      - api_server_authorized_ip_ranges = [] -> null
        dns_prefix                      = "k8s"
ter apply)
      ~ kube_admin_config               = [] -> (known after apply)
      + kube_admin_config_raw           = (sensitive value)
      ~ kube_config                     = [
etc...
 ~ agent_pool_profile {
          - availability_zones  = [] -> null
            count               = 1
          + dns_prefix          = (known after apply)
          - enable_auto_scaling = false -> null
          ~ fqdn                = "k8s-1701e02c.hcp.westus.azmk8s.io" -> (known after apply)
          - max_count           = 0 -> null
          ~ max_pods            = 30 -> (known after apply)
          - min_count           = 0 -> null
            name                = "poolone"
          - node_taints         = [] -> null
            os_disk_size_gb     = 30
            os_type             = "Linux"
            type                = "AvailabilitySet"
            vm_size             = "Standard_B2ms"
        }
      + agent_pool_profile {
          + count           = 1
          + dns_prefix      = (known after apply)
          + fqdn            = (known after apply)
          + max_pods        = (known after apply)
          + name            = "pooltwo" # forces replacement
          + os_disk_size_gb = 30 # forces replacement
          + os_type         = "Linux" # forces replacement
          + type            = "AvailabilitySet" # forces replacement
          + vm_size         = "Standard_B2ms" # forces replacement
        }

Important Factoids

N/A

References

This issue looks related (Terraform replacing AKS nodepool cluster when changing VM count)
#3835

@lewalt000 lewalt000 changed the title from "azurerm_kubernetes_cluster: Modifying node pools causes AKS cluster replacement" to "azurerm_kubernetes_cluster: Adding node pools causes AKS cluster replacement" Jul 30, 2019

@djsly
Contributor

djsly commented Aug 1, 2019

@titilambert can you add this to your list ?

@sagivle

sagivle commented Aug 1, 2019

when can we expect this to be fixed?

@Ant59

Ant59 commented Aug 9, 2019

When is this likely to have a fix? It's the only thing in the way of Terraforming hybrid Linux/Windows clusters at the moment, because creating the two node pools during cluster creation fails. I can work around it by creating the ARM template for the Windows node pool directly, but then Terraform wants to re-create the cluster every time because it doesn't know about that node pool.
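
One possible stopgap for the out-of-band node pool situation described above is Terraform's `lifecycle` `ignore_changes` meta-argument, which tells Terraform to disregard differences in the listed attributes when planning. Nobody in this thread confirms that it works for this case, so treat the following as a sketch under that assumption. The variable names `sp_client_id` and `sp_client_secret` are hypothetical stand-ins for the service principal placeholders in the original configuration.

```hcl
# Hypothetical input variables; supply real values via tfvars or the environment.
variable "sp_client_id" {}
variable "sp_client_secret" {}

resource "azurerm_kubernetes_cluster" "test" {
  name                = "test-k8s"
  resource_group_name = "test-rg"
  location            = "westus"
  dns_prefix          = "k8s"

  # The pool Terraform created and knows about.
  agent_pool_profile {
    name            = "poolone"
    count           = 1
    vm_size         = "Standard_B2ms"
    os_type         = "Linux"
    os_disk_size_gb = 30
  }

  service_principal {
    client_id     = var.sp_client_id
    client_secret = var.sp_client_secret
  }

  lifecycle {
    # Assumption: ignoring the whole agent_pool_profile list keeps a pool that
    # was added outside Terraform from showing up as a replacement-forcing diff.
    ignore_changes = [agent_pool_profile]
  }
}
```

The trade-off is that Terraform then ignores every change to `agent_pool_profile`, including intentional ones such as a new `count`, until the `lifecycle` block is removed.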

@jluk

jluk commented Aug 9, 2019

I think these problems are all interwoven with issue #4001 as well. There are multiple issues in Terraform at the moment; it would help if we could consolidate questions there.

The AKS team is aware of these issues and while we work through the main feature we will try to provide proper guidance for TF as well across all of these.

@pruneau628

pruneau628 commented Aug 16, 2019

In fact, if you use node pools, even retrying an apply with the very same Terraform configuration might end up triggering a re-creation of the AKS cluster.

We can reproduce this very easily by creating an AKS cluster with 3 agent_pool_profile blocks, letting the creation succeed, and immediately re-running terraform apply.

If you are unlucky and the order in which azurerm retrieves the agent_pool_profile objects does not match your Terraform source code, it triggers a re-creation, because all objects are seen as "modified".

This happens because the agent_pool_profile order in the Terraform state is always sorted alphabetically by name.

You can avoid this bug by changing the order in your terraform source code, but it's not intuitive at all, and is still a bug from our point of view.
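
A minimal sketch of the reordering workaround described above, assuming the state really is sorted alphabetically by pool name as stated in this comment. The pool name `poolthree`, the `type` setting, and the `sp_client_id` / `sp_client_secret` variables are illustrative assumptions and not taken from the thread.

```hcl
# Hypothetical input variables for the service principal credentials.
variable "sp_client_id" {}
variable "sp_client_secret" {}

resource "azurerm_kubernetes_cluster" "test" {
  name                = "test-k8s"
  resource_group_name = "test-rg"
  location            = "westus"
  dns_prefix          = "k8s"

  # Blocks are declared in alphabetical order of `name`, so the source order
  # matches the order described for the state:
  # "poolone" < "poolthree" < "pooltwo".
  agent_pool_profile {
    name            = "poolone"
    count           = 1
    vm_size         = "Standard_B2ms"
    os_type         = "Linux"
    os_disk_size_gb = 30
    # Assumption: AKS needs scale-set-backed pools when a cluster has several.
    type            = "VirtualMachineScaleSets"
  }

  agent_pool_profile {
    name            = "poolthree"
    count           = 1
    vm_size         = "Standard_B2ms"
    os_type         = "Linux"
    os_disk_size_gb = 30
    type            = "VirtualMachineScaleSets"
  }

  agent_pool_profile {
    name            = "pooltwo"
    count           = 1
    vm_size         = "Standard_B2ms"
    os_type         = "Linux"
    os_disk_size_gb = 30
    type            = "VirtualMachineScaleSets"
  }

  service_principal {
    client_id     = var.sp_client_id
    client_secret = var.sp_client_secret
  }
}
```

Note that the alphabetical order is not the "natural" one here: `poolthree` sorts before `pooltwo`, which is the kind of mismatch that makes this workaround easy to get wrong.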

Versions used:

terraform -v
Terraform v0.12.6
+ provider.azurerm v1.32.1

@admtomgla

May I humbly ask for a status update or a roadmap for when we can expect this feature to work correctly? Thank you in advance.

@ghost

ghost commented Nov 26, 2019

This has been released in version 1.37.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:

provider "azurerm" {
    version = "~> 1.37.0"
}
# ... other configuration ...
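
For context, here is a hedged sketch of what adding a second pool might look like after upgrading. It assumes that the 1.37.0 release provides a `default_node_pool` block on `azurerm_kubernetes_cluster` and a separate `azurerm_kubernetes_cluster_node_pool` resource. The comment above does not spell out the shape of the fix, so check the provider changelog and documentation before relying on this. The `sp_client_id` and `sp_client_secret` variables are hypothetical.

```hcl
provider "azurerm" {
  version = "~> 1.37.0"
}

# Hypothetical input variables for the service principal credentials.
variable "sp_client_id" {}
variable "sp_client_secret" {}

resource "azurerm_kubernetes_cluster" "test" {
  name                = "test-k8s"
  resource_group_name = "test-rg"
  location            = "westus"
  dns_prefix          = "k8s"

  # Assumed replacement for the first agent_pool_profile block.
  default_node_pool {
    name            = "poolone"
    node_count      = 1
    vm_size         = "Standard_B2ms"
    os_disk_size_gb = 30
    # Assumption: additional pools require a scale-set-backed default pool.
    type            = "VirtualMachineScaleSets"
  }

  service_principal {
    client_id     = var.sp_client_id
    client_secret = var.sp_client_secret
  }
}

# The second pool is managed as its own resource, so adding or removing it
# should not appear as a change to the cluster resource itself.
resource "azurerm_kubernetes_cluster_node_pool" "pooltwo" {
  name                  = "pooltwo"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.test.id
  vm_size               = "Standard_B2ms"
  node_count            = 1
  os_type               = "Linux"
  os_disk_size_gb       = 30
}
```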

@ghost

ghost commented Mar 29, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost ghost locked and limited conversation to collaborators Mar 29, 2020