
Kubernetes cluster must be replaced because of private_cluster_public_fqdn_enabled #13099

Closed
IndependerGerard opened this issue Aug 23, 2021 · 8 comments · Fixed by #13413

IndependerGerard commented Aug 23, 2021

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureRM Provider) Version

Terraform - v1.0.5
azurerm - v2.73.0

Affected Resource(s)

  • azurerm_kubernetes_cluster

Terraform Configuration Files

terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
      version = "2.73.0"
    }
  }
}

provider "azurerm" {
  features {}
}

locals {
  aks_tags_dev = {
    Owner             = "__owner__"
    CreatedBy         = "__createdby__"
    EnvironmentType   = "dev"
    CostCenter        = "4600-9000"
    Website           = "__website__"
  }
}

resource "azurerm_kubernetes_cluster" "aks-web-dev-001" {
  name                                  = "aks-web-dev-001"
  location                              = "__location__"
  resource_group_name                   = "rg-web-dev-001"
  dns_prefix                            = "aks-web-dev-001-dns"
  kubernetes_version                    = "1.19.11"
  node_resource_group                   = "rg-web-dev-nodes-001"

  tags = local.aks_tags_dev

  identity {
    type             = "SystemAssigned"
  }

  default_node_pool {
    name                          = "sysb4ms001"
    vm_size                       = "Standard_B4ms"
    node_count                    = 1
    max_pods                      = 60
    only_critical_addons_enabled  = true
    orchestrator_version          = "1.19.11"
    vnet_subnet_id                = "__aksSubnetId__"

    tags = local.aks_tags_dev
  }

  linux_profile {
    admin_username = "azureuser"

    ssh_key {
      key_data = "__aksSshKey__"
    }
  }

  network_profile {
    dns_service_ip        = "10.201.0.10"
    docker_bridge_cidr    = "172.201.0.1/16"
    network_plugin        = "azure"
    network_policy        = "calico"
    service_cidr          = "10.201.0.0/16"
  }

  role_based_access_control {
    enabled = true
  }
}

Debug Output

The debug output contains secrets, so I would prefer a private channel for sharing it if necessary.

Expected Behaviour

The Costcenter tag gets replaced by the CostCenter tag in-place, without causing downtime on the cluster.

Output should be something like:

Terraform will perform the following actions:

  # azurerm_kubernetes_cluster.aks-web-dev-001 will be updated in-place
  ~ resource "azurerm_kubernetes_cluster" "aks-web-dev-001" {
    ...
      ~ tags                       = {
          + "CostCenter"      = "4600-9000"
          - "Costcenter"      = "4600-9000" -> null
            # (3 unchanged elements hidden)
        }
    }

Actual Behaviour

The whole cluster gets replaced.

Actual output is something like:

Terraform will perform the following actions:

  # azurerm_kubernetes_cluster.aks-web-dev-001 must be replaced
-/+ resource "azurerm_kubernetes_cluster" "aks-web-dev-001" {
      + private_cluster_public_fqdn_enabled = false # forces replacement
    }

The same configuration works as expected on azurerm 2.72.0.

Steps to Reproduce

  1. terraform plan

apeschel commented Sep 8, 2021

Here's the source of the problem: the schema gives private_cluster_public_fqdn_enabled a default value on a ForceNew field.

I think this could be fixed by simply removing the default value here?
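
For context, here is a minimal sketch in Go of how such a schema entry looks with the Terraform Plugin SDK. This is illustrative, not the provider's actual source (the function name is hypothetical): the point is that a Default combined with ForceNew means state written by 2.72.0, which lacks the attribute, diffs against the default on the first plan after upgrading, and that diff forces replacement.

package containers

import "github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"

// Illustrative sketch: a boolean attribute with both Default and ForceNew set,
// mirroring how private_cluster_public_fqdn_enabled behaves in azurerm 2.73.0.
func privateClusterPublicFqdnEnabledSchema() *schema.Schema {
	return &schema.Schema{
		Type:     schema.TypeBool,
		Optional: true,
		Default:  false, // injects a planned value even when the user sets nothing
		ForceNew: true,  // any diff on this attribute destroys and recreates the cluster
	}
}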

apeschel added a commit to apeschel/terraform-provider-azurerm that referenced this issue Sep 8, 2021
hashicorp#13099

This default value causes clusters to be rebuilt on existing
deployments. Simply removing the default value should be sufficient for
preventing the rebuilds from occurring.

If someone needs to explicitly set this to false for some reason, they
can still do that manually. This is a preferable situation to existing
users not being able to upgrade without a complete cluster rebuild.
hieumoscow added a commit to hieumoscow/terraform-provider-azurerm that referenced this issue Sep 20, 2021
Fix hashicorp#13099, to do in place update for `private_cluster_public_fqdn_enabled`
LaurentLesle (Contributor) commented

@apeschel I think the default must be kept at false, but I agree that ForceNew must be set to false.
That will match the az cli behavior:
az aks update -g $rg -n $CLUSTER --disable-public-fqdn

We have found that all private AKS clusters created with provider versions up to 2.72.0 have this feature enabled in the background, so upgrading to azurerm 2.73+ causes the cluster to be recreated.

At least if ForceNew is set to false, the change will only trigger an in-place update of the existing cluster, setting this value to false as per the default.

The provider documentation says "This requires that the Preview Feature Microsoft.ContainerService/EnablePrivateClusterPublicFQDN is enabled and the Resource Provider is re-registered". In our case the preview feature was not enabled, and we still observed the behavior described in this issue.
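
As a sketch of what is being proposed here (reusing the hypothetical helper and import from the earlier sketch; the actual patch in #13413 may differ), the default stays while ForceNew goes away:

// Illustrative sketch: Default kept at false to match the az cli behavior,
// ForceNew removed so a change becomes an in-place update.
func privateClusterPublicFqdnEnabledSchema() *schema.Schema {
	return &schema.Schema{
		Type:     schema.TypeBool,
		Optional: true,
		Default:  false,
		// ForceNew deliberately omitted: changes are handled by the
		// resource's Update function instead of replacing the cluster.
	}
}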

arnaudlh commented

Please fix this, as it is impacting our customers in APAC!

apeschel commented Sep 21, 2021

@LaurentLesle I had assumed ForceNew was set to true, because if the value is toggled from false to true, or vice versa, a cluster rebuild would be required. I might be mistaken though.

If this option does require a cluster rebuild, though, then the right solution would be to keep ForceNew = true and simply remove the default value, or to use some other method.

LaurentLesle (Contributor) commented

@apeschel I took the az cli as the base reference to confirm that this is an in-place update of the AKS cluster, so ForceNew should be set to false:
az aks update -g $rg -n $CLUSTER --disable-public-fqdn

apeschel commented

> @apeschel I took the az cli as the base reference to confirm that this is an in-place update of the AKS cluster, so ForceNew should be set to false:
> az aks update -g $rg -n $CLUSTER --disable-public-fqdn

Yes, your logic here makes sense, but you're not addressing the much more likely scenario: what if you are toggling the value from false to true, or from true to false?

ForceNew indicates that any change in this field requires the resource to be destroyed and recreated.

katbyte pushed a commit that referenced this issue Sep 23, 2021
…o longer force new (#13413)

Fix #13099, to do in place update for private_cluster_public_fqdn_enabled
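
For readers following along, here is a minimal sketch of what handling the field in the update path can look like, with hypothetical helper names rather than the code actually merged in #13413 (assuming the same schema import as the earlier sketches):

// Illustrative sketch: with ForceNew removed, the resource's Update function
// detects the change and applies it through the Azure API.
func resourceKubernetesClusterUpdate(d *schema.ResourceData, meta interface{}) error {
	if d.HasChange("private_cluster_public_fqdn_enabled") {
		enabled := d.Get("private_cluster_public_fqdn_enabled").(bool)
		if err := updateAPIServerAccessProfile(d.Id(), enabled); err != nil {
			return err
		}
	}
	return nil
}

// updateAPIServerAccessProfile is a hypothetical stand-in for the provider's
// call that patches the managed cluster's apiServerAccessProfile in Azure.
func updateAPIServerAccessProfile(clusterID string, enabled bool) error {
	// ... call the ContainerService API here ...
	return nil
}
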
IndependerGerard (Author) commented

I can confirm that the issue on our side is fixed with azurerm - v2.78.0. Thanks to everyone involved in fixing this issue.

github-actions bot commented

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Oct 25, 2021