
"Kubernetes/Orchestrator Version is not available for Node Pool" even though the node pool is controlled by that orchestator #16173

Closed
lui2131 opened this issue Mar 31, 2022 · 3 comments

Comments


lui2131 commented Mar 31, 2022

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureRM Provider) Version

$ terraform -v
Terraform v1.1.7
on linux_amd64

  • provider registry.terraform.io/hashicorp/azurerm v3.0.2
  • provider registry.terraform.io/hashicorp/local v2.2.2
  • provider registry.terraform.io/hashicorp/tls v3.1.0

Affected Resource(s)

  • azurerm_kubernetes_cluster

Terraform Configuration Files

terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
      version = ">= 3.0.2"
    }
    local = {
      source = "hashicorp/local"
      version = ">= 2.2.2"
    }
    tls = {
      source = "hashicorp/tls"
      version = ">= 3.1.0"
    }
  }
}

resource "azurerm_kubernetes_cluster" "aks-cluster" {
  name                = local.cluster_name
  location            = module.globals.location
  resource_group_name = module.globals.resource_group_name
  dns_prefix          = local.dns_prefix
  private_cluster_enabled = false

  role_based_access_control_enabled = true

  azure_active_directory_role_based_access_control {
      managed = true
      admin_group_object_ids = [
        var.aad_group_id
    ]
  }

  network_profile {
    network_plugin = "kubenet"
    network_policy = var.network_policy
    load_balancer_sku = var.load_balancer_sku
    pod_cidr = var.pod_cidr
  }


  service_principal {
    client_id = var.cluster_service_principal_client_id
    client_secret = data.azurerm_key_vault_secret.service_principal_data.value
  }

  default_node_pool {
    name                  = var.cluster_nodepool_name
    type                  = "VirtualMachineScaleSets"
    node_count            = 5
    vm_size               = "Standard_DS2_v2"
    os_disk_size_gb       = 50
    vnet_subnet_id        = var.nodepool_subnet_id
    enable_auto_scaling   = false
    enable_node_public_ip = false
    orchestrator_version = "1.20.9"

    node_labels = {
      aks_node_group = "${module.globals.application_name}-node"
    }

    tags = {
      environment = module.globals.tag_environment
      owner = module.globals.tag_owner
      created_by = module.globals.tag_created_by
    }
  }

  tags = {
    environment = module.globals.tag_environment
    owner = module.globals.tag_owner
    created_by = module.globals.tag_created_by
  }
}
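
For reference (not part of my configuration), one way to avoid hard-coding orchestrator_version would be to look up the versions the region reports as supported via the provider's azurerm_kubernetes_service_versions data source. This is only a sketch of that idea; the "1.20" prefix below is just an illustrative filter.

data "azurerm_kubernetes_service_versions" "current" {
  # Reuses the same location as the cluster above; version_prefix is an optional filter.
  location       = module.globals.location
  version_prefix = "1.20"
}

output "supported_orchestrator_versions" {
  # Every orchestrator version the region reports as available for this prefix.
  value = data.azurerm_kubernetes_service_versions.current.versions
}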

Debug Output

I set the TF_LOG environment variable to JSON and then to TRACE, but still didn't see any additional output. Here is the resulting stderr trace:

https://gist.github.com/lui2131/f6505bac93759f7ccbf42074868cf1b1

Panic Output

No panic output was observed.

Expected Behaviour

Scaling the AKS cluster from 4 to 5 nodes using Terraform should succeed.

Actual Behaviour

The Terraform CLI threw an error. The full error output is captured in the log file above, but a short version of it is:

The Kubernetes/Orchestrator Version "1.20.9" is not available for Node Pool "<node-pool-name>".

This is despite the fact that the node pool does indeed have Kubernetes version 1.20.9 running, which is confirmed both by the Azure Portal UI and by the JSON resource for the cluster I am trying to scale.
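
For completeness, the same check can also be expressed on the Terraform side with the provider's azurerm_kubernetes_cluster data source. This is only a sketch; the name and resource group below are placeholders rather than my real values.

data "azurerm_kubernetes_cluster" "existing" {
  # Placeholders only; substitute the real cluster name and resource group.
  name                = "<aks-cluster-name>"
  resource_group_name = "<Resource-Group-Name>"
}

output "cluster_kubernetes_version" {
  # Should report "1.20.9" for this cluster.
  value = data.azurerm_kubernetes_cluster.existing.kubernetes_version
}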

Steps to Reproduce

  1. terraform apply

Important Factoids

The cluster was created originally using the following Terraform providers:

terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
      version = ">= 2.51"
    }
    local = {
      source = "hashicorp/local"
      version = ">= 2.1.0"
    }
    tls = {
      source = "hashicorp/tls"
      version = ">= 3.1.0"
    }
  }
}

Sadly, this cluster was originally created a while ago, on a Kubernetes version that I'm not entirely sure of.

I've also tried manually scaling the VMSS node pool (through the Azure Portal UI for the actual AKS cluster), and was unable to scale the VMSS successfully. For some reason the node that was being created was not added to the routing table that all the other nodes are part of. This is important because without those routes the kubenet networking plugin fails: pods running on the new node cannot connect directly to pods running on older nodes, which leads to a headache of a debugging problem. A rough sketch of reading that route table back through Terraform follows.
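
This is only a sketch using the provider's azurerm_route_table data source; both names below are placeholders, and the real route table lives in the cluster's node resource group (typically the MC_* one).

data "azurerm_route_table" "aks_nodes" {
  # Placeholders only; use the route table and node resource group AKS created.
  name                = "<aks-managed-route-table-name>"
  resource_group_name = "<node-resource-group-name>"
}

output "aks_node_routes" {
  # Each node should have a route for its pod CIDR here; the new node's entry was the one missing.
  value = data.azurerm_route_table.aks_nodes.route
}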

I was hoping that using Terraform to scale the AKS cluster would fix the routing table problem; however, it seems I'm unable to even apply the current Terraform configuration. If I can't somehow scale the existing cluster, I'll probably also reach out directly to Microsoft support.

References

  • #0000

EDIT: Moved formatting around to fit community guidelines.

@stephybun (Member)

Thanks for taking the time to raise this issue @lui2131!

There is, however, already an open issue regarding this behaviour, so I am going to consolidate this one into #8147. Please subscribe to #8147 for updates.

Thanks!


lui2131 commented Mar 31, 2022

Hi @stephybun, thanks for taking the time to read over the issue. I agree that this issue is closely related to the other open one; however, the error being shown is different, which is why I wanted to capture it in a separate issue. The full error (which I saved to a gist) is shown below:

╷
│ Error: 
│ The Kubernetes/Orchestrator Version "1.20.9" is not available for Node Pool "<node-pool-name>".
│ 
│ Please confirm that this version is supported by the Kubernetes Cluster "<aks-cluster-name>"
│ (Resource Group "<Resource-Group-Name>") - which may need to be upgraded first.
│ 
│ The Kubernetes Cluster is running version "1.20.9".
│ 
│ The supported Orchestrator Versions for this Node Pool/supported by this Kubernetes Cluster are:
│ 
│ 
│ Node Pools cannot use a version of Kubernetes that is not supported on the Control Plane. More
│ details can be found at https://aka.ms/version-skew-policy.
│ 
│ 
│   with module.cluster.azurerm_kubernetes_cluster.<aks-cluster-name>,
│   on modules/cluster/main.tf line 42, in resource "azurerm_kubernetes_cluster" "<aks-cluster-name>":
│   42: resource "azurerm_kubernetes_cluster" "<aks-cluster-name>" {
│ 
╵
Releasing state lock. This may take a few moments...

The main difference between issue #8147 and this one is that no orchestrator versions are listed as being supported by the Kubernetes cluster, even though I confirmed the node pool does indeed support version 1.20.9.

EDIT: Didn't finish typing


github-actions bot commented May 1, 2022

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators May 1, 2022