Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARM template regression, 2020-11-01 forward requires node pool count property #1627

Closed
ondrejhlavacek opened this issue May 23, 2020 · 11 comments

Comments

@ondrejhlavacek
Copy link

It seems that a recent update made the count node pool property required. My last ARM template used

        "enableAutoScaling": true,
        "minCount": "1",
        "maxCount": "5",

and didn't require the count property.

Now it seems the count prop is required, the deploy will otherwise fail. Furthermore it seems that it's value has to be the current node count, which is hard to get considering the node pool may just be scaling up/down.

Is this a bug or a feature? How do the count, minCount and maxCount work together?

@neumanndaniel
Copy link

Looking at the ARM template reference the change was introduced with API version 2019-11-01.

-> https://docs.microsoft.com/en-us/azure/templates/microsoft.containerservice/2019-11-01/managedclusters#ManagedClusterAgentPoolProfile

@jluk jluk changed the title Node pool autoscaling vs count in ARM ARM template regression, 2020-11-01 forward requires node pool count property May 29, 2020
@jluk jluk self-assigned this May 29, 2020
@jluk jluk added the bug label May 29, 2020
@jluk
Copy link
Contributor

jluk commented May 29, 2020

Thanks for the report, this is a regression on the changes @neumanndaniel posted. The current fastest mitigation is to use 2019-10-01.

We have the fix merged and pending release.
Azure/azure-rest-api-specs#9610

Once this fix has been released we will post back here.

@marwanad
Copy link

@neumanndaniel @onderyildirim The fix should be out for the affected API versions. Can you try with 2019-11-01 or 2020-03-01 and report back? The link above should be updated soon.

@ondrejhlavacek
Copy link
Author

I have tried 2020-03-01, removed the count attribute from the template and received this beauty

The AgentPoolProfile 'default' has an invalid total maxPods(maxPods per node * node count), the total maxPods(50 * 0) should be larger than 30. Please refer to aka.ms/aks-min-max-pod for more detail.

@marwanad
Copy link

marwanad commented May 31, 2020

@ondrejhlavacek this is for the create path correct? You're attempting to create a new managed cluster?

If you're using API 2020-03-01 or higher, during create time you can only set an initial count of 0 for non-system pools. For updates, you should be fine to drop the count field.

If you're using lower API versions, you can't create 0 count nodepools. Updates you should be fine with dropping the field.

The changes basically revert the behaviour to the same one as 2019-10-01.

@marwanad
Copy link

marwanad commented May 31, 2020

I tested with the following body (on an existing MC):

 "resources": [
        {
            "type": "Microsoft.ContainerService/managedClusters",
            "apiVersion": "2020-03-01",
            "name": "[parameters('clusterName')]",
            "location": "[parameters('location')]",
            "properties": {
                "agentPoolProfiles": [
                    {
                        "name": "agentpool",
                        "enableAutoScaling": true,
                        "minCount": 1,
                        "maxCount": 15
                    }
                ]
            }
        }
    ]

@ondrejhlavacek
Copy link
Author

@marwanad Thanks for the questions, i'll try to clarify. I was updating an existing cluster, namely a system node pool. As a workaround we had a script populating the count property with the actual node count and I just omitted the count property keeping all other props untouched (minCount: 1 and maxCount: 2). Version 2020-03-01 and we use the same template for create and update.

    {
      "apiVersion": "2020-03-01",
      "type": "Microsoft.ContainerService/managedClusters",
      "location": "[resourceGroup().location]",
      "name": "[variables('clusterName')]",
      "identity": {
        "type": "SystemAssigned"
      },
      "properties": {
        "kubernetesVersion": "[parameters('kubernetesVersion')]",
        "dnsPrefix": "[variables('clusterName')]",
        "agentPoolProfiles": [
          {
            "name": "default",
            "osDiskSizeGB": "[parameters('defaultNodePoolOSDiskSizeGB')]",
            "vmSize": "[parameters('defaultNodePoolVMSize')]",
            "osType": "Linux",
            "storageProfile": "ManagedDisks",
            "maxPods": 50,
            "type": "VirtualMachineScaleSets",
            "enableNodePublicIP": false,
            "vnetSubnetID": "[parameters('defaultSubnetID')]",
            "enableAutoScaling": true,
            "minCount": "[parameters('defaultNodePoolMinCount')]",
            "maxCount": "[parameters('defaultNodePoolMaxCount')]",
            "mode": "System"
          }
        ],
        "addonProfiles": {
          "httpapplicationrouting": {
            "enabled": false,
            "config": {}
          }
        },
        "enableRBAC": true,
        "networkProfile": {
          "networkPlugin": "azure",
          "loadBalancerSku": "Standard",
          "serviceCidr": "10.1.0.0/16",
          "dnsServiceIP": "10.1.0.10",
          "dockerBridgeCidr": "172.17.0.1/16"
        }
      }
    }

@marwanad
Copy link

marwanad commented Jun 1, 2020

@ondrejhlavacek hmm I'm unable to repro this on my side. I tried with system pools and a similar template. Can you file a support ticket with the info?

Do you face the same issue with non-System pools?

@jluk
Copy link
Contributor

jluk commented Jun 5, 2020

Closing this as the fix should be released, but if people still hit this problem we will revisit/reopen.

@jluk jluk closed this as completed Jun 5, 2020
@dennis-yemelyanov
Copy link

Is there any documentation explaining how count, minCount and maxCount work together? What will happen if all 3 properties are sent in an update call for an existing cluster, and enableAutoScaling is set to true? Will the count property be ignored?

@marwanad
Copy link

@dennis-yemelyanov correct, the behaviour for templates if you have CA on and you change the count property is to ignore the input. This is because with autoscaler on, the count property deviates from the initial one in the template when autoscaler is on and breaks some validations on the current state.

@ghost ghost locked as resolved and limited conversation to collaborators Jul 22, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants