az aks scale attempting to delete subnets #349

CMHR-MichaelLanglois · 2018-05-07T14:55:21Z

We created an AKS instance with a node count of 1. We then scaled up to a node count of 2. We had deployed a service containing a single pod with a replica count of 3, set to auto-scale up to 10 pods. We have recently attempted to implement some network security devices, including an Azure Application Firewall and a Palo Alto firewall device.

Because we did not have the ability to modify the virtual network, we added several subnets to the virtual network provisioned in the MC_{RESOURCE_GROUP}_{CLUSTER_NAME}_{REGION} resource group that was created when the AKS object was created.

We reconfigured our service to use an internal load balancer, and then we included several routing rules to ensure that any traffic to our service was routed through both firewall devices before reaching the internal load balancer. This was accomplished by defining several network routes within the AKS-agentpool routetable.

With this configuration in place, we noticed that attempts to access our service would fail roughly one third of the time. I suspected that this was an issue with our routing, and rather than attempting to solve that issue, we decided to scale down our node pool from 2 nodes to 1. This was achieved by running the command "az aks scale" with the appropriate flags. This completed successfully, and we resumed troubleshooting our deployment.
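For reference, the scale-down step can be sketched as below. This is a minimal illustration, not the exact command from the report: the resource group and cluster names are hypothetical placeholders, and the helper `build_scale_command` is introduced here only for illustration. The `--resource-group`, `--name`, and `--node-count` flags are the standard `az aks scale` parameters.

```python
import subprocess


def build_scale_command(resource_group: str, cluster_name: str, node_count: int) -> list[str]:
    """Build the argument list for `az aks scale` (hypothetical helper)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [
        "az", "aks", "scale",
        "--resource-group", resource_group,
        "--name", cluster_name,
        "--node-count", str(node_count),
    ]


if __name__ == "__main__":
    # Placeholder names; substitute the real resource group and cluster.
    command = build_scale_command("my-resource-group", "my-aks-cluster", 1)
    print(" ".join(command))
    # Uncomment to actually run the command (requires the Azure CLI and a login):
    # subprocess.run(command, check=True)
```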

Once we had confirmed that our deployment was functioning, and that all traffic was being routed through our two firewalls, we then attempted to scale up to 2 nodes. This resulted in an error:

Deployment failed. Correlation ID: [correlationId]. Subnet [subnet] is in use by [firewall network interface] and cannot be deleted.

This results in the AKS resource reporting a node count of 2 while being in a failed state. We can then scale back down to a single node, which completes successfully and removes the failed state message.

I can reproduce this in both CentralUS and CanadaEast.

Is there any way to scale the node count up without removing the vnet?

JackQuincy · 2018-05-10T16:51:21Z

Today, no. The way we scale requires us to give a list of all subnets, and we just regenerate the subnets we made at the start. Adding your own subnets to that vnet is unsupported today. We might be able to make a change to support this; I'm talking with the team about the desire for this.
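To illustrate why a scale operation can end up trying to delete a subnet, here is a simplified model of the behavior described above. It is a sketch under stated assumptions, not the actual AKS or Azure Resource Manager code: it assumes the scale deployment submits the vnet with a complete subnet list, so any existing subnet missing from that list is treated as a removal. The function `apply_vnet_template` and all resource names are hypothetical.

```python
def apply_vnet_template(existing_subnets: dict[str, list[str]],
                        template_subnets: list[str]) -> dict[str, list[str]]:
    """Model a declarative vnet update that replaces the whole subnet list.

    existing_subnets maps each subnet name to the resources attached to it.
    template_subnets is the full list of subnets the scale template declares.
    """
    # Any existing subnet that the template omits is scheduled for deletion.
    to_delete = sorted(set(existing_subnets) - set(template_subnets))
    for name in to_delete:
        attached = existing_subnets[name]
        if attached:
            # A subnet with attached resources cannot be deleted, so the update fails.
            raise RuntimeError(
                f"Subnet {name} is in use by {', '.join(attached)} and cannot be deleted."
            )
    # Keep declared subnets; newly declared ones start with nothing attached.
    return {name: existing_subnets.get(name, []) for name in template_subnets}


if __name__ == "__main__":
    # Hypothetical state: one AKS-generated subnet plus one added by hand.
    vnet = {
        "aks-subnet": ["aks-node-nic"],
        "firewall-subnet": ["firewall-nic"],
    }
    try:
        # The template only knows about the subnet it generated originally.
        apply_vnet_template(vnet, ["aks-subnet"])
    except RuntimeError as error:
        print(error)
```

In this model, an unused extra subnet would be removed silently, while one with an attached network interface makes the whole update fail, which is consistent with the error reported above.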

jluk · 2019-04-03T19:01:22Z

This looks to be resolved now, given that we have moved to aks-engine and many changes have followed. Closing as a result. @CMHR-MichaelLanglois please open a new ticket if you still see scale operations try to delete networking objects.

JackQuincy mentioned this issue May 17, 2018

removing the vnet from scale templates Azure/acs-engine#2994

Merged

3 tasks

jluk closed this as completed Apr 3, 2019

ghost locked as resolved and limited conversation to collaborators Aug 8, 2020
