az aks scale attempting to delete subnets #349

Closed
CMHR-MichaelLanglois opened this issue May 7, 2018 · 2 comments


@CMHR-MichaelLanglois

We created an AKS instance with a node count of 1. We then scaled up to a node count of 2. We had deployed a service containing a single pod with a replica count of 3, set to auto-scale up to 10 pods. We have recently attempted to implement some network security devices, including an Azure Application Firewall and a Palo Alto firewall device.

Because we did not have the ability to modify the virtual network, we added several subnets to the virtual network that was provisioned in the MC_{RESOURCE_GROUP}_{CLUSTER_NAME}_{REGION} resource group that was created when the AKS object was created.

We reconfigured our service to use an internal load balancer, and then we included several routing rules to ensure that any traffic to our service was routed through both firewall devices before reaching the internal load balancer. This was accomplished by defining several network routes within the AKS-agentpool routetable.

With this configuration in place, we noticed that attempts to access our service would fail roughly one third of the time. I suspected that this was an issue with our routing, and rather than attempting to solve that issue, we decided to scale down our node pool from 2 nodes to 1. This was achieved by running the command "az aks scale" with the appropriate flags. This completed successfully, and we resumed troubleshooting our deployment.

Once we had confirmed that our deployment was functioning, and that all traffic was being routed through our two firewalls, we then attempted to scale up to 2 nodes. This resulted in an error:

Deployment failed. Correlation ID: [correlationId]. Subnet [subnet] is in use by [firewall network interface] and cannot be deleted.

This leaves the AKS resource reporting a node count of 2 and in a failed state. We can then scale back down to a single node, which completes successfully and removes the failed state message.

I can reproduce this in both CentralUS and CanadaEast.

Is there any way to scale the node count up without removing the vnet?

@JackQuincy

Today, no. The way we scale requires us to give a list of all subnets, and we just regenerate the subnets we made at the start. Adding your own subnets to that virtual network is unsupported today. We might be able to make a change to support this; I'm talking with the team about the desire for it.
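
To illustrate why a full-list update behaves this way, here is a minimal sketch in Python. It is not the actual AKS implementation; the function `plan_subnet_changes` and all subnet names are hypothetical. It only assumes that an update which submits the complete desired list treats anything missing from that list as something to delete.

```python
def plan_subnet_changes(generated_subnets, existing_subnets):
    """Return (to_create, to_delete) for a full-list ("replace everything") update.

    Hypothetical helper: any existing subnet that is absent from the
    generated list is interpreted as a deletion request.
    """
    desired = set(generated_subnets)
    existing = set(existing_subnets)
    to_create = sorted(desired - existing)
    to_delete = sorted(existing - desired)
    return to_create, to_delete


# Hypothetical names: one subnet generated at cluster creation,
# two more added to the same virtual network by hand afterwards.
generated = ["aks-subnet"]
existing = ["aks-subnet", "firewall-subnet", "appgw-subnet"]

to_create, to_delete = plan_subnet_changes(generated, existing)
print("create:", to_create)  # create: []
print("delete:", to_delete)  # delete: ['appgw-subnet', 'firewall-subnet']
```

In this sketch the manually added subnets land in the delete set, which would match the "is in use ... and cannot be deleted" error when a network interface is still attached to one of them.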

@jluk
Contributor

jluk commented Apr 3, 2019

This looks to be resolved now, given that we have moved to aks-engine and many changes have followed; closing as a result. @CMHR-MichaelLanglois please open a new ticket if you still see scale operations trying to delete networking objects.

@jluk jluk closed this as completed Apr 3, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Aug 8, 2020