Upgrade results in node with 111 IPs #2668
Comments
@jackfrancis I filed this to track the issue found while testing #2650
@EPinci will try to repro
Running a deployment, then a series of upgrades against this api model:
After initial deployment:
Holding steady after the 1st upgrade (from 1.7.0 to 1.7.12):
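For the record, the per-node IP counts here are just the number of ipConfigurations on each node NIC. A quick way to tally them with the Azure CLI (a sketch, assuming the node NICs sit in the cluster resource group; the resource group name is a placeholder):

    az network nic list --resource-group <cluster-rg> \
      --query "[].{nic:name, ipCount:length(ipConfigurations)}" \
      --output table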
Weird, I tried this multiple times and always with the same result (even the same IP count!). Can you try just deleting one of the two nodes to see if that has an impact? This is exactly my scenario: the original node count gets changed outside ACS-Engine (e.g. by the autoscaler). Do you want me to send you my api model?
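Deleting a node out-of-band, as suggested above, can be done straight from the CLI (a sketch with placeholder names, assuming the agent VM lives in the cluster resource group):

    az vm delete --resource-group <cluster-rg> --name <agent-vm-name> --yes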
Let's let my test keep running (there are 12 more upgrades to go). I'm not saying for sure that we can't repro yet. :)
I take it back, I've been unable to repro. Yeah, please paste in the api model you're seeing this behavior with post-upgrade, and we'll repro using it as exactly as possible. Thanks!
OK, since I don't know what is actually relevant, here is the entire process I'm using to replicate upgrading my production cluster. In an empty resource group, deploy a local VNet (nothing fancy, just three /24 subnets):
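The VNet deployment itself isn't reproduced here, but an equivalent can be sketched with the Azure CLI. This is a sketch, not the original template: the 10.24.0.0/16 address space and the master/frontend subnet names come from the api model below, the master prefix is inferred from firstConsecutiveStaticIP, and the frontend prefix and the third /24 are placeholders:

    az network vnet create --resource-group <rg> --name K8sVNet \
      --address-prefix 10.24.0.0/16 \
      --subnet-name master --subnet-prefix 10.24.250.0/24
    az network vnet subnet create --resource-group <rg> \
      --vnet-name K8sVNet --name frontend --address-prefix 10.24.0.0/24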
Compile the following api model with ACS-Engine 13.1 (not sure if the binary version is relevant, but this is the same as my current production cluster):
{
"apiVersion": "vlabs",
"properties": {
"orchestratorProfile": {
"orchestratorType": "Kubernetes",
"orchestratorRelease": "1.9",
"kubernetesConfig": {
"addons": [
{
"name": "tiller",
"enabled" : false
}
]
}
},
"aadProfile": {
"serverAppID": "<<REMOVED>>",
"clientAppID": "<<REMOVED>>",
"tenantID": "<<REMOVED>>",
"adminGroupID": "<<REMOVED>>"
},
"masterProfile": {
"count": 3,
"dnsPrefix": "cluster-dev",
"vmSize": "Standard_A1_v2",
"storageProfile" : "ManagedDisks",
"OSDiskSizeGB": 128,
"firstConsecutiveStaticIP": "10.24.250.230",
"ipAddressCount": 20,
"vnetCidr": "10.24.0.0/16",
"vnetSubnetId": "/subscriptions/<<REMOVED>>/resourceGroups/<<REMOVED>>/providers/Microsoft.Network/virtualNetworks/K8sVNet/subnets/master"
},
"agentPoolProfiles": [
{
"name": "nodepool1",
"count": 3,
"vmSize": "Standard_A2_v2",
"storageProfile" : "ManagedDisks",
"OSDiskSizeGB": 128,
"availabilityProfile": "AvailabilitySet",
"vnetSubnetId": "/subscriptions/<<REMOVED>>/resourceGroups/<<REMOVED>>/providers/Microsoft.Network/virtualNetworks/K8sVNet/subnets/frontend"
}
],
"linuxProfile": {
"adminUsername": "clusteradm",
"ssh": {
"publicKeys": [
{
"keyData": "<<REMOVED>>"
}
]
}
},
"servicePrincipalProfile": {
"clientId": "<<REMOVED>>",
"secret": "<<REMOVED>>"
}
}
}
Then I deploy it:
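The exact deploy command isn't shown; with the 0.13-era toolchain it would have been along these lines (a sketch assuming the generate-then-ARM-deploy flow; kubernetes.json stands in for the api model file above, and _output/cluster-dev is the directory acs-engine generate writes for this dnsPrefix):

    acs-engine generate kubernetes.json
    az group deployment create --resource-group <rg> \
      --template-file _output/cluster-dev/azuredeploy.json \
      --parameters @_output/cluster-dev/azuredeploy.parameters.json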
This results in a 1.9.3 cluster with three masters and three agents. I then delete the last two agents from the Azure portal to simulate a node count change that ACS-Engine is not aware of, such as what a cluster autoscaler produces. Run the upgrade with the current ACS-Engine:
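The upgrade invocation isn't shown either; from memory of the v0.15-era CLI it looked roughly like the following (a sketch: flag names should be double-checked against acs-engine upgrade --help, and all values are placeholders):

    acs-engine upgrade --subscription-id <sub-id> \
      --resource-group <rg> --location <location> \
      --deployment-dir _output/cluster-dev \
      --upgrade-version 1.9.6 \
      --auth-method client_secret \
      --client-id <sp-app-id> --client-secret <sp-secret>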
The upgrade deletes the first master VM and redeploys it. If I run a custom ACS-Engine build from HEAD with the small patch from #2061, the upgrade continues with master 2 but then fails on master 3 with the subnet full (3 × 111 = 333 IPs, more than a /24 subnet holds). Thank you.
@jackfrancis Any chance you can give this a go? What do you think about it?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contribution. Note that acs-engine is deprecated; see https://github.com/Azure/aks-engine instead.
Is this a request for help?: Yes
Is this an ISSUE or FEATURE REQUEST? ISSUE
What version of acs-engine?: v15.1
Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm): Kubernetes, from v1.9.3 to v1.9.6
What happened:
I configured the cluster with AzureCNI and ipAddressCount set to 20.
During an upgrade run, nodes get torn down and rebuilt.
The original node had 20 IPs as expected; the new node has 111 IPs, resulting in quick subnet exhaustion.
What you expected to happen:
The original node has 20 IPs; the new node should have the same number.
How to reproduce it (as minimally and precisely as possible):
Deploy a cluster with AzureCNI and ipAddressCount set to 20.
Mine had 3 masters (with 20 IPs as well) and 3 nodes (with the standard 30 IPs); the relevant api model field is excerpted below.
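For reference, the field in play is ipAddressCount on the profiles; excerpted from the api model pasted earlier in this thread, where the agent pools omit it and pick up the default:

    "masterProfile": {
      "count": 3,
      "ipAddressCount": 20
    }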
Anything else we need to know: