This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

Preserve subnets during scale up #2095

Closed

Conversation

itowlson
Contributor

What this PR does / why we need it: When you scale up an acs-engine cluster (using az acs scale or acs-engine scale), any subnets of the cluster vnet other than the automatic k8s-master subnet are deleted. This is unexpected, and if the subnet is in use (for example because it contains a gateway or IP address) then the subnet cannot be deleted and this blocks the scaling operation.

The underlying problem is that scale up recreates the ARM template for the resource group and re-applies it. Because the ARM template includes only the original automatic subnet, ARM assumes that the desired state of the vnet is to contain only the automatic subnet, and therefore attempts to delete any other subnets.

This PR excludes the vnet from the ARM template generated for scaling, using the same techniques already used to exclude the route table and NSG.
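For readers unfamiliar with the template-stripping approach, here is a minimal sketch of the idea in Go. The package and function names (`transform`, `StripVNetResources`) are hypothetical and not the actual acs-engine code; the sketch only illustrates dropping the virtual network resource from the unmarshaled ARM template before redeployment, so that ARM no longer treats the single automatic subnet as the complete desired state of the vnet, analogous to how the route table and NSG are already excluded.

```go
// Illustrative sketch only (hypothetical names, not the acs-engine implementation):
// remove the virtual network resource from a generated ARM template before
// redeploying it, so existing subnets outside the template are left untouched.
package transform

import "encoding/json"

const vnetResourceType = "Microsoft.Network/virtualNetworks"

// StripVNetResources filters Microsoft.Network/virtualNetworks entries out of
// the template's "resources" array and returns the re-marshaled template.
func StripVNetResources(templateJSON []byte) ([]byte, error) {
	var template map[string]interface{}
	if err := json.Unmarshal(templateJSON, &template); err != nil {
		return nil, err
	}
	resources, ok := template["resources"].([]interface{})
	if !ok {
		// No resources array; return the template unchanged.
		return templateJSON, nil
	}
	kept := make([]interface{}, 0, len(resources))
	for _, r := range resources {
		res, isMap := r.(map[string]interface{})
		if isMap && res["type"] == vnetResourceType {
			// Skip the vnet resource: subnets created outside the template
			// (gateways, Redis, App Gateway, etc.) are then not deleted.
			continue
		}
		kept = append(kept, r)
	}
	template["resources"] = kept
	return json.Marshal(template)
}
```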

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #2063

Special notes for your reviewer:

Release note:

NONE

@feiskyer
Member

@itowlson Is this ready for merge? It seems the change makes sense.

@itowlson
Contributor Author

@feiskyer I'd value a review before merge. I think it's a safe change, and that it's the desired behaviour for all scenarios that I know of -- but others on the acs-engine team will have a better sense of possible customer networking scenarios and whether the current behaviour is intentional. Would you or @jackfrancis be a suitable reviewer perhaps?

@feiskyer
Member

feiskyer commented Feb 1, 2018

@itowlson Thanks. LGTM in my review.

@itowlson itowlson force-pushed the preserve-subnets-during-scale-up branch 2 times, most recently from 869db8c to 119320c Compare February 8, 2018 18:47
@dmitsh

dmitsh commented Feb 9, 2018

LGTM. Could you test upgrade scenario for 3 masters with and without VNET?

@itowlson
Contributor Author

Upgrade seems to work, but destroys subnets (recreates vnet), so I guess we should fix it to preserve the vnet in that scenario as well. Thanks for catching that - I'd only considered the scale scenario, not upgrade. returns to the code mines

@itowlson
Contributor Author

I've applied a fix so the upgrade command preserves the vnet, but it needs re-review because the way the upgrade path strips things out of the base templates is very different from the way the scale path does it! Can you take a look please @feiskyer @dmitsh? Thanks!

@rocketraman
Contributor

Looking forward to this one as we use custom VNETs in our acs-engine cluster...

@jackfrancis
Member

@itowlson and @JunSun17, could you work together on reconciling the changes in this PR with those in the PR that merged recently:

93c99d1#diff-d11913d43be1f6b820fa831e9216a6cc

It looks like @JunSun17 explicitly removed the master NIC dependency stuff, while @itowlson's change in this PR refined it.

@itowlson itowlson force-pushed the preserve-subnets-during-scale-up branch from a90e822 to fef39ec Compare March 7, 2018 00:56
@jackfrancis jackfrancis force-pushed the preserve-subnets-during-scale-up branch from fef39ec to 7c1d169 Compare March 14, 2018 21:08
@ghost ghost assigned jackfrancis Mar 14, 2018
@ghost ghost added the in progress label Mar 14, 2018
@stephenlawrence
Contributor

Version: canary
GitCommit: 7c1d169
GitTreeState: dirty

I am still seeing this issue when testing with this branch. I use a RedisCache in the resource group and create a redis-subnet for it (required).

"Subnet redis-subnet is in use by /subscriptions/SUBID/resourceGroups/RGNAME/providers/Microsoft.Cache/Redis/MyRedisCache and cannot be deleted"

@CecileRobertMichon
Contributor

I think we can close this PR since #2994 was merged? @jackfrancis

@jackfrancis
Member

I'd defer to @itowlson, who would probably be happy to discover that (if?) we've fixed this 5 months later ;)

@itowlson
Contributor Author

@CecileRobertMichon @jackfrancis Yep, #2994 looks pretty similar to this, except that I don't think it addresses the upgrade scenario (unless that code path has changed) which was requested on this PR. I agree we should close this but perhaps we should keep upgrade on the radar, as again we don't want e.g. Redis or App Gateway subnets to block an upgrade. Other than that, YAY!

@itowlson itowlson closed this May 29, 2018
@ghost ghost removed the in progress label May 29, 2018
@jackfrancis
Member

Thanks for checking in @itowlson, I agree of course that upgrade/scale should share the same underlying functional assumptions
