-
Notifications
You must be signed in to change notification settings - Fork 558
adding 'cordon and drain' for agent pool upgrades #1776
Conversation
dmitsh
commented
Nov 15, 2017
- adding 'cordon and drain' for agent pool upgrades
- creating a new agent node prior to starting deleting and recreating existing agent nodes
- waiting for master nodes to become ready after upgrade prior to proceeding with agent pool upgrades
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple thoughts
} | ||
// Create an auxiliary node if hasn't been created yet. | ||
// This node is used to take on the load from deleting nodes | ||
addedNode := false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you missed my point on the last PR. We shouldn't need this bool. we should be able to just use the current count and desired count to figure this one out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Addressed. Thanks!
// In a normal mode of operation the actual number of VMs in the pool [len(upgradeVMs)] is equal to agentCount. | ||
// However, if the upgrade failed in the middle, the actual number of VMs might be agentCount+1 to include auxiliary node. | ||
// In the later case we don't create an extra node. | ||
if agentCount > 0 && len(upgradeVMs) == agentCount { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure we really need to be concerned about being asked to upgrade 0 node agent pools, but I guess it doesn't hurt.
// do not create last node in favor of auxiliary node | ||
if addedNode && upgradedCount == toBeUpgraded-1 { | ||
ku.logger.Infof("Skipping creation of VM %s (indx %d) in favor of auxiliary node", vm.Name, agentIndex) | ||
delete(upgradeVMs, agentIndex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
be careful with this. Concurrent modification while looping through a map makes it very easy to skip nodes etc. Though in this case it seems like you only do it once you've determined you have already upgraded all the nodes you need to upgrade
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deleting map element while looping through it is safe.
ku.logger.Infof("Expected agent count in the pool: %d, Creating %d more agents\n", agentCount, agentsToCreate) | ||
|
||
// NOTE: this is NOT completely idempotent because it assumes that | ||
// the OS disk has been deleted | ||
for i := 0; i < agentsToCreate; i++ { | ||
agentIndexToCreate := 0 | ||
for upgradedAgentsIndex[agentIndexToCreate] == true { | ||
for upgradeVMs[agentIndexToCreate].Upgraded == true { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
part of me wonders if we have a bug here. Where if we are short vms we might not have a vm at an index. which would cause an index/key not found error and we aren't doing the safe way to check the map. But I don't think this is going to change with this PR. @amanohar agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this should be merged its written now and corner cases could easily result in error. Plus see my question above. I am not sure if adding upgradeVMs adds value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, one question for @amanohar I'd like answered though
ch := make(chan struct{}, 1) | ||
go func() { | ||
for { | ||
agentNode, err := client.GetNode(*vmName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename the var
|
||
var masterURL string | ||
if kmn.UpgradeContainerService.Properties.HostedMasterProfile != nil { | ||
masterURL = kmn.UpgradeContainerService.Properties.HostedMasterProfile.FQDN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is hostedmasterprofile then there is no node to validate or this code path won't even be it. Can you validate?
@@ -19,6 +19,12 @@ type Upgrader struct { | |||
kubeConfig string | |||
} | |||
|
|||
// UpgradeVM holds VM name and upgrade status | |||
type UpgradeVM struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not clear about the utility of this extra struct. Name can already be derived from: upgradedAgentsIndex
. If you want change it I would recommend replacing upgradedAgentsIndex
with this struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or to put it another way what is the utility of using a combination of vm name and index to derive information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need an array of all VMs, both upgraded or to be upgraded. UpgradeVM serves an element in that array. Index is used to create/delete the VMs. Let's discuss offline if the purpose in not clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side note the change to naming vms makes it slightly harder to generate vm name based on index, but we must have the data since we are making templates to deploy new vms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments added. PTAL.
added reconstruction of VM names
if app.Name == *agentPool.Name { | ||
agentCount = app.Count | ||
agentOsType = app.OSType | ||
agentPoolName = app.Name | ||
agentPoolIndex = indx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this being used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vmName, err := armhelpers.GetK8sVMName(agentOsType, ku.DataModel.Properties.HostedMasterProfile != nil,
ku.NameSuffix, agentPoolName, agentPoolIndex, agentIndex)