
fix: persist VMSS name in API model #4051

Merged: 11 commits merged into Azure:master from jackfrancis:windows-addpool on Nov 20, 2020

Conversation

@jackfrancis (Member) commented Nov 18, 2020

Reason for Change:

This PR adds a new VMSSName property to the AgentPoolProfile struct to persist the definitive VMSS name (when the node pool is VMSS-backed) for future cluster operations.

The VMSS name is a computed property that depends in part on the pool's index in the API model's agentPoolProfiles array at cluster creation time, but that order may change over time (for example, removing a pool introduces a "hole" in the sequence, effectively decrementing each subsequent pool's index by one). This solution continues to rely on that undesirable computation (let's not throw the baby out with the bath water), but performs it only once, during a cluster operation when the API model data structure is stable; once computed, the property is persisted in the API model to match the actual VMSS "name" property, which remains stable and unique for as long as the VMSS exists in the resource group.

This also fixes a particular edge case with respect to aks-engine addpool and Windows node pools: the addpool operation always operates against a single AgentPoolProfile (the new one), so the index is always zero. If the cluster was originally created with a similarly named Windows pool at that index, the addpool command will produce an ARM template that actually modifies that existing pool rather than adding an entirely new one.
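
In sketch form, here is the shape of the change described above. The AgentPoolProfile and VMSSName identifiers come from the PR itself; the helper and naming scheme below are illustrative stand-ins, not aks-engine's actual code:

    package main

    import "fmt"

    // AgentPoolProfile sketch: only the new field matters here; the real
    // struct in pkg/api/types.go carries many more fields.
    type AgentPoolProfile struct {
        Name     string `json:"name"`
        VMSSName string `json:"vmssName,omitempty"` // definitive VMSS name, persisted once computed
    }

    // setVMSSNameDefault computes the index-dependent VMSS name only if it
    // hasn't already been persisted; the naming scheme is a stand-in, not
    // aks-engine's real convention.
    func setVMSSNameDefault(p *AgentPoolProfile, index int, nameSuffix string) {
        if p.VMSSName == "" {
            p.VMSSName = fmt.Sprintf("%s-agentpool%d-vmss", nameSuffix, index)
        }
    }

    func main() {
        pool := AgentPoolProfile{Name: "agentpool1"}
        setVMSSNameDefault(&pool, 1, "k8s")
        fmt.Println(pool.VMSSName) // k8s-agentpool1-vmss

        // Even if the pool's index later shifts (e.g. an earlier pool is
        // removed), the persisted name wins over a recomputation:
        setVMSSNameDefault(&pool, 0, "k8s")
        fmt.Println(pool.VMSSName) // still k8s-agentpool1-vmss
    }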

Issue Fixed:

Fixes #4055

Credit Where Due:

Does this change contain code from or inspired by another project?

  • No
  • Yes

If "Yes," did you notify that project's maintainers and provide attribution?

  • No
  • Yes

Requirements:

Notes:

codecov bot commented Nov 18, 2020

Codecov Report

Merging #4051 (7957610) into master (29c2508) will increase coverage by 0.02%.
The diff coverage is 20.00%.


@@            Coverage Diff             @@
##           master    #4051      +/-   ##
==========================================
+ Coverage   73.22%   73.24%   +0.02%     
==========================================
  Files         135      135              
  Lines       20640    20640              
==========================================
+ Hits        15113    15118       +5     
+ Misses       4550     4545       -5     
  Partials      977      977              
Impacted Files               Coverage Δ
cmd/addpool.go               18.07% <0.00%> (-0.11%) ⬇️
cmd/scale.go                 12.84% <0.00%> (+0.14%) ⬆️
cmd/update.go                23.07% <0.00%> (+0.34%) ⬆️
pkg/api/vlabs/types.go       72.53% <ø> (ø)
pkg/api/converterfromapi.go  95.66% <100.00%> (+<0.01%) ⬆️
pkg/api/convertertoapi.go    94.00% <100.00%> (+0.01%) ⬆️
pkg/api/defaults.go          93.42% <100.00%> (+0.01%) ⬆️
pkg/api/types.go             93.33% <100.00%> (+0.01%) ⬆️

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 29c2508...7957610.

acs-bot added size/L and removed size/M labels Nov 18, 2020
@jackfrancis (Member, Author):

/azp run pr-e2e

azure-pipelines bot:

Azure Pipelines successfully started running 1 pipeline(s).

Azure deleted a comment from the azure-pipelines bot Nov 19, 2020
Review comments on this hunk:

    }

    if vmss.Sku != nil {
        sc.newDesiredAgentCount = int(*vmss.Sku.Capacity)
        uc.agentPool.Count = sc.newDesiredAgentCount
        break
Contributor:

Why is the additional break needed here when it wasn't needed before? It looks like it will stop processing and kick you out of the for loop.

jackfrancis (Member, Author):

Right, so basically the current implementation enumerates through every VMSS no matter what. Even after it finds the VMSS that matches the pool in the API model, updates the scale command instance to reflect the current VMSS capacity (because aks-engine update is an aks-engine scale under the hood that does not change the instance count), and updates the API model (a convenience so that apimodel.json reflects the new count at the time we run this operation), we keep enumerating through the VMSS array if there are more.

There will only ever be one matching VMSS, so once we find it we should break out of the enumeration and not waste any more cycles.

So yeah, this is just us improving things since we're mucking around in this area anyway.

Hope that makes sense. (Also, definitely check that my reasoning is sound!)
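
To illustrate the pattern under discussion, here is a simplified stand-in (the stub types and names are invented for the example; they are not the actual aks-engine or Azure SDK types):

    package main

    import "fmt"

    // Stand-in types mirroring the rough shape of the SDK's VMSS objects.
    type sku struct{ Capacity *int64 }
    type scaleSet struct {
        Name string
        Sku  *sku
    }

    func main() {
        three := int64(3)
        vmssList := []scaleSet{
            {Name: "k8s-pool0-vmss"},
            {Name: "k8s-pool1-vmss", Sku: &sku{Capacity: &three}},
            {Name: "k8s-pool2-vmss"},
        }

        target := "k8s-pool1-vmss" // the pool's persisted VMSS name
        count := 0
        for _, vmss := range vmssList {
            if vmss.Name != target {
                continue
            }
            if vmss.Sku != nil && vmss.Sku.Capacity != nil {
                count = int(*vmss.Sku.Capacity)
            }
            // VMSS names are unique within a resource group, so there is
            // at most one match; stop scanning the remaining scale sets.
            break
        }
        fmt.Println(count) // 3
    }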

Contributor:

Sounds about right; I just wanted to make sure it was intentional :-)

Are there other checks in place that affirm there will always be only one matching VMSS? Is there any chance that something could go wrong and there are two, which would lead to strange behavior?

jackfrancis (Member, Author):

It does look like scale is equivalently inefficient:

https://github.com/Azure/aks-engine/blob/release-v0.58.0/cmd/scale.go#L410

jackfrancis (Member, Author):

The Azure VMSS API will guarantee uniqueness of VMSS names in a resource group; here's how we enforce that in aks-engine:

https://github.com/Azure/aks-engine/blob/release-v0.58.0/pkg/api/vlabs/validate.go#L516
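
For illustration, a minimal sketch of that kind of duplicate-name validation (the function and error message below are invented for the example; the real check lives in pkg/api/vlabs/validate.go):

    package main

    import "fmt"

    type AgentPoolProfile struct{ Name string }

    // validateUniquePoolNames rejects duplicate pool names up front, which
    // is what lets later code assume at most one VMSS matches a given pool.
    func validateUniquePoolNames(profiles []*AgentPoolProfile) error {
        seen := map[string]bool{}
        for _, p := range profiles {
            if seen[p.Name] {
                return fmt.Errorf("agent pool %q is defined more than once; pool names must be unique", p.Name)
            }
            seen[p.Name] = true
        }
        return nil
    }

    func main() {
        pools := []*AgentPoolProfile{{Name: "linuxpool"}, {Name: "winpool"}, {Name: "winpool"}}
        fmt.Println(validateUniquePoolNames(pools))
    }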

    @@ -441,6 +436,7 @@ func (sc *scaleCmd) run(cmd *cobra.Command, args []string) error {

        currentNodeCount = int(*vmss.Sku.Capacity)
        highestUsedIndex = 0
    +   break
jackfrancis (Member, Author):

@jsturtevant I added a break here as well, to be consistent.

@mboersma (Member) left a review:

/lgtm

👍 on the tests.

acs-bot commented Nov 20, 2020

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis, mboersma

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [jackfrancis,mboersma]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

jackfrancis merged commit cc2e826 into Azure:master Nov 20, 2020
jackfrancis deleted the windows-addpool branch November 20, 2020 18:14
jackfrancis mentioned this pull request Nov 30, 2020
Development

Successfully merging this pull request may close these issues:
  • Addpool overwriting existing pool instead of adding new pool (#4055)