chore: limit number of upgrade retries if new CP nodes fail to bootstrap #4068
Codecov Report
@@            Coverage Diff             @@
##           master    #4068      +/-   ##
==========================================
+ Coverage   73.18%   73.23%   +0.04%
==========================================
  Files         135      135
  Lines       20579    20625      +46
==========================================
+ Hits        15061    15104      +43
- Misses       4545     4547       +2
- Partials      973      974       +1
@@ -21,6 +22,8 @@ type Client interface {
ListAllPods() (*v1.PodList, error)
// ListNodes returns a list of Nodes registered in the api server.
ListNodes() (*v1.NodeList, error)
// ListNodesByOptions returns a list of Nodes registered in the api server.
ListNodesByOptions(opts metav1.ListOptions) (*v1.NodeList, error)
Changing this publicly exported interface definition is dangerous, FYI.
Adding methods should be fine, right?
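Adding a method is safe for callers but not for implementers: any type outside the package that implements `Client` (a hand-written mock, for instance) stops satisfying the interface until it gains the new method. A common Go workaround is an optional interface checked with a type assertion. This is a minimal sketch of that pattern, not what this PR does; `Node`, `ListOptions`, `NodeOptionsLister`, `listNodes`, and the `role` label are hypothetical stand-ins chosen so the example compiles without the real `v1`/`metav1` types.

```go
package main

import "fmt"

// Node and ListOptions are hypothetical stand-ins for v1.Node and
// metav1.ListOptions so that this sketch compiles on its own.
type Node struct {
	Name   string
	Labels map[string]string
}

// ListOptions uses a single key=value label match as a simplified selector.
type ListOptions struct {
	LabelKey, LabelValue string
}

// Client is the existing exported interface, left unchanged.
type Client interface {
	ListNodes() ([]Node, error)
}

// NodeOptionsLister is a separate, optional interface for the new method.
// Existing implementations of Client keep compiling without it.
type NodeOptionsLister interface {
	ListNodesByOptions(opts ListOptions) ([]Node, error)
}

// listNodes uses ListNodesByOptions when the client provides it and
// otherwise filters the full node list on the client side.
func listNodes(c Client, opts ListOptions) ([]Node, error) {
	if l, ok := c.(NodeOptionsLister); ok {
		return l.ListNodesByOptions(opts)
	}
	all, err := c.ListNodes()
	if err != nil {
		return nil, err
	}
	var matched []Node
	for _, n := range all {
		if n.Labels[opts.LabelKey] == opts.LabelValue {
			matched = append(matched, n)
		}
	}
	return matched, nil
}

// legacyClient implements only the original interface, like an existing mock.
type legacyClient struct{ nodes []Node }

func (c legacyClient) ListNodes() ([]Node, error) { return c.nodes, nil }

// optionsClient also implements the optional interface.
type optionsClient struct{ legacyClient }

func (c optionsClient) ListNodesByOptions(opts ListOptions) ([]Node, error) {
	// Pretend the API server applied the selector.
	return []Node{{Name: "server-filtered"}}, nil
}

func main() {
	nodes := []Node{
		{Name: "k8s-master-0", Labels: map[string]string{"role": "master"}},
		{Name: "k8s-agent-0", Labels: map[string]string{"role": "agent"}},
	}
	opts := ListOptions{LabelKey: "role", LabelValue: "master"}

	fromLegacy, _ := listNodes(legacyClient{nodes: nodes}, opts)
	fromOptions, _ := listNodes(optionsClient{legacyClient{nodes: nodes}}, opts)
	fmt.Println(fromLegacy[0].Name, fromOptions[0].Name)
}
```

The trade-off is an extra runtime type check in exchange for not breaking implementations that live outside the package.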
return uc.Translator.Errorf("Error while querying ARM for resources: %+v", err)
}

if kubeClient != nil {
Do we need this nil check? We should be confident that a non-err response from L100 above will guarantee that `k` is not nil, right?
If we're not confident then I'd just add that nil check to the err condition:
if err != nil || k == nil {
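A minimal sketch of the suggested pattern, assuming a hypothetical constructor `getKubeClient` that returns `(*kubeClient, error)`: treating a nil client as an error in the same branch means later code never needs its own nil guard. This only fits call sites where a missing client should abort; it does not fit code that must keep going without a client.

```go
package main

import (
	"errors"
	"fmt"
)

// kubeClient is a hypothetical stand-in for the real Kubernetes client type.
type kubeClient struct{ server string }

// getKubeClient is a hypothetical constructor. The returnNil flag models a
// constructor whose contract does not guarantee a non-nil client on success.
func getKubeClient(fail, returnNil bool) (*kubeClient, error) {
	switch {
	case fail:
		return nil, errors.New("cannot reach the API server")
	case returnNil:
		return nil, nil
	default:
		return &kubeClient{server: "https://example.invalid"}, nil
	}
}

// clientOrError folds the nil check into the error branch, so code after it
// can use k without a separate `if k != nil` guard.
func clientOrError(fail, returnNil bool) (*kubeClient, error) {
	k, err := getKubeClient(fail, returnNil)
	if err != nil || k == nil {
		if err == nil {
			err = errors.New("constructor returned a nil client")
		}
		return nil, fmt.Errorf("failed to create Kubernetes client: %w", err)
	}
	return k, nil
}

func main() {
	_, err := clientOrError(false, true)
	fmt.Println(err)
}
```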
The only reason is to pass the test "Should not fail if a Kubernetes client cannot be created". I don't know the history, so I am just complying.
@@ -1071,3 +1076,237 @@ var _ = Describe("Upgrade Kubernetes cluster tests", func() {
Expect(len(newNode.Spec.Taints)).To(Equal(2))
})
})

func TestCheckControlPlaneNodesStatus(t *testing.T) {
❤️
}
// return error if more than 1 upgraded node is not ready
// do not bother if upgradedNotReadyCount == mastersCount, at that point upgrade cannot take down more nodes
if upgradedNotReadyCount > 1 && upgradedNotReadyCount < mastersCount {
Why the extra `&&` condition here? Shouldn't we always return an error if more than one control plane VM is not ready?
You are right. I thought I had a good reason, but on second thought it does not hold.
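If the `&& upgradedNotReadyCount < mastersCount` clause is dropped as agreed here, the rule becomes "more than one upgraded control plane node not Ready is an error", including the case where every master is upgraded and not Ready. A minimal sketch of that rule, using a hypothetical `cpNode` type in place of `v1.Node` and assuming a node counts as upgraded when its orchestrator tag equals the target value (the `Kubernetes:<version>` format is likewise an assumption):

```go
package main

import "fmt"

// cpNode is a hypothetical, simplified view of a control plane node.
type cpNode struct {
	Name         string
	Orchestrator string // orchestrator tag value, assumed to look like "Kubernetes:1.18.2"
	Ready        bool   // true when the node's Ready condition is True
}

// checkControlPlaneNodes returns an error when more than one node already
// carries the target orchestrator tag but is not Ready. Unlike the original
// condition, there is no upper bound tied to the number of masters.
func checkControlPlaneNodes(nodes []cpNode, target string) error {
	upgradedNotReady := 0
	for _, n := range nodes {
		if n.Orchestrator == target && !n.Ready {
			upgradedNotReady++
		}
	}
	if upgradedNotReady > 1 {
		return fmt.Errorf("%d upgraded control plane nodes are not ready", upgradedNotReady)
	}
	return nil
}

func main() {
	const target = "Kubernetes:1.18.2"
	const old = "Kubernetes:1.17.5"
	cases := map[string][]cpNode{
		"one upgraded node not ready": {
			{"m0", target, false}, {"m1", old, true}, {"m2", old, true},
		},
		"all masters upgraded and not ready": {
			{"m0", target, false}, {"m1", target, false}, {"m2", target, false},
		},
	}
	for name, nodes := range cases {
		fmt.Printf("%s: %v\n", name, checkControlPlaneNodes(nodes, target))
	}
}
```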
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: jackfrancis, jadarsie
Reason for Change:
Most of the failures that cause `aks-engine upgrade` to fail are retryable. However, bad or incompatible `properties.orchestratorProfile.kubernetesConfig` property values in your API model will prevent `kubelet` (or control plane components) from starting successfully. In this case, `aks-engine upgrade` assumes that the node was upgraded successfully, because it only checks for the `orchestrator` tag, so each retry moves on to the next control plane node and repeated retries can take down every one of them. This PR limits the number of upgrade operations if two upgraded control plane (CP) nodes are not ready.
Notes:
`tools-install` installs `gomock`.
Credit Where Due:
Does this change contain code from or inspired by another project?
Requirements: