Skip iteration loop if node creation failed #4143
@@ -346,7 +346,10 @@ func (a *StaticAutoscaler) RunOnce(currentTime time.Time) errors.AutoscalerError
 		return nil
 	}

-	a.deleteCreatedNodesWithErrors()
+	if a.deleteCreatedNodesWithErrors() {
+		klog.V(0).Infof("Creation of some nodes failed, deleting them and skipping iteration")
nit: I think the wording is a bit misleading - it suggests CA tried to create nodes this loop and failed. Maybe "Removed some nodes that failed to create, skipping iteration"?
Done, changed to "Some nodes that failed to create were removed, skipping iteration" for consistency with the similar log above.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: BigDarkClown, MaciekPytel
/unhold
/lgtm
Skip iteration loop if node creation failed
In case of a stockout or quota issue, CA watches for nodes with create errors and deletes them. After the deletion it continues the loop normally. However, the deletion does not trigger any clusterstate recalculation, so the deleted nodes are still counted as upcoming and the rest of the CA logic keeps expecting them to come up.
The similar logic that handles nodes that timed out breaks the CA loop instead. This seems like the more correct behaviour.
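The control flow described above can be sketched in isolation. This is a simplified illustration, not the actual Cluster Autoscaler code: the `node` and `cluster` types and the `runOnce` helper are hypothetical stand-ins, and only the name `deleteCreatedNodesWithErrors` and the log wording come from this PR. The point it shows is that the cleanup step reports whether it removed anything, and the caller ends the iteration early so stale state is not used for scaling decisions.

```go
package main

import "fmt"

// node is a hypothetical stand-in for an instance known to the autoscaler.
type node struct {
	name        string
	createError bool // true if the cloud provider reported a creation failure
}

// cluster is a hypothetical, heavily simplified stand-in for autoscaler state.
type cluster struct {
	nodes []node
}

// deleteCreatedNodesWithErrors drops nodes whose creation failed and
// reports whether at least one node was removed.
func (c *cluster) deleteCreatedNodesWithErrors() bool {
	kept := make([]node, 0, len(c.nodes))
	for _, n := range c.nodes {
		if !n.createError {
			kept = append(kept, n)
		}
	}
	deleted := len(kept) != len(c.nodes)
	c.nodes = kept
	return deleted
}

// runOnce returns true if the scaling logic ran, and false if the
// iteration was skipped because failed nodes were just removed.
func (c *cluster) runOnce() bool {
	if c.deleteCreatedNodesWithErrors() {
		fmt.Println("Some nodes that failed to create were removed, skipping iteration")
		return false
	}
	// Scale-up and scale-down decisions would run here (omitted in this sketch).
	return true
}

func main() {
	c := &cluster{nodes: []node{
		{name: "healthy"},
		{name: "stockout", createError: true},
	}}
	fmt.Println("logic ran:", c.runOnce()) // failed node removed, iteration skipped
	fmt.Println("logic ran:", c.runOnce()) // nothing to remove, iteration proceeds
}
```

Skipping the iteration means the next loop starts from freshly computed state, which matches how the timed-out node handling is described above.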