ClusterAPI Provider: Provide fake provider IDs for failed Machines #2983
@@ -422,6 +443,15 @@ func (c *machineController) machineSetProviderIDs(machineSet *MachineSet) ([]str
continue
}

if machine.Status.Phase == "Failed" {
Maybe compare against `ErrorMessage` instead?
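A minimal sketch of the two checks being compared here. `Machine` and `MachineStatus` are hypothetical, trimmed-down stand-ins for the ClusterAPI types (the real field is `ErrorMessage` or `FailureMessage` depending on the API version), and the machine in `main` is deliberately constructed so that its phase and failure message disagree, to show that the two checks can diverge.

```go
package main

import "fmt"

// MachineStatus is a hypothetical, trimmed-down stand-in for the ClusterAPI
// Machine status; only the fields used below are included.
type MachineStatus struct {
	Phase          string
	FailureMessage *string // nil when the provider has not reported a failure
}

// Machine is a hypothetical stand-in for the ClusterAPI Machine object.
type Machine struct {
	Namespace string
	Name      string
	Status    MachineStatus
}

// isFailedByPhase mirrors the first revision of the patch: match the phase string.
func isFailedByPhase(m Machine) bool {
	return m.Status.Phase == "Failed"
}

// isFailedByMessage mirrors the suggestion: any reported failure message
// marks the machine as failed, regardless of the phase string.
func isFailedByMessage(m Machine) bool {
	return m.Status.FailureMessage != nil
}

func main() {
	msg := "instance creation failed"
	// Illustrative machine whose phase does not say "Failed" even though a
	// failure message has been reported.
	m := Machine{
		Namespace: "default",
		Name:      "worker-0",
		Status:    MachineStatus{Phase: "Provisioning", FailureMessage: &msg},
	}
	fmt.Println(isFailedByPhase(m), isFailedByMessage(m)) // prints "false true"
}
```

Checking the optional pointer also avoids depending on the exact spelling of a phase string.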
/area provider/cluster-api
nit: do you think we can make conflate
@@ -422,6 +443,15 @@ func (c *machineController) machineSetProviderIDs(machineSet *MachineSet) ([]str
continue
}

if machine.Status.FailureMessage != nil {
klog.V(4).Infof("Status.FailureMessage of machine %q is %q", machine.Name, *machine.Status.FailureMessage)
// Provide a fake ID that can be recognised later and converted into a machine key.
nit: maybe add the "why", i.e. so the machine is tracked by the node group and `maxNodeProvisionTime` can kick in.
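A minimal sketch of how a fake provider ID could be built and later converted back into a machine key, assuming the key has the usual `namespace/name` form. The prefix and all function names are hypothetical and not necessarily what this PR merged. The underscore separator is safe because Kubernetes namespace names are RFC 1123 labels (lowercase alphanumerics and `-`), so they can never contain `_`.

```go
package main

import (
	"fmt"
	"strings"
)

// failedMachinePrefix is a hypothetical marker; any prefix that can never
// appear in a real cloud provider ID would work.
const failedMachinePrefix = "failed-machine-"

// fakeProviderID builds a placeholder provider ID for a failed Machine so the
// machine is still tracked by its node group and maxNodeProvisionTime can
// eventually apply to it.
func fakeProviderID(namespace, name string) string {
	return fmt.Sprintf("%s%s_%s", failedMachinePrefix, namespace, name)
}

// isFailedMachineProviderID reports whether id was produced by fakeProviderID.
func isFailedMachineProviderID(id string) bool {
	return strings.HasPrefix(id, failedMachinePrefix)
}

// machineKeyFromFailedProviderID converts a fake ID back into a
// "namespace/name" key, splitting on the first '_' only, since the namespace
// part cannot contain one.
func machineKeyFromFailedProviderID(id string) (string, error) {
	if !isFailedMachineProviderID(id) {
		return "", fmt.Errorf("%q is not a failed-machine provider ID", id)
	}
	rest := strings.TrimPrefix(id, failedMachinePrefix)
	parts := strings.SplitN(rest, "_", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", fmt.Errorf("malformed failed-machine provider ID %q", id)
	}
	return parts[0] + "/" + parts[1], nil
}

func main() {
	id := fakeProviderID("kube-system", "worker-abc12")
	key, err := machineKeyFromFailedProviderID(id)
	// prints "failed-machine-kube-system_worker-abc12 kube-system/worker-abc12 <nil>"
	fmt.Println(id, key, err)
}
```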
@enxebre I addressed your feedback, PTAL.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: enxebre.
[CA-1.18] #2983 cherry-pick: ClusterAPI Provider: Provide fake provider IDs for failed Machines
This is based on the spot instance handling from the AWS provider (#2235).
Some ClusterAPI implementations mark a Machine as failed if there is a problem creating the instance on the cloud provider. Previously these instances were not considered bad, and the autoscaler counted them as `comingUp`. With this patch, once `maxNodeProvisionTime` is reached, the autoscaler marks the node group as unhealthy, scales it down (removing the failed machine), and then tries to scale up an alternative group, matching the behaviour seen with the AWS provider once that was patched.