-
Notifications
You must be signed in to change notification settings - Fork 983
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added finalizers addition and removal logic for nodes #466
Conversation
persisted := n.DeepCopy() | ||
node.RemoveFinalizer(n, i) | ||
if err := t.kubeClient.Patch(ctx, n, client.MergeFrom(persisted)); err != nil { | ||
return fmt.Errorf("removing finalizer from node %s, %w", n.Name, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What actually happens in this case (later on)? Does it all get re-driven somehow? (Especially since the cloud instance probably is gone now, so (possibly?) some of the earlier calls will start failing (although, see other comments).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This happens right after the instance is terminated. If this patch fails, it will have ~1 hour to succeed to remove the finalizer, since that's when EC2 will then return a failure here. Does this answer your question?
@@ -65,16 +66,27 @@ func NewController(kubeClient client.Client, coreV1Client corev1.CoreV1Interface | |||
|
|||
// Reconcile executes a reallocation control loop for the resource | |||
func (c *Controller) Reconcile(ctx context.Context, object client.Object) (reconcile.Result, error) { | |||
n := object.(*v1.Node) | |||
if n.DeletionTimestamp == nil || !node.IsKarpenterProvisioned(n) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
node.HasTerminationFinalizer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. We want to continue this reconcile loop until the finalizer is removed. Replacing the line with
if !functional.ContainsString(n.Finalizers, provisioning.KarpenterFinalizer) {
@@ -109,7 +125,7 @@ func ExpectCleanedUp(c client.Client) { | |||
nodes := v1.NodeList{} | |||
Expect(c.List(ctx, &nodes)).To(Succeed()) | |||
for _, node := range nodes.Items { | |||
ExpectDeleted(c, &node) | |||
ExpectDeletedNode(c, &node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ExpectNodeDeleted sounds better to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason you can't use ExpectDeleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our nodes now have the finalizers added with this code, so ExpectDeleted will only call delete, but not patch the finalizer out.
I was toying with ExpectNodeDeleted
or ExpectDeletedNode
and decided that the former sounds like we're not doing any deletion action, but just checking for existence. The latter sounds more like we're doing an action then checking to me.
} | ||
zap.S().Debugf("Deleted node %s", n.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on making this info?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did we change the language from Terminated to Deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have Terminated instance
above, but Deleted node
right here. These two log lines assume that the user knows the difference between an instance and node, and are useful sometimes where the APIServer is failing.
In most cases where we terminate the instance, we should be able to remove the finalizer in quick succession. In that sense, I don't want to print two log lines back to back for the same node, but rather an additional Debug
line for those who have the verbosity up.
} | ||
|
||
// 2. Drain terminable nodes | ||
drained, err := c.terminator.drainNode(ctx, n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we discussed this, did we decide we were going to hold off on draining until the next reconcile loop, otherwise new pods could schedule during draining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our discussion was with the assumption that we would patch the spec at the higher level GenericController
, which we recently made the realization that we only want to patch the Status here.
Since we are patching the node here, not the status, we want to cordon and patch this node before continuing on to drain.
pkg/controllers/provisioning/v1alpha1/reallocation/utilization.go
Outdated
Show resolved
Hide resolved
pkg/controllers/provisioning/v1alpha1/reallocation/utilization.go
Outdated
Show resolved
Hide resolved
pkg/utils/functional/functional.go
Outdated
@@ -31,6 +31,16 @@ func UnionStringMaps(maps ...map[string]string) map[string]string { | |||
return result | |||
} | |||
|
|||
func StringsWithout(vals []string, remove string) []string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe call this StringSliceWithout to match the next fxn name in the file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also it's a little surprising this doesn't return a (modified) copy of the string slice rather than the original.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it so that it creates a new slice, so that it won't mess with the original slice.
@@ -30,7 +30,7 @@ var ( | |||
OperatingSystemLabelKey, | |||
ProvisionerNameLabelKey, | |||
ProvisionerNamespaceLabelKey, | |||
ProvisionerPhaseLabel, | |||
ProvisionerUnderutilizedKey, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: LabelKey
} | ||
zap.S().Debugf("Deleted node %s", node.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this changed to debug?
Issue #, if available:
Description of changes:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.