-
Notifications
You must be signed in to change notification settings - Fork 979
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Add support for node readiness taints #631
Conversation
✔️ Deploy Preview for karpenter-docs-prod canceled. 🔨 Explore the source changes: c00b9bb 🔍 Inspect the deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/6124c5c16a81460007526d5f |
pkg/controllers/allocation/bind.go
Outdated
@@ -59,10 +65,41 @@ func (b *Binder) Bind(ctx context.Context, node *v1.Node, pods []*v1.Pod) error | |||
if !errors.IsAlreadyExists(err) { | |||
return fmt.Errorf("creating node %s, %w", node.Name, err) | |||
} | |||
// If the node object already exists, make sure finalizer and taint are in place. | |||
if err := b.KubeClient.Patch(ctx, node, client.StrategicMergeFrom(stored)); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a controller (node/finalizer.go) that enforces this already.
Further, if we move forward with this change, we should probably make sure that v1.Taint{ Key: v1alpha3.NotReadyTaintKey, Effect: v1.TaintEffectNoSchedule, }
is applied as part of the user data like all of the other taints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I removed the Patch
call and v1alpha3.NotReadyTaintKey
is now appended to the ReadinessTaints
to make sure it gets added in the user data (launch template).
pkg/controllers/allocation/bind.go
Outdated
errs := make([]error, len(pods)) | ||
// 4. Wait for node readiness. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't wait here. This will block the entire provisioning process on nodes coming online, which will massively reduce provisioning throughput. If we move forward with this path, we will need some async mechanism for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a commit which uses asynchronous bind, but only if it is required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll admit I'm not a huge fan of this approach, but I don't immediately have a better alternative.
- We need to be careful with any sort of delayed/async binding behavior, as it may cause kube-scheduler race conditions.
- I don't like adding a new API concept for folks to think about.
I want to figure out a way for this to "just work", but it may take some more time to think through. Perhaps we can discuss this at the Karpenter Working group and go through the use cases?
Closing as stale. Happy to reopen this discussion in the future. |
any more thoughts/motivation on this approach? i'm stuck on the original problem while trying to implement Cilium in my cluster via taints. |
Which version of Cilium? I was using Cilium and Karpenter in my clusters as recently as a few weeks ago. |
1.11.3 -- without using the node taint method, I end up with pods which do not have matching |
Makes sense, thanks for clarifying! |
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Implement #628 adding support for node readiness taints.
Issue, if available:
Description of changes:
This PR adds support for node readiness taints as described in #628.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.