[WIP] Add support for node readiness taints #631

Closed
wants to merge 2 commits into from

alex-berger
Contributor

Implements #628, adding support for node readiness taints.

Issue, if available:

Description of changes:

This PR adds support for node readiness taints as described in #628.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@netlify

netlify bot commented Aug 23, 2021

✔️ Deploy Preview for karpenter-docs-prod canceled.

🔨 Explore the source changes: c00b9bb

🔍 Inspect the deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/6124c5c16a81460007526d5f

rustrial mentioned this pull request Aug 23, 2021
@@ -59,10 +65,41 @@ func (b *Binder) Bind(ctx context.Context, node *v1.Node, pods []*v1.Pod) error
if !errors.IsAlreadyExists(err) {
return fmt.Errorf("creating node %s, %w", node.Name, err)
}
// If the node object already exists, make sure finalizer and taint are in place.
if err := b.KubeClient.Patch(ctx, node, client.StrategicMergeFrom(stored)); err != nil {
Contributor

ellistarn commented Aug 23, 2021

We have a controller (node/finalizer.go) that enforces this already.

Further, if we move forward with this change, we should probably make sure that v1.Taint{Key: v1alpha3.NotReadyTaintKey, Effect: v1.TaintEffectNoSchedule} is applied as part of the user data, like all of the other taints.

OK, I removed the Patch call and v1alpha3.NotReadyTaintKey is now appended to the ReadinessTaints to make sure it gets added in the user data (launch template).

errs := make([]error, len(pods))
// 4. Wait for node readiness.
Contributor

We can't wait here. This will block the entire provisioning process on nodes coming online, which will massively reduce provisioning throughput. If we move forward with this path, we will need some async mechanism for this.

I pushed a commit which uses asynchronous bind, but only if it is required.

Contributor

ellistarn left a comment

I'll admit I'm not a huge fan of this approach, but I don't immediately have a better alternative.

  1. We need to be careful with any sort of delayed/async binding behavior, as it may cause kube-scheduler race conditions.
  2. I don't like adding a new API concept for folks to think about.

I want to figure out a way for this to "just work", but it may take some more time to think through. Perhaps we can discuss this at the Karpenter Working group and go through the use cases?

@ellistarn
Contributor

Closing as stale. Happy to reopen this discussion in the future.

ellistarn closed this Nov 18, 2021
@BryanStenson-okta
Contributor

Any more thoughts/motivation on this approach? I'm stuck on the original problem while trying to implement Cilium in my cluster via taints.

@ellistarn
Contributor

ellistarn commented Apr 12, 2022

Which version of Cilium? I was using Cilium and Karpenter in my clusters as recently as a few weeks ago.

@BryanStenson-okta
Contributor

BryanStenson-okta commented Apr 12, 2022

1.11.3 -- without using the node taint method, I end up with pods which do not have matching CiliumEndpoints (which basically means I cannot ensure any of the NetworkPolicy goodness is applied to these nodes).

@ellistarn
Contributor

Makes sense, thanks for clarifying!

gfcroft pushed a commit to gfcroft/karpenter-provider-aws that referenced this pull request Nov 25, 2023
Labels: None yet
Projects: None yet

4 participants