Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Topology now factors in affinity and global requirements (subnets) when computing spread #772

Merged
merged 4 commits into from
Oct 29, 2021

Conversation

ellistarn
Copy link
Contributor

@ellistarn ellistarn commented Oct 25, 2021

1. Issue, if available:

2. Description of changes:

  1. Topology and Affinity no longer assume a subnet exists for all zones. GetInstanceTypes now returns the set of instance type offerings limited by subnet AZ availability rather than assuming. These offerings are then used to compute WellKnownLabel values (including zones). This is roughly the same behavior, except it happens once per provisioning loop (+caching) rather than once at process startup.
  2. Support for https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/#interaction-with-node-affinity-and-node-selectors
  3. WellKnownLabel values are not longer validated at provisioner creation time. This is because valid zones can change over time and based on subnet creation ordering w.r.t provision creation. Rather than trying to be perfect at creation time, we enforce the correctness at runtime.
  4. Cleaned up some scheduling code since I was already making significant changes
  5. Cleaned up the allocator code since I was already making changes there
  6. Cleaned up some metrics code

3. Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: link to issue
  • No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@netlify
Copy link

netlify bot commented Oct 25, 2021

✔️ Deploy Preview for karpenter-docs-prod canceled.

🔨 Explore the source changes: ac5daa3

🔍 Inspect the deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/617c2710a881010007d19839

@ellistarn ellistarn force-pushed the scheduling branch 16 times, most recently from f9779b4 to 70a771e Compare October 28, 2021 05:04
@ellistarn ellistarn changed the title [WIP] Introducing cloudprovider API to vend provisioner specific Requirements Topology now factors in subnets when computing spread Oct 28, 2021
Copy link
Contributor

@bwagner5 bwagner5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick pass through. Nice refactor and bug fix :)


func (b *Binder) bind(ctx context.Context, node *v1.Node, pods []*v1.Pod) error {
// 1. Add the Karpenter finalizer to the node to enable the termination workflow
// Add the Karpenter finalizer to the node to enable the termination workflow
node.Finalizers = append(node.Finalizers, v1alpha5.TerminationFinalizer)
// 2. Taint karpenter.sh/not-ready=NoSchedule to prevent the kube scheduler
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove 2.

@ellistarn ellistarn changed the title Topology now factors in subnets when computing spread Topology now factors in affinity and global requirements (subnets) when computing spread Oct 28, 2021
@JacobGabrielson JacobGabrielson self-requested a review October 28, 2021 19:13
pkg/apis/provisioning/v1alpha5/provisioner.go Outdated Show resolved Hide resolved
pkg/apis/provisioning/v1alpha5/provisioner.go Outdated Show resolved Hide resolved
pkg/apis/provisioning/v1alpha5/provisioner.go Outdated Show resolved Hide resolved
pkg/cloudprovider/aws/instance.go Outdated Show resolved Hide resolved
pkg/cloudprovider/aws/instancetypes.go Outdated Show resolved Hide resolved
pkg/cloudprovider/aws/subnets.go Show resolved Hide resolved
pkg/cloudprovider/aws/subnets.go Outdated Show resolved Hide resolved
pkg/cloudprovider/fake/cloudprovider.go Outdated Show resolved Hide resolved
@ellistarn ellistarn force-pushed the scheduling branch 4 times, most recently from 37e28c9 to 23b6342 Compare October 28, 2021 20:58
@JacobGabrielson JacobGabrielson self-requested a review October 28, 2021 23:07
pkg/controllers/allocation/scheduling/taints.go Outdated Show resolved Hide resolved
@ellistarn ellistarn merged commit d7fd106 into aws:main Oct 29, 2021
@ellistarn ellistarn deleted the scheduling branch October 29, 2021 19:23
gfcroft pushed a commit to gfcroft/karpenter-provider-aws that referenced this pull request Nov 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants