Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolved two bugs with NodeAffinity and simplified compatibility logic #1549

Merged
merged 2 commits into from
Mar 22, 2022

Conversation

ellistarn
Copy link
Contributor

@ellistarn ellistarn commented Mar 19, 2022

Description of changes

Fixed: A should be compatible to B, <DoesNotExist, NotIn> operator

According to the second bullet at https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#set-based-requirement, DoesNotExist and NotIn are compatible. For example, A Pod that requires foo 'not in' bar may schedule to a node without a label with key foo.

Previously, we required the node to have the label foo with a value of something other than bar.

Fixed: A should be compatible to B, <Exists, Empty> operator

Node labels cannot repel a pod (see: taints), unless a pod has conflicting requirements. In this case, a pod with empty requirements should happily schedule to a node with an Exists requirement on a node.

Previously, the pod needed to have a requirement for label foo in order to schedule onto node with requirement foo: Exists.

Note: provisioner only supports [In NotIn Exists], so it wasn't currently possible to run into this bug.

Cleanup: NullSet == EmptySet

Previously, we treated the undefined set to be the same as the universal set. In set theory, the nil set and the empty set are typically modeled equivalently (https://en.wikipedia.org/wiki/Empty_set). Treating NullSet == UniversalSet set was relied upon in requirements.Add, but significantly increased the complexity of our compatibility checks. Instead, we intersect requirements in requirements.Add if a requirement for the given key already exists, otherwise we use the added requirement as-is.

Cleanup: Compatibility Checks

Previously we oriented the logic around NodeSelectorOp rules via requirements.hasRequirement(withKeyAndOperator(key, v1.NodeSelectorOpExists)). Instead, we can use the properties of the underlying set (e.g. Full, Empty, Positive, Negative). This collapses O(n^2) iteration through requirements to O(n), and simplifies the predicates.

Cleanup: Requirement Validation

Previously, requirement validation logic redundantly checked that DoesNotExist operators did not coexist with other operators as well as ensuring that valid values exist for any requirement with an operator other than DoesNotExist. This has been collapsed into a single check, and no longer relies on requirements.hasRequirement(withKeyAndOperator. This also collapses O(n^2) iteration through requirements to O(n), and simplifies the predicates.

3. How was this change tested?

  • make test
  • Manually
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
  - key: isolation
    operator: Exists
apiVersion: v1
kind: Pod
metadata:
  name: inflate-6d7d985f4d-8pkgx
  namespace: default
spec:
  containers:
  - image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
    name: inflate
    resources:
      requests:
        cpu: "1"
  nodeSelector:
    isolation: ellis
apiVersion: v1
kind: Node
metadata:
  annotations:
    node.alpha.kubernetes.io/ttl: "0"
  creationTimestamp: "2022-03-19T19:47:34Z"
  finalizers:
  - karpenter.sh/termination
  labels:
    isolation: ellis <------------------------------ Here it is!
    karpenter.sh/capacity-type: on-demand
    karpenter.sh/provisioner-name: default
    node.kubernetes.io/instance-type: t3a.micro
    topology.kubernetes.io/zone: us-west-2b
  name: ip-192-168-120-29.us-west-2.compute.internal

4. Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: link to issue
  • No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@netlify
Copy link

netlify bot commented Mar 19, 2022

✅ Deploy Preview for karpenter-docs-prod canceled.

🔨 Explore the source changes: dff9703

🔍 Inspect the deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/62392661df9ab20009933e06

@ellistarn ellistarn force-pushed the requirements branch 2 times, most recently from 4cb053d to e4f5519 Compare March 19, 2022 19:10
}
if existing, ok := r.requirements[requirement.Key]; ok {
Copy link
Contributor Author

@ellistarn ellistarn Mar 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change was necessary due to the change on line 134. Now that we're considering the EmptySet and NullSet as equivalent, we should only intersect values if they're already defined.

func (r Requirements) Get(key string) sets.Set {
if _, ok := r.requirements[key]; !ok {
Copy link
Contributor Author

@ellistarn ellistarn Mar 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit of set theory is inaccurate and was causing problems. The empty set and undefined set are both equivalent to {}.

A := NewRequirements(v1.NodeSelectorRequirement{Key: v1.LabelTopologyZone, Operator: v1.NodeSelectorOpExists})
B := NewRequirements()
Expect(A.Compatible(B)).ToNot(Succeed())
Expect(A.Compatible(B)).To(Succeed())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was incorrect. Pods are not required to have a nodeSelector in order to schedule onto a node.

A := NewRequirements(v1.NodeSelectorRequirement{Key: v1.LabelTopologyZone, Operator: v1.NodeSelectorOpDoesNotExist})
B := NewRequirements(v1.NodeSelectorRequirement{Key: v1.LabelTopologyZone, Operator: v1.NodeSelectorOpNotIn, Values: []string{"foo"}})
Expect(A.Compatible(B)).ToNot(Succeed())
Expect(A.Compatible(B)).To(Succeed())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the second bullet at https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#set-based-requirement, DoesNotExist and NotIn are compatible. A Pod that says foo not in bar may schedule to a node without any label with key foo.

@ellistarn ellistarn changed the title Resolved an issue with NotExists requirements and simplified intersection logic Resolved two bugs with NodeAffinity and simplified intersection logic Mar 19, 2022
tzneal
tzneal previously approved these changes Mar 20, 2022
r.requirements[requirement.Key] = sets.NewSet()
values = sets.NewSet()
default:
values = sets.NewSet()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This default case should never happen given what we currently allow, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. I'll rip this out. I only needed it when this was a helper function.

if s.Len() > 0 {
return v1.NodeSelectorOpIn
}
return v1.NodeSelectorOpDoesNotExist
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat!

tzneal
tzneal previously approved these changes Mar 20, 2022
@ellistarn ellistarn changed the title Resolved two bugs with NodeAffinity and simplified intersection logic Resolved two bugs with NodeAffinity and simplified compatibility logic Mar 20, 2022
@ellistarn ellistarn force-pushed the requirements branch 3 times, most recently from 5262819 to 224dfd1 Compare March 21, 2022 22:33
@@ -76,7 +76,7 @@ func (c *Constraints) GenerateLabels() map[string]string {
case v1.NodeSelectorOpIn:
labels[key] = c.Requirements.Get(key).Values().UnsortedList()[0]
case v1.NodeSelectorOpExists, v1.NodeSelectorOpNotIn:
labels[key] = rand.String(10)
labels[key] = rand.String(10) // At size of the universe odds, NotIn could conflict
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we resolve the conflict? We use to do it with for(r.Get(key).has(label)){}.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or better yet, why bother assigning a node label for NotIn? We should just ignore it and k8s will have no problem assigning the pod to the node.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow even better 🤯

@felix-zhe-huang
Copy link
Contributor

felix-zhe-huang commented Mar 22, 2022

I love this new requirement logics! So clean. I would like to walk through the unit tests again later according to these new logics to make sure that nothing is missed.

Copy link
Contributor

@felix-zhe-huang felix-zhe-huang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it! Ship it!

@felix-zhe-huang felix-zhe-huang merged commit d526a49 into aws:main Mar 22, 2022
@ellistarn ellistarn deleted the requirements branch March 22, 2022 02:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants