
Reoriented WellKnownLabels API parameters around generic requirements #747

Merged: 7 commits merged into aws:main from requirements, Oct 22, 2021

Conversation

@ellistarn (Contributor) commented Oct 16, 2021

1. Issue, if available:

2. Description of changes:
Experimental API Change/Proposal.

Behavior

  • pod.NodeSelector, pod.Affinity.NodeAffinity, provisioner.Labels, and provisioner.Requirements all communicate using the same well-known label semantics, and their requirements are intersected (see the pod example below).
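
For illustration, a hypothetical pod whose scheduling terms would be intersected with the provisioner's labels and requirements (this pod spec is illustrative, not from the PR, and assumes the example provisioner shown later in this description):

# Hypothetical pod: nodeSelector and nodeAffinity are intersected with
# provisioner.Labels and provisioner.Requirements, not matched verbatim.
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  nodeSelector:
    topology.kubernetes.io/zone: us-west-2a # must intersect the provisioner's zone label
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node.kubernetes.io/instance-type
            operator: In
            values: ["m5.xlarge"] # compatible: not excluded by the provisioner's NotIn
  containers:
  - name: app
    image: busybox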

Motivation

  • Users could not specify a NotIn semantic at the provisioner level (e.g. excluding specific instance types).
  • Users can now use upstream well-known labels at the provisioner level, avoiding multiple concepts for the same thing (e.g. instanceType vs. node.kubernetes.io/instance-type; it's always the latter now).
  • Any additional well-known labels no longer require API changes, a new CRD installation, etc., just a new controller binary.
  • Neutral and vendor-specific labels were not modeled in the same way, which wasn't intuitive, since they behave the same.

Future

  • This will require a v1alpha5 bump (in a follow-on PR, before the v0.4.1 release).
  • A sibling Preferences field for soft constraints at the provisioner level can be added in a backwards-compatible way (a possible shape is sketched below); I punted on that complexity for this PR.
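
A hypothetical shape for such a Preferences field, purely illustrative (the field does not exist in this PR, and the weight semantics are an assumption, mirroring preferredDuringSchedulingIgnoredDuringExecution):

preferences: # soft constraints; the scheduler prefers but does not require them
- key: topology.kubernetes.io/zone
  operator: In
  values: ["us-west-2a"]
  weight: 50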

Example Provisioner

spec:
  labels: # sugar for requirements w/ hardcoded value
    custom: label
    topology.kubernetes.io/zone: us-west-2a 
  requirements: # flexible In/NotIn syntax for well known labels
  - key: node.k8s.aws/capacity-type
    operator: In
    values: ["spot"]
  - key: node.kubernetes.io/instance-type
    operator: NotIn
    values: ["m5.large", "m5.2xlarge"]
  - key: custom.billing.label
    operator: In
    values: ["myteam"]
  provider:
    cluster:
      endpoint: https://BE5C79AC824BB1EF67DB8CC14D6A263F.gr7.us-west-2.eks.amazonaws.com
      name: etarn
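
Since labels are sugar for requirements with a hardcoded value, the labels block above would behave like the following requirements (a desugared sketch for illustration, not generated output):

requirements:
- key: custom
  operator: In
  values: ["label"]
- key: topology.kubernetes.io/zone
  operator: In
  values: ["us-west-2a"]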

3. Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: link to issue
  • No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@netlify bot commented Oct 16, 2021

✔️ Deploy Preview for karpenter-docs-prod canceled.

🔨 Explore the source changes: af2032a

🔍 Inspect the deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/6171f2dc1eee370008614747

@ellistarn ellistarn changed the title Reoriented WellKnownLabels API parameters around generic requirements [EXPERIMENTAL][WIP] Reoriented WellKnownLabels API parameters around generic requirements Oct 16, 2021
@bwagner5 (Contributor) commented:
Can you post a sample Provisioner using this?

@ellistarn ellistarn force-pushed the requirements branch 3 times, most recently from 3b1311e to f927ad9, October 20, 2021 02:23
@ellistarn ellistarn changed the title [EXPERIMENTAL][WIP] Reoriented WellKnownLabels API parameters around generic requirements Reoriented WellKnownLabels API parameters around generic requirements Oct 20, 2021
@JacobGabrielson (Contributor) left a comment

Lgtm overall, just a few minor comments

return c.Requirements.GetLabelValues(v1.LabelTopologyZone)
}

/// InstanceTypes for the constraints
Contributor:

extra /

@@ -20,7 +20,7 @@ import (

 	"github.com/awslabs/karpenter/pkg/apis/provisioning/v1alpha4"
 	"github.com/awslabs/karpenter/pkg/cloudprovider"
-	"k8s.io/api/core/v1"
+	v1 "k8s.io/api/core/v1"
Contributor:

Why is this necessary?

Contributor Author:

An epic battle with my linter -- which I lost. Will fix.

if err := constraints.Constrain(ctx); err != nil {
return nil, fmt.Errorf("applying constraints, %w", err)
}
constraints.Requirements = constraints.CombineRequirements()
Contributor:

Seems a bit odd for the constraints struct not to handle this itself (as opposed to the Scheduler reaching in and doing it?)

Contributor Author:

It's a bit tricky since we're reusing the same struct throughout the codebase (Constraints), which is optimized for user expression. I wanted to avoid CombineRequirements() in a tight for-each-pod loop, as it may execute ~10,000s of times. Perhaps this optimization is unnecessary. It definitely does feel like a hack and I really dislike the mutation.

Here are the options I see:

  1. Mutate the existing struct (current approach)
  2. Create a new concept that wraps constraints for Scheduling

Thoughts?

Contributor:

I think it's worth optimizing, for sure. I think if it worked something like:

requirements := constraints.CombineRequirements() // makes a copy and compresses
// ...
if err := s.Topology.Inject(ctx, requirements, pods); err != nil {
	return nil, fmt.Errorf("injecting topology, %w", err)
}

That could work. If you want to do option #2, something like (ignore bad naming):

// Call this before using constraints in a tight loop
constraints.Compress()

Which would do substantially the same thing you're doing today, but would be hidden by the constraints thingy (object?) itself, rather than the client code knowing how to do it.

Contributor Author:

In the first example, we would need to pass these combined requirements (along w/ provider, taints, and labels) into the per-pod scheduling logic. By the time we're passing those 4 things, we might as well use the constraints object, since it has those exact fields.

I like the second approach you noted.

Contributor Author:

I ended up doing a hybrid approach:

constraints = constraints.Compress()

This avoids mutation and returns a compressed copy, which we use for the rest of provisioning.
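
A minimal sketch of what that hybrid might look like (field and method names are illustrative; the actual karpenter implementation may differ):

// Compress returns a copy of the constraints whose Requirements field
// already folds together the provisioner-level sources (Labels,
// Requirements, etc.), so the per-pod scheduling loop reads one
// precomputed set instead of recombining on every iteration.
func (c *Constraints) Compress() *Constraints {
	compressed := c.DeepCopy() // avoid mutating the user-facing object
	compressed.Requirements = c.CombineRequirements()
	return compressed
}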

 labels:
   additionalProperties:
     type: string
-  description: Labels will be applied to every node launched by the
-    Provisioner.
+  description: Labels are layered with Requirements and applied to every
Contributor:

To me, Labels is a little confusing. Should this actually just be NodeSelectors, to denote that it has some sort of requirements logic in it? I think propagating out labels for a nodeSelector spec makes intuitive sense too, wdyt?

Contributor Author:

discussed offline

pkg/apis/provisioning/v1alpha4/provisioner.go (outdated, resolved)
pkg/apis/provisioning/v1alpha4/provisioner.go (outdated, resolved)
pkg/cloudprovider/aws/securitygroups.go (outdated, resolved)
pkg/controllers/allocation/scheduling/scheduler.go (outdated, resolved)
@bwagner5 (Contributor) previously approved these changes Oct 21, 2021 and left a comment:

/lgtm

@ellistarn ellistarn merged commit 67d637a into aws:main Oct 22, 2021
@ellistarn ellistarn deleted the requirements branch October 22, 2021 00:23