feat: support new topologySpread scheduling constraints #852

Merged

Conversation

jmdeal
Member

@jmdeal jmdeal commented Dec 8, 2023

Fixes #430

Description
This PR adds support for the following topology spread constraint fields:

  • matchLabelKeys
  • nodeAffinityPolicy
  • nodeTaintsPolicy
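
As a rough illustration of what these fields mean on the Kubernetes side, the sketch below models a constraint with a trimmed-down struct and shows how `matchLabelKeys` values are read from the incoming pod and merged into the selector. The `spreadConstraint` type and the `effectiveSelector` helper are hypothetical names used only for this example; they are not Karpenter's or Kubernetes' actual types. The defaults noted in the comments follow upstream Kubernetes behavior (`nodeAffinityPolicy` defaults to `Honor`, `nodeTaintsPolicy` to `Ignore`).

```go
package main

import "fmt"

// spreadConstraint is a hypothetical, trimmed-down stand-in for the
// Kubernetes TopologySpreadConstraint type (illustration only).
type spreadConstraint struct {
	TopologyKey        string
	MatchLabels        map[string]string // simplified labelSelector (matchLabels only)
	MatchLabelKeys     []string          // keys whose values come from the incoming pod
	NodeAffinityPolicy string            // "Honor" (Kubernetes default) or "Ignore"
	NodeTaintsPolicy   string            // "Honor" or "Ignore" (Kubernetes default)
}

// effectiveSelector merges the values of MatchLabelKeys, looked up in the
// incoming pod's labels, into the constraint's selector. Keys that the pod
// does not carry are skipped.
func effectiveSelector(c spreadConstraint, podLabels map[string]string) map[string]string {
	out := make(map[string]string, len(c.MatchLabels)+len(c.MatchLabelKeys))
	for k, v := range c.MatchLabels {
		out[k] = v
	}
	for _, k := range c.MatchLabelKeys {
		if v, ok := podLabels[k]; ok {
			out[k] = v
		}
	}
	return out
}

func main() {
	c := spreadConstraint{
		TopologyKey:        "topology.kubernetes.io/zone",
		MatchLabels:        map[string]string{"app": "web"},
		MatchLabelKeys:     []string{"pod-template-hash"},
		NodeAffinityPolicy: "Honor",
		NodeTaintsPolicy:   "Honor",
	}
	podLabels := map[string]string{"app": "web", "pod-template-hash": "abc123"}
	// Only pods from the same ReplicaSet revision are counted for skew.
	fmt.Println(effectiveSelector(c, podLabels))
}
```

With `pod-template-hash` in `matchLabelKeys`, pods from an older revision of the same Deployment no longer count toward skew during a rollout, which is the main motivation for the field upstream.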

How was this change tested?
make test

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 8, 2023
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Dec 8, 2023
@jmdeal jmdeal changed the title feat: support new topologySpread scheduling constraints [WIP] feat: support new topologySpread scheduling constraints Dec 8, 2023
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 8, 2023
@jmdeal jmdeal force-pushed the support-affinity-and-taint-policy branch from aed9f49 to 54af319 Compare December 8, 2023 19:34
@coveralls

coveralls commented Dec 8, 2023

Pull Request Test Coverage Report for Build 13251614430

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 210 of 213 (98.59%) changed or added relevant lines in 10 files are covered.
  • 4 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.1%) to 81.467%

Changes Missing Coverage                               Covered Lines   Changed/Added Lines   %
pkg/controllers/provisioning/scheduling/topology.go    106             109                   97.25%

Files with Coverage Reduction                                   New Missed Lines   %
pkg/controllers/provisioning/scheduling/topologygroup.go        2                  97.85%
pkg/controllers/provisioning/scheduling/topologynodefilter.go   2                  95.45%

Totals Coverage Status
Change from base Build 13211152753:   0.1%
Covered Lines:                        9249
Relevant Lines:                       11353

💛 - Coveralls

@jmdeal jmdeal force-pushed the support-affinity-and-taint-policy branch 7 times, most recently from 4329131 to f0127f2 Compare December 9, 2023 01:46
@jmdeal jmdeal force-pushed the support-affinity-and-taint-policy branch 3 times, most recently from b23a605 to 0408162 Compare December 11, 2023 21:59
@jmdeal jmdeal changed the title [WIP] feat: support new topologySpread scheduling constraints feat: support new topologySpread scheduling constraints Dec 11, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 11, 2023
@jmdeal jmdeal force-pushed the support-affinity-and-taint-policy branch from 0408162 to 6673b5b Compare December 11, 2023 23:14
@jmdeal jmdeal changed the title feat: support new topologySpread scheduling constraints [WIP] feat: support new topologySpread scheduling constraints Dec 13, 2023
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 13, 2023
@jmdeal
Member Author

jmdeal commented Dec 13, 2023

Holding to include matchLabelKeys for pod affinity with k8s v1.29.

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 14, 2023
@jonathan-innis
Member

This is awesome work, and it's definitely getting close! I think only a few small things remain. We should write down what we want to refactor here, since some ideas were thrown out about moving existing nodes into topologyGroups so that countDomains doesn't have to iterate through stateNodes. There are also some things around function naming, and we should capture nodeAffinities in the eligible domains for topologyDomainGroups.

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 10, 2025
@jmdeal jmdeal force-pushed the support-affinity-and-taint-policy branch from da3c39a to 3feab07 Compare February 10, 2025 22:21
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 10, 2025
@jmdeal jmdeal force-pushed the support-affinity-and-taint-policy branch from 3feab07 to 3441351 Compare February 10, 2025 22:43
@jmdeal jmdeal force-pushed the support-affinity-and-taint-policy branch from dd3ce78 to 58005bf Compare February 10, 2025 23:55
Member

@jonathan-innis jonathan-innis left a comment

Nice work!

"sigs.k8s.io/karpenter/pkg/scheduling"
)

// TopologyDomainGroup tracks the domains for a single topology. Additionally, it tracks the taints associated with
Member

Yeah, I think my confusion here was just about when this is used. It's only used when constructing the topologyGroup, not when we're doing any counting, since by then we should have already discovered all of the available domains at the top of the loop.
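
To make the idea of a per-topology domain group concrete, here is a minimal sketch of a structure that records, for each domain, the taint sets of the sources that can provide it, and then lists the domains a pod could use while a topology group is being built. All names (`taint`, `domainGroup`, `eligibleDomains`) are hypothetical, and toleration matching is reduced to a set of tolerated taint keys; this is an assumption-laden illustration, not the code from this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// taint is a simplified, hypothetical stand-in for a node taint.
type taint struct{ Key, Effect string }

// domainGroup maps a topology domain (for example a zone) to the taint sets
// of the sources that can provide that domain.
type domainGroup map[string][][]taint

// insert records that a source with the given taints can provide the domain.
func (g domainGroup) insert(domain string, taints ...taint) {
	g[domain] = append(g[domain], taints)
}

// toleratesAll reports whether every taint key is in the tolerated set.
func toleratesAll(tolerated map[string]bool, taints []taint) bool {
	for _, t := range taints {
		if !tolerated[t.Key] {
			return false
		}
	}
	return true
}

// eligibleDomains returns, in sorted order, the domains that at least one
// source can provide. When honorTaints is true, a source only counts if the
// pod tolerates all of its taints; otherwise taints are ignored.
func (g domainGroup) eligibleDomains(tolerated map[string]bool, honorTaints bool) []string {
	var out []string
	for domain, sources := range g {
		for _, taints := range sources {
			if !honorTaints || toleratesAll(tolerated, taints) {
				out = append(out, domain)
				break
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	g := domainGroup{}
	g.insert("zone-a")
	g.insert("zone-b", taint{Key: "dedicated", Effect: "NoSchedule"})

	fmt.Println(g.eligibleDomains(nil, true))  // taints honored
	fmt.Println(g.eligibleDomains(nil, false)) // taints ignored
}
```

In this sketch the domain set is computed once, up front, which matches the point made in the comment above: counting can then assume the available domains are already known.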

}

return filter
}

// Matches returns true if the TopologyNodeFilter doesn't prohibit the node from participating in the topology
func (t TopologyNodeFilter) Matches(node *v1.Node) bool {
return t.MatchesRequirements(scheduling.NewLabelRequirements(node.Labels))
func (t TopologyNodeFilter) Matches(taints []corev1.Taint, requirements scheduling.Requirements, compatibilityOptions ...option.Function[scheduling.CompatibilityOptions]) bool {
Member

Can you confirm whether these compatibility options are still needed? Because the pod requirements are being checked against the node requirements, I believe there is a hard requirement that all of the nodeRequirements exist when evaluating Counts. An undefined label shouldn't mean the node counts: that is really only true for NodeClaims that are still in flight, and we've already discussed that it's too complicated to try to figure out whether that is right or not. Anyway, the current behavior is that an undefined label doesn't mean the pod is compatible with it, and I don't think we should drop or change that behavior.

Member Author

Right now we only allow missing WellKnownLabels when evaluating whether an in-flight NodeClaim should count against a domain. I don't think we should drop that here, because it would result in us not counting a NodeClaim for a topology group when we should have. I think this is the right side for us to err on: being too conservative and overcounting NodeClaims rather than undercounting them.
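
The trade-off being discussed can be sketched as follows: a key that is missing from the node side is normally a mismatch, but an explicit allow-list lets selected keys be undefined, so an in-flight NodeClaim whose zone is not resolved yet still counts. The `requirements` type and the `compatible` function are hypothetical simplifications (value sets only), not Karpenter's `scheduling.Requirements` API or its compatibility options.

```go
package main

import "fmt"

// requirements maps a label key to its set of allowed values. This is a
// hypothetical simplification that only models "In"-style requirements.
type requirements map[string]map[string]bool

// intersects reports whether the two value sets share at least one value.
func intersects(a, b map[string]bool) bool {
	for v := range a {
		if b[v] {
			return true
		}
	}
	return false
}

// compatible reports whether node satisfies every requirement in filter.
// A key missing from node is a mismatch unless it is listed in
// allowUndefined, in which case it is treated as compatible.
func compatible(filter, node requirements, allowUndefined map[string]bool) bool {
	for key, allowed := range filter {
		values, defined := node[key]
		if !defined {
			if allowUndefined[key] {
				continue
			}
			return false
		}
		if !intersects(allowed, values) {
			return false
		}
	}
	return true
}

func main() {
	zone := "topology.kubernetes.io/zone"
	filter := requirements{zone: {"us-west-2a": true}}
	inFlight := requirements{} // zone not resolved yet

	fmt.Println(compatible(filter, inFlight, nil))                         // false
	fmt.Println(compatible(filter, inFlight, map[string]bool{zone: true})) // true
}
```

Allowing the undefined key errs toward overcounting: the in-flight claim is counted even though it may end up outside the domain, which is the conservative side argued for above.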

@jmdeal jmdeal force-pushed the support-affinity-and-taint-policy branch from e42e1c8 to 19836eb Compare February 11, 2025 10:10
Member

@jonathan-innis jonathan-innis left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 11, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jmdeal, jonathan-innis


@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 11, 2025
@jonathan-innis
Member

/hold

Wait for confirmation from @jmdeal that this one is good to merge

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 11, 2025
@jmdeal
Member Author

jmdeal commented Feb 12, 2025

Should be good to go 👍

@jmdeal
Member Author

jmdeal commented Feb 12, 2025

/remove-hold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 12, 2025
@k8s-ci-robot k8s-ci-robot merged commit 45f73ec into kubernetes-sigs:main Feb 12, 2025
13 checks passed
@codeeong

When will a new release with this change be rolled out? Thanks!

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. blocked Unable to make progress due to some dependency cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Support new topologySpread scheduling constraints