
Add affinity and anti-affinity support #1626

Merged: 43 commits merged into aws:main on Apr 13, 2022

Conversation

@tzneal (Contributor) commented Apr 5, 2022

1. Issue, if available:

Fixes #942 and #985

2. Description of changes:

Adds support for pod affinity and anti-affinity

3. How was this change tested?

Unit testing, plus testing on my local EKS cluster.

4. Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: link to issue
  • No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@tzneal tzneal requested a review from a team as a code owner April 5, 2022 18:10
@tzneal tzneal changed the title Add affinity support Add affinity and anti-affinity support Apr 5, 2022
@netlify bot commented Apr 5, 2022

Deploy Preview for karpenter-docs-prod canceled.

Latest commit: 08ac9ec
Latest deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/6255f02a3225d40008805590

@tzneal tzneal force-pushed the add-affinity-support branch 2 times, most recently from f102009 to 1290d3e on April 6, 2022 02:01
@tzneal tzneal force-pushed the add-affinity-support branch 3 times, most recently from 2a9ce99 to ce87b8c on April 7, 2022 02:27
@tzneal tzneal force-pushed the add-affinity-support branch from fa81d41 to ccba867 on April 7, 2022 19:21
@tzneal tzneal force-pushed the add-affinity-support branch 2 times, most recently from dcf64d1 to 3be1b37 on April 8, 2022 01:47
// included for topology counting purposes. This is only used with topology spread constraints as affinities/anti-affinities
// always count across all nodes. A nil or zero-value TopologyNodeFilter behaves well and the filter returns true for
// all nodes.
type TopologyNodeFilter []v1alpha5.Requirements
@ellistarn (Contributor) commented Apr 8, 2022:

Now that this type has been simplified, it occurs to me that it might be more straightforward to fold this into TopologyGroup, e.g.

type TopologyGroup struct {
  ...
  nodeFilter []v1alpha5.Requirements
}

It should be ~40 lines of code.

@tzneal (author) replied:

I think it's less clear if it's merged in. Now, it has the discrete functionality of being a filter. The fact that it's internally represented as a type definition for a list of requirements is an implementation detail.

If it's merged into TopologyGroup, it's no longer a filter and instead is just a list of requirements that may or may not get matched against a node or another set of requirements, depending on the topology type. The methods can't hang off the list of requirements unless they remain a type, in which case it's just concatenating two source files. The reader can't treat that as a black-box "filter" concept as easily.
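
(To make the discussion concrete, here is a minimal, self-contained sketch of the "filter as a black box" idea. It uses a plain label-map stand-in instead of the PR's v1alpha5.Requirements type, and the helper and method names are illustrative assumptions, not the actual implementation.)

package main

import "fmt"

// Requirements is a simplified stand-in for v1alpha5.Requirements:
// a label key mapped to its allowed values.
type Requirements map[string][]string

// TopologyNodeFilter matches a node if any of its requirement sets is
// satisfied by the node's labels; an empty filter matches every node.
type TopologyNodeFilter []Requirements

func (f TopologyNodeFilter) Matches(nodeLabels map[string]string) bool {
	if len(f) == 0 {
		return true
	}
	for _, reqs := range f {
		if satisfies(nodeLabels, reqs) {
			return true
		}
	}
	return false
}

// satisfies reports whether every requirement key is present on the node
// with one of its allowed values.
func satisfies(labels map[string]string, reqs Requirements) bool {
	for key, allowed := range reqs {
		value, ok := labels[key]
		if !ok {
			return false
		}
		match := false
		for _, v := range allowed {
			if v == value {
				match = true
				break
			}
		}
		if !match {
			return false
		}
	}
	return true
}

func main() {
	filter := TopologyNodeFilter{{"topology.kubernetes.io/zone": {"us-west-2a"}}}
	fmt.Println(filter.Matches(map[string]string{"topology.kubernetes.io/zone": "us-west-2a"})) // true
	fmt.Println(TopologyNodeFilter{}.Matches(nil))                                              // true: empty filter matches all nodes
}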

Comment on lines +79 to +84
p.removeRequiredNodeAffinityTerm,
p.removePreferredPodAffinityTerm,
p.removePreferredPodAntiAffinityTerm,
p.removePreferredNodeAffinityTerm,
p.removeTopologySpreadScheduleAnyway,
p.toleratePreferNoScheduleTaints,
Reviewer (Contributor):

Can the order in which these are done change the end result? Is this the suggested order by Kubernetes?

@tzneal (author) replied:

Yes, it can definitely change the result. There isn't an order specified by K8s; from what I can tell they perform scheduling in a different way (find all compatible nodes, sort by score). We schedule on the first compatible node, treating all preferences as required and removing one preference at a time until the pod either schedules or there are no more preferences to remove and it fails.
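
(A rough, self-contained sketch of the relaxation loop described above. The Pod type, the relaxation function, and the schedule callback are illustrative placeholders, not the PR's actual API.)

package main

import "fmt"

// Pod is a stand-in carrying only what the sketch needs:
// the preferred terms that can still be relaxed away.
type Pod struct {
	Preferred []string
}

// removeFirstPreferred is a placeholder relaxation: it strips one preferred
// term, returning true if it changed the pod.
func removeFirstPreferred(p *Pod) bool {
	if len(p.Preferred) == 0 {
		return false
	}
	p.Preferred = p.Preferred[1:]
	return true
}

// scheduleWithRelaxation mirrors the loop described above: try to schedule with
// all preferences treated as required, then remove one preference at a time (in
// a fixed order) until the pod schedules or there is nothing left to remove.
func scheduleWithRelaxation(p *Pod, schedule func(*Pod) bool, relaxations ...func(*Pod) bool) bool {
	for {
		if schedule(p) {
			return true
		}
		relaxed := false
		for _, relax := range relaxations {
			if relax(p) {
				relaxed = true
				break
			}
		}
		if !relaxed {
			return false // unschedulable: no preferences left to relax
		}
	}
}

func main() {
	pod := &Pod{Preferred: []string{"preferred-node-affinity", "preferred-zone-spread"}}
	// Pretend scheduling only succeeds once every preferred term has been relaxed.
	ok := scheduleWithRelaxation(pod, func(p *Pod) bool { return len(p.Preferred) == 0 }, removeFirstPreferred)
	fmt.Println(ok) // true
}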

return func(i, j int) bool {
return instanceTypes[i].Price() < instanceTypes[j].Price()
func (s *Scheduler) scheduleExisting(pod *v1.Pod, nodes []*Node) *Node {
// Try nodes in ascending order of number of pods to more evenly distribute nodes, 100ms at 2000 nodes.
Reviewer (Contributor):

Are there any other dimensions that could affect this speed? I could imagine that as constraints tighten and requirements increase, this would naturally take longer to solve. Additionally, does this number depend on the hardware it runs on? A number like this could be misleading if so.

@tzneal (author) replied:

It may vary depending on CPU, but that should be it. We're just sorting a list of pointers by the pod count, which is a constant-time operation to retrieve.
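
(For illustration, the sort being discussed is just a single-key comparison over an in-memory slice, along the lines of the self-contained sketch below; the node type and fields are assumptions, not the PR's code. The comparison key is one int read, so the ~100ms at 2000 nodes figure quoted above is dominated by the O(n log n) sort itself.)

package main

import (
	"fmt"
	"sort"
)

// node is an assumed stand-in for an in-flight node; only the pod count matters here.
type node struct {
	name string
	pods int
}

func main() {
	nodes := []*node{{"node-a", 12}, {"node-b", 3}, {"node-c", 7}}
	// Ascending order of current pod count, to spread pods more evenly across nodes.
	sort.Slice(nodes, func(i, j int) bool {
		return nodes[i].pods < nodes[j].pods
	})
	for _, n := range nodes {
		fmt.Println(n.name, n.pods)
	}
}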

if relaxed {
// The pod has changed, so topology needs to be recomputed
if err := topology.Update(ctx, pod); err != nil {
logging.FromContext(ctx).With("pod", client.ObjectKeyFromObject(pod)).Errorf("updating topology, %s", err)
Reviewer (Contributor):

How does this get surfaced in the logs? I can imagine this would get REALLY noisy.

@tzneal (author) replied:

It only logs on a topology update error; IIRC that error only occurs when a kube-apiserver call fails, which shouldn't happen often.

Reviewer (Contributor):

Keep in mind that this call is cached; it should only fail due to a code bug.

pods = append(pods, makePodAntiAffinityPods(count/7, v1.LabelHostname)...)
pods = append(pods, makePodAntiAffinityPods(count/7, v1.LabelTopologyZone)...)
// We intentionally don't do anti-affinity by zone as that creates tons of unschedulable pods.
//pods = append(pods, makePodAntiAffinityPods(count/7, v1.LabelTopologyZone)...)
Reviewer (Contributor):

Should we just remove this line and keep the comment since we aren't doing it?

@tzneal (author) replied:

It's a benchmark test; I left it in for now since I keep adding/removing that line to run some benchmarks and get the numbers.

}

// TopologyGroup is a set of pods that share a topology spread constraint
// TopologyGroup is used to track pod counts that match a selector by the topology domain (e.g. SELECT COUNT(*) FROM pods GROUP BY(topology_ke
Reviewer (Contributor):

I like the surprise SQL appearance :)

Suggested change
// TopologyGroup is used to track pod counts that match a selector by the topology domain (e.g. SELECT COUNT(*) FROM pods GROUP BY(topology_ke
// TopologyGroup is used to track pod counts that match a selector by the topology domain (e.g. SELECT COUNT(*) FROM pods GROUP BY(topology_key))

@tzneal (author) replied:

😂
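
(As an aside for readers: the GROUP BY analogy maps onto a counts map keyed by the topology domain value, roughly as in this simplified sketch; the type and method names are illustrative, not the PR's actual TopologyGroup implementation.)

package main

import "fmt"

// topologyCounts is a simplified stand-in for the counting TopologyGroup does:
// number of matching pods per topology domain, i.e.
// SELECT COUNT(*) FROM pods GROUP BY topology_key.
type topologyCounts map[string]int

func (t topologyCounts) record(domain string) { t[domain]++ }

// minDomain returns the domain with the fewest matching pods, which is where a
// topology-spread pod would be placed next.
func (t topologyCounts) minDomain() string {
	best := ""
	bestCount := -1
	for domain, count := range t {
		if bestCount == -1 || count < bestCount {
			best, bestCount = domain, count
		}
	}
	return best
}

func main() {
	counts := topologyCounts{"us-west-2a": 0, "us-west-2b": 0}
	counts.record("us-west-2a")
	fmt.Println(counts.minDomain()) // us-west-2b
}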

ellistarn previously approved these changes Apr 8, 2022

@ellistarn (Contributor) left a comment:

Let's get one more approver, but I'm good to go.

@BeardyBear commented:

Hi, how long would it take for this to be released as part of a new Karpenter version?
Is there a way I could run it now, or should I just wait for the release?
We really need this feature.

Thank you :)

@tzneal tzneal force-pushed the add-affinity-support branch from 0036137 to 68e9030 on April 12, 2022 19:03
tzneal added 3 commits April 12, 2022 16:33
- implement affinity/anti-affinity
- rework topology spread support
tzneal and others added 21 commits April 12, 2022 16:33
Node affinity more than likely prevents scheduling on a provisioner,
so remove it first.  This prevents the current selection process from
removing several other preferred terms before removing the one that
is preventing selection.
For anti-affinities we need to block out every possible domain.
Previously, topology spread didn't work with match expressions; we had no tests to cover this case. The operators have different string values, so just casting types isn't correct.
In this scenario, we can only schedule to the min domain. We also rework the requirement collapsing code so that the collapsing occurs during topology domain selection.
We only count nodes that match the pod node required affinities.
We were carrying around tons of duplicate requirements. The requirement Add() function had to process these every time it added. When this occurred, the set-based requirements would narrow down, but the node selector version would just keep appending possibly huge requirements to the list.
@tzneal tzneal force-pushed the add-affinity-support branch from 68e9030 to 08ac9ec on April 12, 2022 21:33
@tzneal (author) commented Apr 13, 2022:

> Hi, how long would it take for this to be released as part of a new Karpenter version? Is there a way I could run it now, or should I just wait for the release? We really need this feature.
>
> Thank you :)

We plan to make a snapshot image available soon so it can be tested out before the next release.

@tzneal tzneal merged commit befc00c into aws:main Apr 13, 2022
@tzneal tzneal deleted the add-affinity-support branch April 13, 2022 12:40
@joebowbeer (Contributor) commented:

Also closes #985 ?
