
Implement topology spread constraints #8168

Closed
jcogilvie opened this issue Mar 29, 2022 · 9 comments

@jcogilvie
Contributor

jcogilvie commented Mar 29, 2022

What problem are you trying to solve?

I want a high-availability linkerd deployment that doesn't break if I want more replicas than I have nodes.

How should the problem be solved?

I noticed that the linkerd-ha values file currently uses pod anti-affinity (keyed on the node hostname) to express that Linkerd control plane pods should not be co-located on the same host. But I believe what we really mean isn't "we can't have more than one per host" so much as "they must be distributed across hosts."

I believe this desire can be more accurately represented by enabling topologySpreadConstraints instead of antiAffinity. Instead of saying "don't schedule with another like pod," we could say "spread evenly across hosts" for the same kind of availability but with more flexibility and less brittleness (e.g. it won't fail if you have a cluster of 3 nodes and want 4 linkerd controller replicas).
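
For illustration, the constraint I have in mind would look roughly like this (a sketch, not actual chart output; the `linkerd.io/control-plane-component: destination` selector is assumed purely for the example):

```yaml
# Hedged sketch: spread the replicas of one control plane component evenly
# across nodes. The label selector below is assumed for illustration.
topologySpreadConstraints:
  - maxSkew: 1                           # tolerate at most 1 pod of imbalance between hosts
    topologyKey: kubernetes.io/hostname  # spread across individual nodes
    # ScheduleAnyway keeps this a soft preference, so 4 replicas can still
    # land on a 3-node cluster; DoNotSchedule would make it a hard rule.
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        linkerd.io/control-plane-component: destination
```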

Any alternatives you've considered?

The currently implemented antiAffinity does prevent pods from stacking on the same nodes, but it suffers from robustness problems when the desired replica count is >= the number of nodes in the cluster. (For instance, in our environment we have few but very large nodes.) Under that condition, the required anti-affinity rules become unsatisfiable, and deployments (and rolling upgrades, which need to surge an extra pod onto an already-occupied node) fail.
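
The failure mode comes from the required form of the rule, which looks roughly like this (again a sketch, not the exact chart output; the label selector is assumed):

```yaml
# Hedged sketch of a hard anti-affinity rule: with this in place, a 4th
# replica on a 3-node cluster stays Pending, and a rolling upgrade that
# surges an extra pod can get stuck for the same reason.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            linkerd.io/control-plane-component: destination
        topologyKey: kubernetes.io/hostname
```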

How would users interact with this feature?

A flag in the Helm chart, parallel to enablePodAntiAffinity, called enableTopologySpread, would cause topology spread constraints to be employed.

N.b. this is only supported on Kubernetes 1.19+, so we may have to take the cluster version into consideration.
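
As a sketch of what the template change could look like (hypothetical: neither the enableTopologySpread value nor this block exists in the chart today, and the component label is assumed):

```yaml
# Hypothetical Helm template fragment gated on a proposed enableTopologySpread value.
{{- if .Values.enableTopologySpread }}
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        # label assumed for illustration
        linkerd.io/control-plane-component: destination
{{- end }}
```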

Would you like to work on this feature?

maybe

@jcogilvie
Contributor Author

Aside: this thought was prompted by this Karpenter issue. Its scheduler understands topology spread, but has trouble with anti-affinity, and pods with anti-affinity rules don't presently get scheduled.

@adleong adleong added this to the stable-2.12.0 milestone Apr 12, 2022
@adleong
Member

adleong commented Apr 12, 2022

This sounds like a great idea. topologySpreadConstraints does sound like a better fit than antiAffinity for this.

@olix0r
Member

olix0r commented Apr 12, 2022

> N.b. this is only supported on Kubernetes 1.19+, so we may have to take the cluster version into consideration.

I think the MSKV (minimum supported Kubernetes version) is already at 1.20 on edge, so this should be fine to rely on.

@kleimkuhler
Contributor

I think it makes sense to just replace antiAffinity with topologySpreadConstraints rather than introducing additional configuration to use one over the other. Our MSKV supports it, and it more accurately represents what HA should do when working with a limited number of nodes, so it should just be replaced in all the templates.

@adleong adleong self-assigned this Jun 30, 2022
@adleong adleong added the priority/triage and priority/P2 (Nice-to-have for Release) labels and removed the priority/triage label Jun 30, 2022
@adleong adleong removed their assignment Jul 1, 2022
mikutas added a commit to mikutas/linkerd2 that referenced this issue Jul 7, 2022
mikutas added a commit to mikutas/linkerd2 that referenced this issue Jul 7, 2022
@adleong
Copy link
Member

adleong commented Jul 14, 2022

In #8826 we discovered that TopologySpreadConstraints have some unfortunate downsides. In particular, the constraints are only evaluated at scheduling time and are not continuously enforced after resource creation.

Given that, I wanted to go back to the drawing board and ask some basic questions about this use case. Why is it desirable to run more control plane replicas than nodes? HA mode is generally not recommended for clusters with a small number of nodes.

@jcogilvie
Contributor Author

> Why is it desirable to run more control plane replicas than nodes?

The case where we run few, very large nodes is sometimes driven by licensing agreements for cluster software that charge per node.

But this issue was prompted by limitations (at the time) of Karpenter:

> this thought was prompted by this Karpenter issue. Its scheduler understands topology spread, but has trouble with anti-affinity, and pods with anti-affinity rules don't presently get scheduled.

That issue, combined with the economic considerations above and the fact that topologySpreadConstraints seemed like a better semantic fit for what we're actually intending to declare, is what led me to open this issue.

@adleong
Member

adleong commented Jul 14, 2022

In that situation, I would recommend not running Linkerd in HA mode. The goal of HA mode is to create a configuration that is highly resilient to node failure, and when you're running with few very large nodes, it's not really possible to provide those guarantees.

On paper, topologySpreadConstraints did sound like a better semantic fit, but after seeing the way that it's implemented, I don't actually think it's such a great fit.

@jcogilvie
Contributor Author

I think what I'm reading is that topologySpreadConstraints and antiAffinity aren't mutually exclusive. Why not add chart support for both and let customers decide how they fit together?
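
As an illustration that they can coexist, a pod spec can carry both a soft anti-affinity preference and a spread constraint at the same time (a sketch under assumed labels, not a proposed chart change):

```yaml
# Hedged sketch: the two mechanisms are independent scheduler inputs, so a pod
# spec can declare soft anti-affinity per host plus an even spread across zones.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              linkerd.io/control-plane-component: destination
          topologyKey: kubernetes.io/hostname
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        linkerd.io/control-plane-component: destination
```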

@adleong
Member

adleong commented Jul 14, 2022

To the extent possible, I'm trying to avoid the Linkerd charts becoming a grab bag of options and to limit them to a set of configurations that we know make sense and can endorse. For users who want to blaze their own path, there is a wealth of options, including using Kustomize, forking the Linkerd charts, or crafting their own manifests some other way.

@adleong adleong closed this as completed Jul 21, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 21, 2022