
Replace podAntiAffinity addon by a topologySpreadConstraints addon? #1537

Open
maxbrunet opened this issue Dec 6, 2021 · 10 comments

Comments

@maxbrunet
Contributor

What is missing?

I stumbled upon this new recommendation while reading the karpenter.sh documentation [1]:

Note
Don’t use podAffinity and podAntiAffinity to schedule pods on the same or different nodes as other pods. Kubernetes SIG scalability recommends against these features and Karpenter doesn’t support them. Instead, the Karpenter project recommends topologySpreadConstraints to reduce blast radius and nodeSelectors and taints to implement colocation.

More in the official documentation about the motivations: https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/#comparison-with-podaffinity-podantiaffinity
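
For illustration only, here is a sketch (placeholder label selector, not what the addon generates today) of how the preferred node-level anti-affinity could be expressed as a single topologySpreadConstraints entry on the pod spec:

```jsonnet
// Sketch only: roughly the "preferred" flavour of node-level anti-affinity,
// expressed as a topology spread constraint. The labelSelector is a placeholder.
{
  spec+: {
    topologySpreadConstraints: [{
      maxSkew: 1,
      topologyKey: 'kubernetes.io/hostname',
      // ScheduleAnyway ~ preferred; DoNotSchedule ~ required.
      whenUnsatisfiable: 'ScheduleAnyway',
      labelSelector: {
        matchLabels: { 'app.kubernetes.io/name': 'prometheus' },
      },
    }],
  },
}
```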

It may not be worth pursuing improvements to the podAntiAffinity addon like #1090.

Thoughts?

Why do we need it?

To follow ever-evolving Kubernetes best practices.

Environment

  • kube-prometheus version:

    Insert Git SHA here

Anything else we need to know?:

anti-affinity addon: https://github.com/prometheus-operator/kube-prometheus/blob/6d013d4e4f980ba99cfdafa9432819d484e2f829/jsonnet/kube-prometheus/addons/anti-affinity.libsonnet

Footnotes

  1. https://karpenter.sh/docs/concepts/

@paulfantom
Member

Can you share more about the SIG Scalability recommendation here? I cannot find much info related to the use of affinity vs topologySpreadConstraints. The only things I found are the KEP introducing the feature and a note that affinity and topologySpreadConstraints can work together.

From what I see, topologySpreadConstraints matters more when you want to have multiple constraints using multiple failure domains (something not easily possible with affinity). However, in our case we are using only one failure domain (node), which raises the question: what is the benefit of switching to a different solution?
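
To illustrate what I mean by multiple constraints across multiple failure domains, a sketch (placeholder labels only) would be something like:

```jsonnet
// Sketch only: spread the same pods across zones and across nodes at once,
// with two constraints the scheduler evaluates together.
{
  topologySpreadConstraints: [
    {
      maxSkew: 1,
      topologyKey: 'topology.kubernetes.io/zone',
      whenUnsatisfiable: 'ScheduleAnyway',
      labelSelector: { matchLabels: { 'app.kubernetes.io/name': 'prometheus' } },
    },
    {
      maxSkew: 1,
      topologyKey: 'kubernetes.io/hostname',
      whenUnsatisfiable: 'DoNotSchedule',
      labelSelector: { matchLabels: { 'app.kubernetes.io/name': 'prometheus' } },
    },
  ],
}
```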

Don't get me wrong, I like the idea of using topologySpreadConstraints, but I just want to know more ;)

@maxbrunet
Contributor Author

I am at the same point as you on all that. Reading the KEP further, PodAntiAffinity seems to cause them issues, and they are even thinking about deprecating it in the linked issue:

Currently PodAntiAffinity supports arbitrary topology domain, but sadly this causes a slow down in scheduling (see Rethink pod affinity/anti-affinity). We're evaluating solutions such as limit topology domain to node, or internally implement a fast/slow path handling that. If this KEP gets implemented, we can simply achieve the semantics of "PodAntiAffinity in zones" via a combination of "Even pods spreading in zones" plus "PodAntiAffinity in nodes" which could be an extra benefit of this KEP.

From this, it seems to be more about how it works internally rather than the potential gain in flexibility for users. My guess is that when the Karpenter documentation refers to the "Kubernetes SIG scalability" recommendation, it is not about the scalability of your deployment, but about the scalability of the Kubernetes scheduler.

@philipgough
Contributor

philipgough commented Dec 8, 2021

they are even thinking about deprecating it in the linked issue

I'm not sure I see anything concrete in that issue which states that there will be any deprecation of affinity or anti-affinity. It is an almost three-year-old issue.

In fact, I see various related enhancements landing in 1.22: kubernetes/enhancements#2249

To be clear, I am not saying we shouldn't consider this or it is a bad idea. I just want to be sure we are not being motivated by the wrong source.

@maxbrunet
Contributor Author

I got the Karpenter documentation clarified: aws/karpenter-provider-aws#948

So in short, pod anti-affinities are a strain on the Kubernetes scheduler at the moment.
There does not seem to be a clear decision on whether they will be deprecated or improved, but topologySpreadConstraints is here as an alternative.

I have finally found the literal recommendation:

Note: Inter-pod affinity and anti-affinity require substantial amount of processing which can slow down scheduling in large clusters significantly. We do not recommend using them in clusters larger than several hundred nodes.

https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#inter-pod-affinity-and-anti-affinity

Now the questions for kube-prometheus:

  1. Are we concerned about the Kubernetes Scheduler performance?
  2. Do we want a topologySpreadConstraints addon?
  3. Do we want to remove the podAntiAffinity addon if doing 2.?

@paulfantom
Member

Are we concerned about the Kubernetes Scheduler performance?

I would say yes. If we can improve our users' environments by using a better performing feature, we should use it.

Do we want a topologySpreadConstraints addon?

I vote yes.

Do we want to remove the podAntiAffinity addon if doing 2.?

No, I think this would break too many workflows. We can, however, put a trace notice in the podAntiAffinity addon to advertise the new topologySpreadConstraints addon. Additionally, we would need to create better documentation about topology spread and the usage of podAntiAffinity as well as topologySpreadConstraints, with an explanation of why not to use podAntiAffinity.

By trace notice I mean something like this: https://github.com/prometheus-operator/kube-prometheus/compare/main...paulfantom:trace-notice?expand=1
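
For anyone unfamiliar with it, std.trace(message, value) prints the message to stderr at evaluation time and returns the value unchanged, so the rendered manifests stay identical. A minimal sketch (the function name and message are illustrative, not the actual patch):

```jsonnet
// Sketch only: wrap whatever the addon returns in std.trace() so importing it
// emits a one-line notice without changing the generated output.
local notice = 'NOTICE: the podAntiAffinity addon may be superseded by a topologySpreadConstraints addon';

{
  antiAffinity(labels):: std.trace(notice, {
    podAntiAffinity: {
      preferredDuringSchedulingIgnoredDuringExecution: [{
        weight: 100,
        podAffinityTerm: {
          labelSelector: { matchLabels: labels },
          topologyKey: 'kubernetes.io/hostname',
        },
      }],
    },
  }),
}
```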


cc @simonpasquier @philipgough you might be interested in this as AFAIR CMO is using anti-affinity heavily.

@philipgough
Contributor

I have no issues with us adding support for the addon while retaining support for inter-pod affinity.

Yes, CMO is using podAntiAffinity for both Alertmanager and the platform Prometheus. I know we have deployed on very large clusters but am not aware of any issues with scheduling performance.

I'd be curious to learn whether the impact would be more substantial for high-churn workloads, as opposed to relatively static platform infra, and whether adding podAntiAffinity or podAffinity to a workload affects overall scheduling performance across the cluster in general or is limited to just those workloads.

@maxbrunet
Contributor Author

@paulfantom sounds good, thank you for pointing to std.trace(); I had been wondering how to output things like deprecation warnings in Jsonnet for some time.

I am going to experiment with topologySpreadConstraints in my clusters, and I will try to contribute an addon later
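
As a starting point, I imagine the addon could look roughly like this. This is only a sketch: the object paths assume the usual kube-prometheus layout, the labels are placeholders, and it relies on the Prometheus CRD exposing spec.topologySpreadConstraints.

```jsonnet
// Sketch only: patch topologySpreadConstraints onto the Prometheus CR, much
// like the anti-affinity addon patches spec.affinity. Alertmanager would get
// the same treatment.
{
  prometheus+: {
    prometheus+: {
      spec+: {
        topologySpreadConstraints: [{
          maxSkew: 1,
          topologyKey: 'kubernetes.io/hostname',
          whenUnsatisfiable: 'ScheduleAnyway',
          labelSelector: {
            matchLabels: { 'app.kubernetes.io/name': 'prometheus' },
          },
        }],
      },
    },
  },
}
```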

@migueleliasweb

This seems to be already implemented: https://prometheus-operator.dev/docs/operator/api/

Should this issue be closed?

@joebowbeer

Karpenter implemented enough pod anti-affinity support to satisfy the Prometheus use case. It was released in v0.9.0.

@simonpasquier
Contributor

@migueleliasweb I think that this feature request is still valid because the ask is to implement a jsonnet addon that uses topologySpreadConstraints and this doesn't exist yet (though the prometheus-operator CRDs support it).
