
Replace PodAntiAffinity with TopologySpreadConstraints #8826

Closed · wants to merge 5 commits

Conversation

@mikutas (Contributor) commented Jul 7, 2022

Fixes #8168

Signed-off-by: Takumi Sue <[email protected]>
@mikutas marked this pull request as ready for review July 7, 2022 13:22
@mikutas requested a review from a team as a code owner July 7, 2022 13:22
- {{ .component }}
topologyKey: kubernetes.io/hostname
{{- end }}

{{ define "linkerd.node-affinity" -}}
nodeAffinity:
{{- toYaml .Values.nodeAffinity | trim | nindent 2 }}
{{- end }}

{{ define "linkerd.affinity" -}}
Member:

Since node-affinity is now the only thing in this define, I think we can just inline the node-affinity define here to simplify.
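
A minimal sketch of what that inlining could look like, reusing the nodeAffinity block from the diff above (the guard on .Values.nodeAffinity is an assumption, not something shown in this diff):

{{ define "linkerd.affinity" -}}
{{/* sketch: emit node affinity directly, since it's the only content left in this define */}}
{{- if .Values.nodeAffinity }}
nodeAffinity:
{{- toYaml .Values.nodeAffinity | trim | nindent 2 }}
{{- end }}
{{- end }}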

RemoteMirrorServiceAccount bool `json:"remoteMirrorServiceAccount"`
RemoteMirrorServiceAccountName string `json:"remoteMirrorServiceAccountName"`
TargetClusterName string `json:"targetClusterName"`
EnableTopologySpreadConstraints bool `json:"enableTopologySpreadConstraints"`
Member:

I think in order to safely remove the EnablePodAntiAffinity value and replace it with EnableTopologySpreadConstraints we'd need to go through a deprecation cycle where we temporarily support both values. It's a bit unfortunate because both values have the same intent (to spread control plane pods across nodes) even if what they actually do is slightly different.

What do you think about keeping the EnablePodAntiAffinity value but just having it enable topology spread instead of pod anti-affinity? This would mean the name of the value is a bit of a lie, but it avoids the need for users to migrate from one value to the other.

@alpeb do you have any thoughts on this?
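
A rough sketch of that option, keeping the existing flag but emitting a spread constraint instead of anti-affinity (the template below is illustrative; the label selector and maxSkew values are assumptions, not taken from this diff):

{{- if .Values.enablePodAntiAffinity }}
{{/* sketch only: same enablePodAntiAffinity value, different behavior */}}
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      linkerd.io/control-plane-component: {{ .component }}
{{- end }}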

Member:

I'd rather not change the meaning of a setting like that, as it may confuse users ("why do I have multiple replicas on the same node even though I enabled pod anti-affinity?"). As for the deprecation cycle, IMO it's enough to call this out in the upgrade notes; 2.12 will require manually migrating Helm configs anyway, so we might as well include this change as well.

Member:

Sounds good. Will we run into any serialization/deserialization errors during upgrade because the Values struct is changing?

Member:

I tested upgrading using both the CLI and Helm, and the config seems to behave well. The old enablePodAntiAffinity value remains in the ConfigMap, but that should be fine.

{{- if .Values.enableTopologySpreadConstraints }}
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: failure-domain.beta.kubernetes.io/zone
Member:

This label has been declared deprecated. Can you use topology.kubernetes.io/zone instead?
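
For example, the constraint above would become something like:

topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone  # non-deprecated replacement for failure-domain.beta.kubernetes.io/zone
  whenUnsatisfiable: ScheduleAnyway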

Comment on lines 8 to 12
matchExpressions:
- key: {{ default "linkerd.io/control-plane-component" .label }}
  operator: In
  values:
  - {{ .component }}
Member:

I think this can be simplified to this (same applies to the rule below):

Suggested change:

matchExpressions:
- key: {{ default "linkerd.io/control-plane-component" .label }}
  operator: In
  values:
  - {{ .component }}

becomes:

matchLabels:
  {{ default "linkerd.io/control-plane-component" .label }}: {{ .component }}
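
For reference, with the default label and a hypothetical component name of destination, the suggested selector would render roughly as:

matchLabels:
  linkerd.io/control-plane-component: destination  # "destination" is just an example component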

@alpeb (Member) left a comment:

Thanks @mikutas for the thorough changes. From my testing using --ha, I see pods are well distributed among hosts when installing, and they don't get blocked when the number of nodes is lower than the number of desired replicas, unlike when using pod anti-affinity. However, when upgrading from the latest edge to this branch, they usually don't end up well distributed, unlike with the previous approach. Any ideas how we could improve that?

@mikutas (Contributor, Author) commented Jul 9, 2022

Hmm, can we recommend that users use the Descheduler if necessary in the upgrade notes? (that's all I have for now 💭) @alpeb

@alpeb (Member) commented Jul 11, 2022

@mikutas Ok, I see in the docs that topologySpreadConstraints are really only enforced when the pods get created, and as things evolve there's no guarantee the constraints continue to be respected, so they recommend coupling with the Descheduler as you point out, which is a bummer 🤷‍♂️

There's no guarantee that the constraints remain satisfied when Pods are removed. For example, scaling down a Deployment may result in imbalanced Pods distribution. You can use Descheduler to rebalance the Pods distribution.

@jeremychase @adleong WDYT? I think this new approach is still better because it avoids locking things down as in the scenario described in #8168, but it can be surprising that it stops being enforced after upgrades, so we should at least document that in the Helm config. And people requiring a stricter spreading policy can put up with the extra work.
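
If we do point users at the Descheduler in the upgrade notes, the relevant piece would be something along these lines (a sketch assuming the v1alpha1 DeschedulerPolicy format; the exact schema should be checked against the Descheduler docs):

apiVersion: descheduler/v1alpha1
kind: DeschedulerPolicy
strategies:
  RemovePodsViolatingTopologySpreadConstraint:  # evicts pods so they can be rescheduled in line with the constraints
    enabled: true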

@adleong (Member) commented Jul 12, 2022

That's a disappointing wrinkle that TopologySpreadConstraints don't work like we expected. Given that, I'd be inclined to stick with PodAntiAffinity since that gives us better guarantees that control plane pods will be spread between nodes and that the control plane will be resilient to node failure.

I realize that this doesn't address #8168 but it's unclear to me why one would want more control plane replicas than nodes. One replica should be more than enough to handle a node's worth of pods and HA mode isn't really appropriate for clusters with a very small number of nodes anyway.
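
For comparison, the anti-affinity approach being defended here follows the general pattern below (a sketch of the pattern, not the exact chart output; "destination" is just an example component):

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          linkerd.io/control-plane-component: destination
      topologyKey: kubernetes.io/hostname  # hard requirement: replicas must land on distinct nodes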

topologySpreadConstraints:
- maxSkew: 1
  topologyKey: failure-domain.beta.kubernetes.io/zone
  whenUnsatisfiable: ScheduleAnyway
Contributor:

@alpeb @adleong

I'm inclined to think that if we switched this to DoNotSchedule the unexpected behavior may not be a problem.

For this to work we would need to ensure that the pods get scheduled after the condition is satisfied again.

Thoughts?
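
A sketch of the variant being floated here, with the hard setting in place of ScheduleAnyway:

topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule  # pods stay Pending until the skew can be satisfied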

Member:

According to my testing, if that label is not there (which is the case in k3d, for example), switching that will cause pods to not be scheduled at all. Removing that rule and leaving only the one for kubernetes.io/hostname still causes pods not to be distributed as expected during upgrades, unfortunately.

mikutas added 2 commits July 13, 2022 14:53
failure-domain.beta.kubernetes.io/zone is deprecated

Signed-off-by: Takumi Sue <[email protected]>
Labels: none yet
Projects: none yet
Development: successfully merging this pull request may close the issue "Implement topology spread constraints"
4 participants