Configure Traefik to improve availability #292
Conversation
Force-pushed from f8c58f0 to d33f7eb
Overall looks like a great start! A few details need some attention.
addons/traefik/1.7.x/traefik-11.yaml (outdated)

    values.chart.helm.kubeaddons.mesosphere.io/traefik: "https://raw.githubusercontent.com/mesosphere/charts/00b019ef3610ca8221a8cf283b4d7046a50702c4/staging/traefik/values.yaml"
spec:
  kubernetes:
    minSupportedVersion: v1.15.6
The particular labels used for topologyKey, specifically topology.kubernetes.io/zone, are recent and supported only in 1.17 and newer. Probably this minSupportedVersion should also be v1.17.0 or so.
If we aim to support this on previous releases, we should revise the earlier addon revisions to match the appropriate Kubernetes version with applicable labels.
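For illustration, a minimal sketch of the version bump suggested above, following the spec structure shown in the snippet (treat the exact version string as an assumption):

spec:
  kubernetes:
    minSupportedVersion: v1.17.0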
Well, if a tool doesn't use them for earlier versions, does this harm the deployment?
I haven't tested this on k8s < 1.17, but the k8s docs have a vague warning against using a topologyKey that isn't present in all node labels: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#inter-pod-affinity-and-anti-affinity
Pod anti-affinity requires nodes to be consistently labelled, in other words every node in the cluster must have an appropriate label matching topologyKey. If some or all nodes are missing the specified topologyKey label, it can lead to unintended behavior.
So I guess we should avoid that.
I think the simplest way to resolve this without making changes to supported k8s versions is to use the deprecated zone label, which is still present in 1.17. https://kubernetes.io/docs/reference/kubernetes-api/labels-annotations-taints/#failure-domainbetakubernetesiozone
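For example, here's a sketch of what the zone affinity term could look like with the deprecated label (assuming the affinity block otherwise stays as proposed in this PR; only the topologyKey changes):

- weight: 1
  podAffinityTerm:
    labelSelector:
      matchExpressions:
      - key: release
        operator: In
        values:
        - traefik-kubeaddons
    topologyKey: failure-domain.beta.kubernetes.io/zone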
My only concern with going backward to (now deprecated) failure-domain labels is the fact that:
- Kubernetes 1.17 is now 2 months GA
- Kubernetes 1.19 will be released in August, at which time 1.16 support basically ends.
As such, after August I don't think there will be any supported versions of Kubernetes which do not support topology.kubernetes.io/zone.
Why aren't we testing against multiple Kubernetes versions?
Isn't the design of Addon.spec.kubernetes.minSupportedVersion specifically to allow selection of compatible addons?
Why would we prefer to fall back instead of lean forward?
cc @joejulian
I don't think support for that ever made it into the catalog, and nothing but Kommander even uses the catalog. We could make that part of 1.7 and upgrade from the deprecated configuration at that time, but today I think this is the only option.
Also:
- we haven't been keeping up with k8s releases
- we need to support EKS, which is always behind k8s support
- I believe you said that 1.16 is going to be the LTS version, so we would need to maintain support for that anyway.
OK, I think for the immediate objectives, the existing (deprecated) label is viable.
This means that we're going to release a GA version with known-deprecated labels, which likely won't work with the next release, and therefore the addon's use of these labels MUST be changed with 1.18 support, in the next 1.6 beta Konvoy release. (This is tech debt.)
The particular versions that we will support with LTS are undecided at this time; however, it will be at earliest Kubernetes 1.16. Customer contract details dictate that it will be decided when the customer pays for LTS support, based on one of the versions we support. It's possible that, under some future conditions, we'd offer extended support for Kubernetes versions that are already beyond their community support timeframe (i.e. EOL by community).
Community Kubernetes is working to establish an N-3 support policy (extending from the previous N-2) taking effect with Kubernetes 1.19, which would include 1.16 -- effectively extending community support to roughly 1 year (depending on release cadence).
In the near future, we need to:
- Start testing all supported Kubernetes on associated KBA branches.
- Start testing newer (unsupported) Kubernetes in unstable KBA branches, so we can know much earlier where there will be problems.
- Start using the Catalog functionality in kubeaddons, as appropriate, to select compatible addon revisions automatically.
- Start versioning addons to support appropriate capabilities for various Kubernetes releases, keeping current with the versions we support and their APIs/labels/features/etc.
Relatedly, we can probably stop supporting things for 1.15 when the KBA release tag says stable-1.17-... or some variant thereof. Probably we should start pruning the revisions with each release.
addons/traefik/1.7.x/traefik-11.yaml (outdated)

# TODO: This comment is no longer true.
# dex service is exposed with TLS certificate signed by self signed root
# Dex CA certificate. It is not clear if traefik supports configuring
# trusted certificates per backend. This should be investigated in a
# separate issue.
# See: https://jira.mesosphere.com/browse/DCOS-56033
insecureSkipVerify: true
If this comment is no longer true, should it not be removed?
There is a separate ticket for that item. I suggest handling it in another PR.
addons/traefik/1.7.x/traefik-11.yaml (outdated)

preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
  podAffinityTerm:
    labelSelector:
      matchExpressions:
      - key: release
        operator: In
        values:
        - traefik-kubeaddons
    topologyKey: kubernetes.io/hostname
- weight: 1
  podAffinityTerm:
    labelSelector:
      matchExpressions:
      - key: release
        operator: In
        values:
        - traefik-kubeaddons
    topologyKey: topology.kubernetes.io/zone
I feel we should not use the release label in this case, because it's controlled and applied implicitly by Helm (right?).
This means our correct behavior (HA, upgradability, and tolerance to AZ failure) is predicated on the expected behavior of an external tool (i.e. outside the scope of the addon). We should try to minimize implicit dependencies beyond the scope of an addon's definition.
The chart appears to offer deployment.podLabels: {}, which would allow us to specify a distinct label here, controlled by the addon definition itself. I'd suggest creating a distinct pod label, controlled directly in this addon YAML, and using it in the match expression.
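A sketch of what that could look like in the addon's chart values (the label key and value here are hypothetical placeholders, and I'm assuming the chart exposes both deployment.podLabels and affinity as values):

deployment:
  podLabels:
    traefik-affinity: traefik-kubeaddons  # hypothetical label controlled by the addon YAML
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 1
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: traefik-affinity
            operator: In
            values:
            - traefik-kubeaddons
        topologyKey: kubernetes.io/hostname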
The release label, I think, is fine in general, but yeah, I see your point that it isn't necessarily managed by us. That would mean we probably want to add our own labels to make our intention clear, regardless of the tool we use for deployment.
Yeah, I wasn't thrilled to use release, but it was the only workable option among the existing set of labels on Traefik's pods. @samba Good call on deployment.podLabels; I'll set something there that we can reference for pod affinity.
I went ahead and gave the pods kubeaddons.mesosphere.io/name labels in order to avoid inventing my own label: be789c5
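In other words, the match expressions now key off the kubeaddons label instead of release, roughly like this (a sketch; the label value shown is an assumption):

- key: kubeaddons.mesosphere.io/name
  operator: In
  values:
  - traefik  # assumed value of the kubeaddons.mesosphere.io/name label on the pods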
Per comment, rebase required.
This adds a missing bump to a prior revision of the Traefik addon.
This label is deprecated in k8s 1.17, but is supported on older versions that are supported by this addon.
Force-pushed from d33f7eb to be789c5
This might be hypothetical, but what would happen if the affinity can't be satisfied (e.g. because I've deployed Konvoy in just one zone or only have one worker node)? I suppose the deployment would just never become available.
@makkes The affinity is declared as a preference, so the pods will colocate on a node or zone if there aren't enough for an even distribution. I tested to make sure that works.
That's quite a nice feature. Thanks for clarifying!
After thorough consideration I think we can release this w/ Kubernetes 1.17, in Konvoy 1.5 GA, even with the deprecated labels. This accepts some tech debt, which we should address in the near future.
* Bump traefik addon revision
  This adds a missing bump to a prior revision of the Traefik addon.
* Add new traefik addon revision
* Bump traefik addon revision
* Configure Traefik for HA
* Use deprecated zone label
  This label is deprecated in k8s 1.17, but is supported on older versions that are supported by this addon.
* Set custom label to use for affinity
What type of PR is this?
Feature
What this PR does / why we need it:
This configures the Traefik Helm chart to better ensure Traefik's availability. This PR adds a PodDisruptionBudget to ensure at least one Traefik pod is running at all times, and distributes Traefik pods in order to tolerate the failure of a node or zone. The affinity that distributes pods across nodes or zones is declared as a preference, so the pods will colocate on a node or zone if there aren't enough for an even distribution.
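For reference, a sketch of the kind of PodDisruptionBudget this produces (the name, namespace, and selector label are assumptions; policy/v1beta1 was the PDB API version for this Kubernetes range):

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: traefik-kubeaddons         # assumed name, matching the release
  namespace: kubeaddons            # assumed namespace
spec:
  minAvailable: 1                  # keep at least one Traefik pod through voluntary disruptions
  selector:
    matchLabels:
      release: traefik-kubeaddons  # assumed selector label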
This PR also adds a bump to the latest Traefik addon revision that was missing from #286.
Which issue(s) this PR fixes:
https://jira.d2iq.com/browse/D2IQ-67944
Special notes for your reviewer:
I tested this PR by deploying Konvoy with the kubernetes-base-addons config version set to this branch, and with worker nodes in two different availability zones. After the cluster came up, I observed that the Traefik pods were running on different nodes in different zones. I then drained all but one worker node and observed that the Traefik pods were rescheduled to that remaining node.
Does this PR introduce a user-facing change?:
Checklist