This repository has been archived by the owner on Dec 4, 2024. It is now read-only.

Configure Traefik to improve availability #292

Merged 7 commits from branden/traefik-ha into master on Jun 17, 2020

Conversation

Contributor

@branden branden commented Jun 15, 2020

What type of PR is this?

Feature

What this PR does/ why we need it:

This configures the Traefik helm chart to better ensure Traefik's availability. This PR adds a PodDisruptionBudget to ensure at least 1 Traefik pod is running at all times, and distributes Traefik pods in order to tolerate the failure of a node or zone. The affinity that causes pods to distribute across nodes or zones is declared as a preference, so the pods will colocate on a node or zone if there aren't enough for an even distribution.
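
For reference, a minimal sketch of the chart values that express the PodDisruptionBudget half of this change, assuming the Traefik chart's podDisruptionBudget block (the exact keys in the pinned chart revision may differ):

# Hedged sketch, not the literal addon diff: keep at least one Traefik pod
# available during voluntary disruptions such as node drains.
podDisruptionBudget:
  enabled: true
  minAvailable: 1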

This PR also adds a bump to the latest Traefik addon revision that was missing from #286.

Which issue(s) this PR fixes:

https://jira.d2iq.com/browse/D2IQ-67944

Special notes for your reviewer:

I tested this PR by deploying Konvoy with the kubernetes-base-addons config version set to this branch, and with worker nodes in two different availability zones. After the cluster came up, I observed that the Traefik pods were running on different nodes in different zones. I then drained all but one worker node and observed that the Traefik pods were rescheduled to the remaining node.

Does this PR introduce a user-facing change?:

[traefik] Distribute pods across nodes and zones when possible.
[traefik] Set a PodDisruptionBudget to ensure at least 1 pod is running at all times.

Checklist

  • The commit message explains the changes and why they are needed.
  • The code builds and passes lint/style checks locally.
  • The relevant subset of integration tests pass locally.
  • The core changes are covered by tests.
  • The documentation is updated where needed.

@branden branden requested a review from a team as a code owner June 15, 2020 16:59
@branden branden self-assigned this Jun 15, 2020
@branden branden force-pushed the branden/traefik-ha branch from f8c58f0 to d33f7eb on June 15, 2020 17:46
@samvantran samvantran requested review from samba and GoelDeepak June 15, 2020 19:29
@samba samba left a comment

Overall looks like a great start! A few details need some attention.

values.chart.helm.kubeaddons.mesosphere.io/traefik: "https://raw.githubusercontent.com/mesosphere/charts/00b019ef3610ca8221a8cf283b4d7046a50702c4/staging/traefik/values.yaml"
spec:
  kubernetes:
    minSupportedVersion: v1.15.6
@samba samba Jun 15, 2020

The particular labels used for topologyKey, specifically topology.kubernetes.io/zone, are recent and supported only in 1.17 and newer. This minSupportedVersion should probably also be v1.17.0 or so.

If we aim to support this on previous releases, we should revise the earlier addon revisions to match the appropriate Kubernetes version with applicable labels.

Contributor

Well, if a tool doesn't use them for earlier versions, does this harm the deployment?

Contributor Author

I haven't tested this on k8s < 1.17, but the k8s docs have a vague warning against using a topologyKey that isn't present in all node labels: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#inter-pod-affinity-and-anti-affinity

Pod anti-affinity requires nodes to be consistently labelled, in other words every node in the cluster must have an appropriate label matching topologyKey. If some or all nodes are missing the specified topologyKey label, it can lead to unintended behavior.

So I guess we should avoid that.

I think the simplest way to resolve this without making changes to supported k8s versions is to use the deprecated zone label, which is still present in 1.17. https://kubernetes.io/docs/reference/kubernetes-api/labels-annotations-taints/#failure-domainbetakubernetesiozone
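
If we went that route, only the topologyKey of the zone term would change; a sketch, reusing the selector from the proposed values:

# Sketch of the zone anti-affinity term with the deprecated label, which nodes
# still carry in 1.17. The labelSelector is unchanged from the proposed values.
- weight: 1
  podAffinityTerm:
    labelSelector:
      matchExpressions:
        - key: release
          operator: In
          values:
            - traefik-kubeaddons
    topologyKey: failure-domain.beta.kubernetes.io/zone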


My only concern with going backward to (now deprecated) failure-domain labels is the fact that:

  • Kubernetes 1.17 is now 2 months GA
  • Kubernetes 1.19 will be released in August, at which time 1.16 support basically ends.

As such, after August I don't think there will be any supported versions of Kubernetes which do not support topology.kubernetes.io/zone.

Why aren't we testing against multiple Kubernetes versions?

Isn't the design of Addon.spec.kubernetes.minSupportedVersion specifically to allow selection of compatible addons?

Why would we prefer to fall back instead of lean forward?

cc @joejulian

Contributor

I don't think support for that ever made it into the catalog, and nothing but Kommander even uses the catalog. We could make that part of 1.7 and upgrade from the deprecated configuration at that time, but today I think this is the only option.

Contributor

Also:

  • we haven't been keeping up with k8s releases
  • we need to support EKS, which is always behind on k8s support
  • I believe you said that 1.16 is going to be the LTS version, so we would need to maintain support for that anyway.

@samba samba Jun 16, 2020

OK, I think for the immediate objectives, the existing (deprecated) label is viable.
This means that we're going to release a GA version with known-deprecated labels, which likely won't work with the next release, and therefore the addon's use of these labels MUST be changed with 1.18 support, in the next 1.6 beta konvoy release. (This is tech debt.)

The particular versions that we will support with LTS are undecided at this time; at the earliest it will be Kubernetes 1.16. Customer contract details dictate that it will be decided at the time the customer pays for LTS support, based on one of the versions we support. It's possible that, under some future conditions, we'd offer extended support for Kubernetes versions that are already beyond their community support timeframe (i.e. EOL by community).

Community Kubernetes is working to establish an N-3 support policy (extending from the previous N-2) taking effect with Kubernetes 1.19, which would include 1.16 -- effectively extending community support to roughly 1 year (depending on release cadence).

In the near future, we need to:

  • Start testing all supported Kubernetes on associated KBA branches.
  • Start testing newer (unsupported) Kubernetes in unstable KBA branches, so we can know much earlier where there will be problems.
  • Start using the Catalog functionality in kubeaddons, as appropriate, to select compatible addon revisions automatically.
  • Start versioning addons to support appropriate capabilities for various Kubernetes releases, keeping current with the versions we support and their APIs/labels/features/etc.

Relatedly, we can probably stop supporting things for 1.15 when the KBA release tag says stable-1.17-... or some variant thereof. Probably we should start pruning the revisions with each release.

Comment on lines 85 to 66
# TODO: This comment is no longer true.
# dex service is exposed with TLS certificate signed by self signed root
# Dex CA certificate. It is not clear if traefik supports configuring
# trusted certificates per backend. This should be investigated in a
# separate issue.
# See: https://jira.mesosphere.com/browse/DCOS-56033
insecureSkipVerify: true

If this comment is no longer true, should it not be removed?

Contributor

There is a separate ticket for this item; I suggest handling it in another PR.

Comment on lines 34 to 52
preferredDuringSchedulingIgnoredDuringExecution:
  - weight: 1
    podAffinityTerm:
      labelSelector:
        matchExpressions:
          - key: release
            operator: In
            values:
              - traefik-kubeaddons
      topologyKey: kubernetes.io/hostname
  - weight: 1
    podAffinityTerm:
      labelSelector:
        matchExpressions:
          - key: release
            operator: In
            values:
              - traefik-kubeaddons
      topologyKey: topology.kubernetes.io/zone

I feel we should not use the release label in this case, because it's controlled and applied implicitly by Helm (right?).

This means our correct behavior (HA, upgradability, and tolerance to AZ failure) is predicated on the expected behavior of an external tool (i.e. outside the scope of the addon). We should try to minimize implicit dependencies beyond the scope of an addon's definition.

The chart appears to offer deployment.podLabels: {} that would allow us to specify a distinct label here, controlled by the addon definition itself. I'd suggest creating a distinct pod label, controlled directly in this addon YAML, and using it in the match expression.
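
A rough sketch of that suggestion, assuming the chart's top-level affinity value alongside deployment.podLabels; the label key and value here are illustrative placeholders, not the ones ultimately chosen in this PR:

deployment:
  podLabels:
    # Illustrative addon-controlled label (placeholder name, not from the PR)
    traefik-addon: traefik-kubeaddons
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: traefik-addon
                operator: In
                values:
                  - traefik-kubeaddons
          topologyKey: kubernetes.io/hostname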

Contributor

The release label, I think, is fine in general, but yeah, I see your point that it isn't necessarily managed by us, which means we'd probably want to add our own labels to make our intention clear regardless of the tool we use for deployment.

Contributor Author

Yeah, I wasn't thrilled to use release, but it was the only workable option among the existing labels on Traefik's pods. @samba Good call on deployment.podLabels; I'll set something there that we can reference for pod affinity.

Contributor Author

I went ahead and gave the pods kubeaddons.mesosphere.io/name labels in order to avoid inventing my own label: be789c5

@joejulian
Contributor

A previous PR used traefik-11.yaml. Please rebase from that.

Contributor

@joejulian joejulian left a comment

per comment, rebase required

branden added 6 commits June 15, 2020 17:00
This adds a missing bump to a prior revision of the Traefik addon.
This label is deprecated in k8s 1.17, but is supported on older versions
that are supported by this addon.
@branden branden force-pushed the branden/traefik-ha branch from d33f7eb to be789c5 on June 16, 2020 00:19
@branden branden requested review from joejulian and samba June 16, 2020 00:22
Contributor

@makkes makkes left a comment

This might be hypothetical, but what would happen if the affinity can't be satisfied (e.g. because I've deployed Konvoy in just one zone or have only one worker node)? I suppose the deployment would just never become available.

@branden
Contributor Author

branden commented Jun 16, 2020

@makkes The affinity is declared as a preference, so the pods will colocate on a node or zone if there aren't enough for an even distribution. I tested to make sure that works.
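
For context, this is the difference between the soft and hard forms of pod anti-affinity in the Kubernetes API; a sketch contrasting the two (the selector shown is illustrative, not the addon's):

podAntiAffinity:
  # Soft rule, used here: the scheduler prefers to spread pods but still
  # schedules them together when there aren't enough nodes or zones.
  preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 1
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: traefik   # illustrative selector
        topologyKey: kubernetes.io/hostname
  # Hard rule, not used here: pods that cannot satisfy it stay Pending.
  requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: traefik     # illustrative selector
      topologyKey: kubernetes.io/hostname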

@makkes
Contributor

makkes commented Jun 16, 2020

@makkes The affinity is declared as a preference, so the pods will colocate on a node or zone if there aren't enough for an even distribution. I tested to make sure that works.

That's quite a nice feature. Thanks for clarifying!

@samba samba left a comment

After thorough consideration I think we can release this w/ Kubernetes 1.17, in Konvoy 1.5 GA, even with the deprecated labels. This accepts some tech debt, which we should address in the near future.

@branden branden merged commit e70b53a into master Jun 17, 2020
@branden branden deleted the branden/traefik-ha branch June 17, 2020 00:48
joejulian pushed a commit that referenced this pull request Jun 19, 2020
* Bump traefik addon revision

This adds a missing bump to a prior revision of the Traefik addon.

* Add new traefik addon revision

* Bump traefik addon revision

* Configure Traefik for HA

* Use deprecated zone label

This label is deprecated in k8s 1.17, but is supported on older versions
that are supported by this addon.

* Set custom label to use for affinity
joejulian pushed a commit that referenced this pull request Jul 8, 2020
* Bump traefik addon revision

This adds a missing bump to a prior revision of the Traefik addon.

* Add new traefik addon revision

* Bump traefik addon revision

* Configure Traefik for HA

* Use deprecated zone label

This label is deprecated in k8s 1.17, but is supported on older versions
that are supported by this addon.

* Set custom label to use for affinity