Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce a new check for extension namespace configuration #11629

Merged
merged 4 commits into from
Nov 20, 2023

Conversation

mateiidavid
Copy link
Member

@mateiidavid mateiidavid commented Nov 17, 2023

Linkerd's extension model requires that each namespace that "owns" an extension to be labelled with the extension name. Core extensions in particular strictly follow this pattern. For example, the namespace viz is installed in would be labelled with linkerd.io/extension=viz.

The extension is used by the CLI in many different instances. It is used in checks, it is used in uninstalls, and so on. Whenever a namespace contains a duplicate label value (e.g. two namespaces are registered as the owner of "viz") we introduce undefined behaviour. Extension checks or uninstalls may or may not work correctly. These issues are not straightforward to debug. Misconfiguration can be introduced due to a variety of reasons.

This change adds a new "core" category (linkerd-extension-checks) and a new checker that asserts all extension namespaces are configured properly. There are two reasons why this has been made a "core" extension:

  • Extensions may have their own health checking library. It is hard to share a common abstraction here without duplicating the logic. For example, viz imports the healthchecking package whereas the multicluster extension has its own. A dedicated core check will work better with all extensions that opt-in to use linkerd's extension label.
  • Being part of the core checks means this is going to run before any of the other extension checks do which might improve visibility.

The change is straightforward; if an extension value is used for the label key more than once across the cluster, the check issues a warning along with the namespaces the label key and value tuple exists on.

This should be followed-up with a docs change.

Closes #11509


Manual QA

Failure:

:; k get ns -l linkerd.io/extension=multicluster
NAME                   STATUS   AGE
linkerd-multicluster   Active   23h
default            

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

linkerd-webhooks-and-apisvc-tls
-------------------------------
√ proxy-injector webhook has valid cert
√ proxy-injector cert is valid for at least 60 days
√ sp-validator webhook has valid cert
√ sp-validator cert is valid for at least 60 days
√ policy-validator webhook has valid cert
√ policy-validator cert is valid for at least 60 days

linkerd-version
---------------
√ can determine the latest version
‼ cli is up-to-date
    unsupported version channel: dev-78a5ec56-matei
    see https://linkerd.io/2/checks/#l5d-version-cli for hints

...

linkerd-extension-checks
------------------------
‼ namespace configuration for extensions
    some extensions have invalid configuration:
        * label "linkerd.io/extension=multicluster" is present on more than one namespace:
                * linkerd-multicluster
                * default
        * label "linkerd.io/extension=viz" is present on more than one namespace:
                * test
                * test2
    see https://linkerd.io/2/checks/#l5d-extension-namespaces for hints


linkerd-multicluster
--------------------
√ Link CRD exists
√ Link resources are valid
        * target

Status check results are √

Success:

:; k label ns default linkerd.io/extension=m --overwrite
namespace/default labeled

linkerd-extension-checks
------------------------
√ namespace configuration for extensions

linkerd-multicluster
--------------------
√ Link CRD exists
√ Link resources are valid
        * target
√ remote cluster access credentials are valid
        * target
√ clusters share trust anchors
        * target
√ service mirror controller has required permissions
        * target

Alternatives

  • We could fail the check instead of issuing a warning

Linkerd's extension model requires that each namespace that "owns" an
extension to be labelled with the extension name. Core extensions in
particular strictly follow this pattern. For example, the namespace viz
is installed in would be labelled with `linkerd.io/extension=viz`.

The extension is used by the CLI in many different instances. It is used
in checks, it is used in uninstalls, and so on. Whenever a namespace
contains a duplicate label value (e.g. two namespaces are registered as
the owner of "viz") we introduce undefined behaviour. Extension checks
or uninstalls may or may not work correctly. These issues are not
straightforward to debug. Misconfiguration can be introduced due to a
variety of reasons.

This change adds a new "core" category (`linkerd-extension-checks`) and
a new checker that asserts all extension namespaces are configured
properly. There are two reasons why this has been made a "core"
extension:

* Extensions may have their own health checking library. It is hard to
  share a common abstraction here without duplicating the logic. For
  example, viz imports the healthchecking package whereas the
  multicluster extension has its own. A dedicated core check will work
  better with all extensions that opt-in to use linkerd's extension
  label.
* Being part of the core checks means this is going to run before any of
  the other extension checks do which might improve visibility.

The change is straightforward; if an extension value is used for the
label key more than once across the cluster, the check issues a warning
along with the namespaces the label key and value tuple exists on.

This should be followed-up with a docs change.

Closes #11509

Signed-off-by: Matei David <[email protected]>
@mateiidavid mateiidavid requested a review from a team as a code owner November 17, 2023 13:12
Copy link
Member

@alpeb alpeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, modulo lint issue 👍

pkg/healthcheck/healthcheck.go Outdated Show resolved Hide resolved
pkg/healthcheck/healthcheck.go Outdated Show resolved Hide resolved
Signed-off-by: Matei David <[email protected]>
@mateiidavid mateiidavid merged commit 2cc13d7 into main Nov 20, 2023
32 checks passed
@mateiidavid mateiidavid deleted the matei/dedupe-labels-check branch November 20, 2023 16:27
@hawkw hawkw mentioned this pull request Nov 22, 2023
hawkw added a commit that referenced this pull request Nov 22, 2023
## edge-23.11.4

This edge release introduces support for the native sidecar containers
entering beta support in Kubernetes 1.29. This improves the startup and
shutdown ordering for the proxy relative to other containers, fixing the
long-standing shutdown issue with injected `Job`s. Furthermore, traffic
from other `initContainer`s can now be proxied by Linkerd.

In addition, this edge release includes Helm chart improvements, and
improvements to the multicluster extension.

* Added a new `config.alpha.linkerd.io/proxy-enable-native-sidecar`
  annotation and `Proxy.NativeSidecar` Helm option that causes the proxy
  container to run as an init-container (thanks @teejaded!) (#11465;
  fixes #11461)
* Fixed broken affinity rules for the multicluster `service-mirror` when
  running in HA mode (#11609; fixes #11603)
* Added a new check to `linkerd check` that ensures all extension
  namespaces are configured properly (#11629; fixes #11509)
* Updated the Prometheus Docker image used by the `linkerd-viz`
  extension to v2.48.0, resolving a number of CVEs in older Prometheus
  versions (#11633)
* Added `nodeAffinity` to `deployment` templates in the `linkerd-viz`
  and `linkerd-jaeger` Helm charts (thanks @naing2victor!) (#11464;
  fixes #10680)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

linkerd fails to find Lease component for service-mirror
3 participants