Destination controller goes into failfast mode #11279
Just hit this issue again with …
Another data point: Maybe some race condition that locks up the entire destination controller process? Only 5 of the 10 pods had this issue and were rebooted by the script.
Hi @bjoernw, sorry for the delay in responding to this. There are a couple of things that would be super helpful for digging deeper into what's going on here. I see that you've included some of the proxy logs from the destination controller pod. It would also be great to see logs from the destination and policy containers in that pod. Wherever possible, getting full logs rather than snippets will help establish context. It would likewise be helpful to see the controller metrics, which can be retrieved by running …
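For reference, a minimal sketch of how those artifacts might be collected (assuming the control plane is installed in the default `linkerd` namespace with the standard labels; the exact metrics command referenced above was not captured here):

```sh
# Full logs from each container in the destination controller pods
kubectl logs -n linkerd -l linkerd.io/control-plane-component=destination \
  -c destination --tail=-1 > destination.log
kubectl logs -n linkerd -l linkerd.io/control-plane-component=destination \
  -c policy --tail=-1 > policy.log
kubectl logs -n linkerd -l linkerd.io/control-plane-component=destination \
  -c linkerd-proxy --tail=-1 > proxy.log

# Dump metrics from the control plane components
linkerd diagnostics controller-metrics > controller-metrics.txt
```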
I think I have a similar issue.
Thanks for all your patience with this issue, @bjoernw. I've looked at the logs that you sent me, and one thing I noticed is that the failfast errors in the destination controller only seem to start happening after that pod has received a SIGTERM signal. I believe what is likely happening is that when the pod is sent a SIGTERM, the destination controller and policy controller containers shut down before the proxy does. When the proxy gets inbound requests for these controllers, which are no longer available, it returns a failfast error. Pod shutdown ordering is a complex topic and there has been some debate about how to change the way we handle graceful shutdown here: #10379

With all of that said, this should not affect your application traffic. If a data-plane proxy gets a 503 error from the control plane (because that control plane pod is in the process of shutting down), it should retry and get load balanced to a different destination controller pod (which is hopefully not in the process of shutting down).

I want to revisit some assumptions from the original description here. You mention that there is an endpoint staleness condition where traffic gets routed to pods which no longer exist. You also mention failfast errors in the destination controller proxy logs and restarting the destination controller. Are you sure that you have the causality in the right direction? Is it possible that when you encounter endpoint staleness, you restart the destination controller, and during the course of shutting down, the destination controller proxy prints a bunch of failfast error logs?
I had to write this diagnostic script because this issue keeps happening, so it's not the cause. I actually don't see my script actively terminating these pods. It must be the health check timing out that's causing k8s to terminate the pod.
@bjoernw I think your theory about health checks failing is worth looking into in more detail. Or, more generally, determining why the kubelet is choosing to kill these pods... Does …
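One way to check why the kubelet is killing these pods (a sketch, assuming the control plane runs in the default `linkerd` namespace):

```sh
# Kubelet-initiated restarts usually show up as events (probe failures,
# OOM kills, evictions) or in the container's last terminated state
kubectl get events -n linkerd --sort-by=.lastTimestamp | grep -Ei 'unhealthy|killing|oom|evict'
kubectl describe pod -n linkerd -l linkerd.io/control-plane-component=destination | grep -A 6 'Last State'
```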
Wanted to chime in with my team's experience. Not completely sure if this is the same issue or not, but it looks very similar. We occasionally observe application pods going into a loop where we see a large number of 5xx errors. Our cluster runs on EKS, Linkerd 2.14.1. The errors start at 2023-10-07 01:46:32:
The cluster in question has about 2,000 to 3,000 pods in the mesh at load. The applications all autoscale, so pods are coming in and out constantly throughout the day. Normally everything works great, but it occasionally gets stuck and starts reporting the above errors for every request. Restarting the destination linkerd pods does fix the issue. Interestingly, restarting the individual pods having connection issues also temporarily fixes the issue, although for a much shorter period of time. For example, restarting linkerd-destination fixes the issue for a few days or a week; restarting individual failing pods fixes it for a few minutes or hours. Logs from the destination controller (the log timestamps are not really correlated and come after the application pod errors above):
There is one interesting log that does come at around the same time as the errors:
Thank you for the additional data!
Are these logs from multiple pods or a single pod? Specifically, the first quoted log occurred ~26mins after the proxy started and the second quoted log snippet occurred ~5s after the proxy started. In any case, seeing a failfast timeout 5s after pod startup strongly indicates that the control plane did not return any actively available endpoints to the data plane.
This is helpful context.
This is surprising given that the above log indicated a proxy in this state only 5s after startup. That is, the initial lookup the proxy performed would appear not to include any available endpoints. It sounds like you're running multiple destination controller replicas? Perhaps one replica is getting into a bad state while the others remain healthy; so as you restart data plane workloads, resolutions randomly hit controller pods and the problem doesn't manifest.
These logs are benign and occur during startup (we see this happened 4s after the proxy started), as the proxy attempts to connect to the local policy controller, which may take a few seconds to begin serving its gRPC server. We probably ought to find a way to avoid logging warnings during startup...
This typically indicates that a client canceled a resolution (e.g., due to a restart). These are fairly typical and probably shouldn't be logged at error level... The information you've provided lines up closely with similar reports we have received. We have been actively trying to reproduce this problem with a variety of testing strategies and have so far been unsuccessful. We are also going to put together a plan to get higher-fidelity diagnostics about the destination controller's state. We'll provide updates on this issue as that work progresses.
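In the meantime, one existing way to spot-check the destination controller's view of a given service against the cluster's actual endpoints (a sketch using a hypothetical `my-svc` service in namespace `my-ns`):

```sh
# Ask the control plane which endpoints it would hand out for an authority,
# then compare against the EndpointSlice contents
linkerd diagnostics endpoints my-svc.my-ns.svc.cluster.local:8080
kubectl get endpointslices -n my-ns -l kubernetes.io/service-name=my-svc
```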
Glad to help. If there are any additional diagnostics we could run on our cluster, I'd be happy to provide them. To answer your questions:
I should have included more info on the pods. Yes, they are from separate pods. I had pulled the logs from our central logging system, filtering on our application and the linkerd-proxy container, so I think they were interleaved. One thing we brainstormed when talking about the issue internally is that it might be an issue with the K8s/EKS control plane itself. If the k8s control plane had an increased error rate, latency, or connection errors, maybe the destination controller also fails in an unexpected way? We have a ticket open with our AWS contacts to see if we can get any more info on what happened on that side of things during the outage. We have had issues in the past where unexpectedly high load on the EKS/k8s control plane causes API errors at the k8s level, which can affect things like operators, CI/CD, and anything using the k8s APIs.
Yes, we run 3 replicas of the destination controller. One other thing to note is that the linkerd controller pods were all "healthy" and in a ready state at the time of the outage. If it is getting into a bad state, health checks were not flagging it in our cluster.
Good to know! I figured I'd include it since it stood out due to its timing.
Are you scaling nodes in this situation, too? Or is this a fixed node pool? We've kicked off a few scale/burn-in tests to exercise aggressive service churn and I'm curious whether node churn is part of the equation for you as well.
Yes, we are scaling nodes up and down as well.
We have a theory that we're chasing down and we've identified a prometheus query that can help highlight this problem:
If you don't have a prometheus setup, you can approximate this by faking a timeseries by snapshotting the control plane's state a few times:
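The exact query and snapshot commands from this comment were not captured here; as an illustrative sketch only (the grepped metric name is a placeholder, not confirmed by this thread), a periodic snapshot loop could look like:

```sh
# Snapshot destination controller metrics every 30s so update counters can be
# compared across instances; substitute the metric from the query above.
for i in $(seq 1 10); do
  linkerd diagnostics controller-metrics | grep -i 'endpoint' \
    > "controller-metrics-$(date +%s).txt"
  sleep 30
done
```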
We expect that all service updates should flat-line for a single destination controller instance. Something like: We also expect that other destination controller instances should not experience any directly correlated failures of this nature. We're still working to confirm our hypothesis and to ultimately reproduce this issue, but in the meantime, capturing data from incidents of this failure will most effectively help to confirm or disprove our theory.
## edge-23.10.3

This edge release fixes issues in the proxy and destination controller which can result in Linkerd proxies sending traffic to stale endpoints. In addition, it contains other bugfixes and updates dependencies to include patches for the security advisories [CVE-2023-44487]/GHSA-qppj-fm5r-hxr3 and GHSA-c827-hfw6-qwvm.

* Fixed an issue where the Destination controller could stop processing changes in the endpoints of a destination if a proxy subscribed to that destination stops reading service discovery updates. This issue results in proxies attempting to send traffic for that destination to stale endpoints ([#11483], fixes [#11480], [#11279], and [#10590])
* Fixed a regression introduced in stable-2.13.0 where proxies would not terminate unused service discovery watches, exerting backpressure on the Destination controller which could cause it to become stuck ([linkerd2-proxy#2484] and [linkerd2-proxy#2486])
* Added `INFO`-level logging to the proxy when endpoints are added or removed from a load balancer. These logs are enabled by default, and can be disabled by [setting the proxy log level][proxy-log-level] to `warn,linkerd=info,linkerd_proxy_balance=warn` or similar ([linkerd2-proxy#2486])
* Fixed a regression where the proxy rendered `grpc_status` metric labels as a string rather than as the numeric status code ([linkerd2-proxy#2480]; fixes [#11449])
* Added missing `imagePullSecrets` to `linkerd-jaeger` ServiceAccount ([#11504])
* Updated the control plane's dependency on the `google.golang.org/grpc` Go package to include patches for [CVE-2023-44487]/GHSA-qppj-fm5r-hxr3 ([#11496])
* Updated dependencies on `rustix` to include patches for GHSA-c827-hfw6-qwvm ([linkerd2-proxy#2488] and [#11512])

[#10590]: #10590
[#11279]: #11279
[#11449]: #11449
[#11480]: #11480
[#11483]: #11483
[#11496]: #11496
[#11504]: #11504
[#11512]: #11512
[linkerd2-proxy#2480]: linkerd/linkerd2-proxy#2480
[linkerd2-proxy#2484]: linkerd/linkerd2-proxy#2484
[linkerd2-proxy#2486]: linkerd/linkerd2-proxy#2486
[linkerd2-proxy#2488]: linkerd/linkerd2-proxy#2488
[proxy-log-level]: https://linkerd.io/2.14/tasks/modifying-proxy-log-level/
[CVE-2023-44487]: GHSA-qppj-fm5r-hxr3
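For reference, the proxy log level mentioned in the notes above can be set per-workload via the `config.linkerd.io/proxy-log-level` annotation (a sketch using a hypothetical `my-app` deployment in namespace `my-ns`):

```sh
# Quiet the new balancer endpoint logs on one workload; pods pick up the new
# level when they are recreated
kubectl -n my-ns annotate deployment my-app \
  'config.linkerd.io/proxy-log-level=warn,linkerd=info,linkerd_proxy_balance=warn' --overwrite
kubectl -n my-ns rollout restart deployment my-app
```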
The latest edge release, …
This stable release fixes issues in the proxy and Destination controller which can result in Linkerd proxies sending traffic to stale endpoints. In addition, it contains a bug fix for profile resolutions for pods bound on host ports and includes patches for security advisory [CVE-2023-44487]/GHSA-qppj-fm5r-hxr3.

* Control Plane
  * Fixed an issue where the Destination controller could stop processing changes in the endpoints of a destination if a proxy subscribed to that destination stops reading service discovery updates. This issue results in proxies attempting to send traffic for that destination to stale endpoints ([#11483], fixes [#11480], [#11279], [#10590])
  * Fixed an issue where the Destination controller would not update pod metadata for profile resolutions for a pod accessed via the host network (e.g. HostPort endpoints) ([#11334])
  * Addressed [CVE-2023-44487]/GHSA-qppj-fm5r-hxr3 by upgrading several dependencies (including Go's gRPC and net libraries)
* Proxy
  * Fixed a regression where the proxy rendered `grpc_status` metric labels as a string rather than as the numeric status code ([linkerd2-proxy#2480]; fixes [#11449])
  * Fixed a regression introduced in stable-2.13.0 where proxies would not terminate unused service discovery watches, exerting backpressure on the Destination controller which could cause it to become stuck ([linkerd2-proxy#2484])

[#10590]: #10590
[#11279]: #11279
[#11334]: #11334
[#11449]: #11449
[#11480]: #11480
[#11483]: #11483
[CVE-2023-44487]: GHSA-qppj-fm5r-hxr3
[linkerd2-proxy#2480]: linkerd/linkerd2-proxy#2480
[linkerd2-proxy#2484]: linkerd/linkerd2-proxy#2484

Signed-off-by: Matei David <[email protected]>
We are confident that the destination controller staleness issues caused by a deadlock condition are fixed in stable-2.14.2, which is now available. Please feel free to reopen if you encounter this issue after upgrading.
@DavidMcLaughlin Could you please apply it to …
Thanks, Team! |
What is the issue?
We use canaries with Argo Rollouts via SMI, meaning there is a lot of EndpointSlice churn due to frequent release trains and their 10-15 minute canary rollouts (~300-400 meshed pods).
Occasionally we will see the destination service go into failfast mode, which is problematic during a canary deployment because old IPs of pods that were already torn down by Argo Rollouts are still receiving traffic even though they should not. To remedy this, we need to restart the destination deployment.
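A rolling restart of the destination deployment can be done with, for example (assuming the control plane is installed in the default `linkerd` namespace):

```sh
# Restart the destination controller pods and wait for the rollout to finish
kubectl -n linkerd rollout restart deploy/linkerd-destination
kubectl -n linkerd rollout status deploy/linkerd-destination
```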
How can it be reproduced?
This happens intermittently under load. We run 5 instances of each control plane component.
Logs, error output, etc
The following are all logs from destination pods:
Output of `linkerd check -o short`:
Environment
Possible solution
No response
Additional context
No response
Would you like to work on fixing this bug?
None