Linkerd policy container of destination Pod and proxy-injector Pod crashing #7011
Hi @BobyMCbobs. One thing to try would be to run
Hi @adleong, thanks for your reply. Oh, that seems to be a typo. It is
@BobyMCbobs Interesting. Unfortunately, I don't have a Pi/Talos cluster to test on, and I can't reproduce this error in any of my clusters. Since this seems to be some kind of problem related to the issuer certificate, I'd recommend looking at the
It appears not to be exclusive to the architecture; I tried it in a VM too and had the same results.
The logs for the
I've copied the three certs out to
@adleong these errors probably don't have anything to do with the identity certificates. Rather, the policy controller's Kubernetes client is having trouble establishing TLS with the Kubernetes API.
@BobyMCbobs Thanks for the helpful report! Can you share the output of the following?
:; kubectl get secret $(kubectl get sa default -o json | jq -r '.secrets[0].name') -o json | jq -r '.data["ca.crt"] | @base64d'
This will dump the CA certificate for your cluster (which is totally safe to share). Also, you could try running the controller with additional logging information by using
Here's the CA from a cluster with the issue:
Here's a snippet from the end of it:
Happy to upload a gist for the whole logs if need be.
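For anyone else inspecting a dumped CA like the one above, openssl can print the two fields that end up mattering later in this thread: the public key algorithm and the signature algorithm. A minimal sketch, with a locally generated stand-in CA so the commands run end to end (the file names are made up; substitute the `ca.crt` dumped from your cluster):

```shell
# Generate a stand-in P-256 CA key and a self-signed certificate for it.
openssl ecparam -name prime256v1 -genkey -noout -out ca.key
openssl req -x509 -new -key ca.key -sha256 -days 1 -subj "/CN=kubernetes" -out ca.crt

# Print the algorithm fields. OpenSSL names the signature algorithm
# e.g. "ecdsa-with-SHA256" or "ecdsa-with-SHA512".
openssl x509 -in ca.crt -noout -text | grep -E "Signature Algorithm|Public Key Algorithm"
```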
I'm able to reproduce this issue even with Talos in containers and VMs (on amd64). The only thing to note about the Kubernetes API cert is that Talos provisions an ECDSA Kubernetes CA.
$ talosctl cluster create
...
$ linkerd check --pre
Linkerd core checks
===================
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version
pre-kubernetes-setup
--------------------
√ control plane namespace does not already exist
√ can create non-namespaced resources
√ can create ServiceAccounts
√ can create Services
√ can create Deployments
√ can create CronJobs
√ can create ConfigMaps
√ can create Secrets
√ can read Secrets
√ can read extension-apiserver-authentication configmap
√ no clock skew detected
linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date
Status check results are √
$ linkerd install | kubectl apply -f -
...
$ linkerd check
Linkerd core checks
===================
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
\ pod/linkerd-destination-558894b46d-dlt26 container policy is not ready
^^ hangs forever
(should work on OS X/Linux with Docker)
Same problem as @BobyMCbobs reported:
Yeah, the Kubernetes client library we're using appears to only support RSA keys at the moment. We'll take a look at fixing this.
It looks like there are some upstream issues with parsing certain kinds of PEM-formatted EC private keys, but in general we don't seem to have any issues talking to Kubernetes API servers that use EC certificates. From a quick look at Talos's docs, it seems like all API clients may need to use mTLS to authenticate? If so, it's probably possible to get this working by building an alternate version of the policy controller that uses the
The proper fix is probably to address rustls/rustls#332
mTLS is only used for the Talos API itself (which is different from the Kubernetes API). Talos ships with vanilla upstream Kubernetes, so it's a completely standard distribution. ECDSA is a supported way to provision Kubernetes certificates, and all operations with ECDSA are way faster than with RSA.
Yeah. We use ECDSA keys in Linkerd as well. If I'm reading the issue history correctly, it seems like this is specifically a problem with parsing PEM-formatted ECDSA private keys (whereas PKCS#8-formatted keys should be parseable). What I can't understand is where we would be hitting this. I know for a fact that we should be able to talk to TLS services that use ECDSA certificates (for instance, we can run this controller in k3d clusters that use these certs). We'll need to figure out how the Talos case differs from k3d.
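As an aside on the key-format distinction above: PEM-encoded EC private keys come in two common shapes, the SEC1 `EC PRIVATE KEY` form and the PKCS#8 `PRIVATE KEY` form, and openssl converts between them in one line. A sketch with made-up file names:

```shell
# SEC1 form: the PEM header reads "BEGIN EC PRIVATE KEY".
openssl ecparam -name prime256v1 -genkey -noout -out sec1.key

# PKCS#8 form: the PEM header reads "BEGIN PRIVATE KEY".
openssl pkcs8 -topk8 -nocrypt -in sec1.key -out pkcs8.key

# Show the two headers side by side.
head -1 sec1.key pkcs8.key
```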
On second thought, I'm not convinced that the problem you're seeing has anything to do with parsing EC private keys -- I'd expect that error to be different. @adleong is going to try to run Talos to compare it with a working k3d instance.
Here is the server certificate for the Kubernetes API in a kind cluster which works with Linkerd:
and here is the server cert for the Kubernetes API from a Talos cluster where the policy controller is crashing:
The most obvious difference between these is that the Talos server certificate uses ECDSA as the public key algorithm and the signature algorithm. (In contrast, Kind uses RSA and SHA256-RSA respectively.) Perhaps kube-rs also has issues handling ECDSA public keys?
Never mind, I found an even more obvious difference: the Talos certificate specifies the issuer as
The two certs posted might be the same; I see a lot of the same data.
@BobyMCbobs 🤦 you're right, I had a copy-paste fail. I've edited my comment with the actual Kind certificate.
Here's a k3d CA certificate:
And here's the k3d API server's certificate:
If I'm reading this correctly, rustls doesn't actually support SHA-512 with the P-256 curve. It supports
Here's the commit that removed support for |
Looking at RFC8446, it looks like TLSv1.3 only defines support for these algorithms:
I think the proper fix here is for Talos to issue certificates that conform to the TLSv1.3-supported signature algorithms above.
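To make the difference concrete, here's a hedged openssl sketch (locally generated stand-in certs, made-up file names) that issues one P-256 certificate signed with SHA-512, the combination Talos was producing, and one signed with SHA-256, which matches the `ecdsa_secp256r1_sha256` scheme TLSv1.3 defines:

```shell
openssl ecparam -name prime256v1 -genkey -noout -out ca.key

# P-256 key signed with SHA-512: not among the TLSv1.3 signature schemes,
# and the combination rustls rejects.
openssl req -x509 -new -key ca.key -sha512 -days 1 -subj "/CN=kubernetes" -out sha512.crt

# P-256 key signed with SHA-256: the TLSv1.3-conformant combination.
openssl req -x509 -new -key ca.key -sha256 -days 1 -subj "/CN=kubernetes" -out sha256.crt

# Compare the Signature Algorithm fields of the two certificates.
openssl x509 -in sha512.crt -noout -text | grep "Signature Algorithm" | head -1
openssl x509 -in sha256.crt -noout -text | grep "Signature Algorithm" | head -1
```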
@olix0r thanks for digging into that. It's a bit surprising, since it works across Go TLS clients and servers, but it makes perfect sense.
@smira Yeah, I agree it's surprising. I'm not 100% sure this is the problem, but so far it's the most likely issue I can see. In general, Rust's TLS ecosystem tends to be fairly minimal and strict, which is generally a good thing when it comes to TLS, but it can lead to some surprising situations like this.
See linkerd/linkerd2#7011 (comment)
Looks like some implementations follow TLS 1.3 rules and skip implementing all combinations of elliptic curves and hashing. This change makes Talos default to issuing ECDSA-P256-SHA256 certificates.
Signed-off-by: Andrey Smirnov <[email protected]>
The fix is going to be merged into Talos, and we plan to release version 0.13 with it.
@smira Excellent. I'm going to close this issue out for now, but please let us know if you hit any more issues going forward!
Bug Report

What is the issue?
Linkerd core components destination and proxy-injector crashing and never coming up.

How can it be reproduced?
linkerd install | kubectl apply -f -

Logs, error output, etc
Some events from one of the destination Pods are:
and logs from the policy container of the same Pod:
Some events from one of the proxy-injector Pods are:
linkerd check output:

Environment
stable-2.11.0

Possible solution
Unsure

Additional context
This is running on my Raspberry Pi cluster. I'm using linkerd install --ha | kubectl apply -f -, but there is no difference when not using HA mode.