Run node authorizer in kops-controller #7780
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: justinsb
This should update
Thanks for the suggestion, @johngmyers; I added a small section to the architecture doc. As we add rules, we might want to fill in the details of how node validation is performed. Right now there is no node validation, so this is very much not enabled by default. Nonetheless, I'm marking this not-WIP because I think it is the first step toward enabling a lot of other functionality (many of my other WIP PRs).
In general, I am against adding "trust everything" approvers. They are, in the best case, useless. How can we be confident we have the machinery to do a reasonable validation without some code that can validate something?
```go
func (c *nodeBootstrapClient) Close() error {
	if c.connection != nil {
		if err := c.connection.Close(); err != nil {
			return fmt.Errorf("error closing GRPC connection: %v", err)
```
Why is the style to use this instead of errors.Wrap()?
Yeah, xerrors is now looking good, so maybe we should use it. I've started using it in new Go projects, but I'm not sure whether it's time to put it into bigger existing projects (like kops) or whether we should wait for official support in Go.
```go
Path:     filepath.Join(pkiDir, "server.key"),
Contents: fi.NewBytesResource(pkiutil.EncodePrivateKeyPEM(serverKey)),
Type:     nodetasks.FileType_File,
Mode:     s("0600"),
```
Do you need to specify the file owner?
Looks like it's fixed.
```go
Server string `json:"server"`

// CACertificate is the CA certificate for the GRPC server
CACertificate []byte `json:"caCertificate"`
```
Consider making this a list of CA certificates, to permit rotation.
Actually looks like it already does :-)
But renaming to CACertificates to make this more self-documenting.
We should probably try to plumb this through the rest of the system though - we haven't totally ignored it, but I'm reasonably sure there are a few places where we likely drop the ball.
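A sketch of how a `CACertificates` list might be consumed on the client side, assuming each entry is a PEM-encoded CA certificate: every entry goes into one `x509.CertPool`, so a new CA can be trusted alongside the old one during rotation. `buildClientTLSConfig`, the placeholder server name, and the `selfSignedCAPEM` demo helper are illustrative assumptions, not code from this PR.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// buildClientTLSConfig trusts every PEM-encoded CA in caCertificates, so a
// new CA can be added next to the old one before the old one is retired.
func buildClientTLSConfig(caCertificates [][]byte, serverName string) (*tls.Config, error) {
	if len(caCertificates) == 0 {
		return nil, errors.New("no CA certificates provided")
	}
	pool := x509.NewCertPool()
	for i, pemBytes := range caCertificates {
		if !pool.AppendCertsFromPEM(pemBytes) {
			return nil, fmt.Errorf("CA certificate %d contains no valid PEM certificate", i)
		}
	}
	return &tls.Config{
		RootCAs:    pool,
		ServerName: serverName,
		MinVersion: tls.VersionTLS12,
	}, nil
}

// selfSignedCAPEM creates a throwaway CA certificate for the demo only.
func selfSignedCAPEM(commonName string) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: commonName},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

func main() {
	oldCA, err := selfSignedCAPEM("ca-old")
	if err != nil {
		panic(err)
	}
	newCA, err := selfSignedCAPEM("ca-new")
	if err != nil {
		panic(err)
	}
	// "kops-controller.internal" is a placeholder server name.
	cfg, err := buildClientTLSConfig([][]byte{oldCA, newCA}, "kops-controller.internal")
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.ServerName)
}
```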
```go
}

func (c *nodeBootstrapClient) CreateKubeletBootstrapToken(ctx context.Context) (pb.Token, error) {
	request := &pb.CreateKubeletBootstrapTokenRequest{}
```
I don't see how the server would be able to authorize anything without some sort of cloudprovider-specific credential in the request.
Right, so we have NodeInfo, which is empty today but is where I think we can put this information.
The ideal is TPM, which GCE has, but AFAIK no other clouds do. But pkg/authorizers/aws/aws.go has some good checks based on the AWS identity document, and we can also cross-check against the IP address.
One cloud provider is enough; doesn't matter which. We should at least have the structure for where the provider-specific code goes.
```go
if response.Token == nil || response.Token.BearerToken == "" {
	return pb.Token{}, fmt.Errorf("created bootstrap token, but response was empty")
}
```
I was hoping this would also return a TLS server certificate for the kubelet, but I guess that's another PR.
So the process here is somewhat tricky: we create a bootstrap token, and the kubelet then exchanges it for a kubelet client certificate.
I did debate just generating the kubelet certificate directly; a few things made me not do it:
- It's different, so we'd have to think more carefully about our failure scenarios. Here we're just doing the normal bootstrap flow (arguably a more secure version of it, because we can use shorter-lived tokens).
- I wasn't sure whether kubelet certificate rotation worked if we didn't start with a bootstrap token.
- This way kops-controller doesn't need the CA keypair (although we could always use a CSR flow to avoid that).
We can always generate a kubelet certificate directly in future, though; it wouldn't be a huge change!
We could do a CSR flow with a pre-approved request, to allow for external or hardware-key signing controllers.
```go
	return nil, err
}

// @step: add the secret to the namespace
```
Why is this storing the created token in a Secret? I'm concerned a compromised node could impersonate a node in a more privileged instancegroup. Or a pod with permission to read these could escalate to impersonating a node.
Answering narrowly, this is how bootstrap tokens work.
Broadly, I do think your concerns are valid. We can make our bootstrap tokens short lived, we can use the NodeAuthorizer to reduce impact, but it's still a concern. The k8s secret also has all the information, whereas presumably it would be safer to store only the hash of the token (requiring an attacker to find a hash collision in a relatively short TTL).
Perhaps then we should move to just generating the kubelet TLS certificate, and trying to figure out what happens during rotation... WDYT?
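A sketch of the "store only the hash" variant mentioned above. It assumes kops-controller itself checks the token; kube-apiserver's built-in bootstrap-token authenticator reads the plain-text secret, so this would not work with the stock flow. The idea is to persist the token ID plus a SHA-256 digest of the secret half, so reading the stored object yields no usable token, and to compare digests in constant time. `storedToken`, `hashToken`, and `verifyToken` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// storedToken is what would be persisted: the token ID in clear, for lookup,
// and only a SHA-256 digest of the secret half.
type storedToken struct {
	ID           string
	SecretSHA256 string
}

func hashToken(token string) (storedToken, error) {
	id, secret, ok := strings.Cut(token, ".")
	if !ok || id == "" || secret == "" {
		return storedToken{}, errors.New("malformed token")
	}
	sum := sha256.Sum256([]byte(secret))
	return storedToken{ID: id, SecretSHA256: hex.EncodeToString(sum[:])}, nil
}

// verifyToken compares digests in constant time so timing does not leak how
// many bytes matched.
func verifyToken(presented string, stored storedToken) bool {
	id, secret, ok := strings.Cut(presented, ".")
	if !ok || id != stored.ID {
		return false
	}
	want, err := hex.DecodeString(stored.SecretSHA256)
	if err != nil {
		return false
	}
	sum := sha256.Sum256([]byte(secret))
	return subtle.ConstantTimeCompare(sum[:], want) == 1
}

func main() {
	st, err := hashToken("abcdef.0123456789abcdef")
	if err != nil {
		panic(err)
	}
	fmt.Println(st.ID, len(st.SecretSHA256))                 // abcdef 64
	fmt.Println(verifyToken("abcdef.0123456789abcdef", st)) // true
	fmt.Println(verifyToken("abcdef.0123456789abcdeX", st)) // false
}
```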
Short lived doesn't help, as the compromised node could presumably set a watch for kube-system secrets and snatch it almost as soon as it's created.
But yes, this weakness appears to be designed in upstream of us. Just generating the TLS certificate might be simpler.
```yaml
---

# permits the node access to create a CSR
```
What is using these three ClusterRoleBindings?
Ugh... good point on the RoleBinding to kops-controller: we should always have permissions for events in kops-controller, at least.
On the ClusterRoleBindings: these are the magic ClusterRoleBindings that (1) allow a kubelet bootstrap token to create a CSR, (2) tell kube-controller-manager to auto-approve those CSRs, and (3) tell kube-controller-manager to auto-approve CSR renewals. Some docs are here: https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet-tls-bootstrapping/#approval
If the user/group isn't specified, don't try to change it.
kops-controller now exposes a GRPC service, which is used to more securely provide nodes their kubelet bootstrap tokens.
```proto
  NodeInfo node_info = 1;
}

// NodeInfo allows a node to provide any secrets etc that can help verify its identity
```
A better name would be NodeCredential
Will change - thank you!
```go
klog.Warningf("Unhandled type %T in Service::GetDependencies: %v", v, v)
deps = append(deps, v)
switch fmt.Sprintf("%T", v) {
case "*model.KubeletBootstrapKubeconfigTask":
```
Why not add this type to the outer switch? And why test the type by converting to string?
I'll add a comment, but it's because Go packages can't have circular imports.
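A common way to avoid both the string comparison and the import cycle, sketched under assumptions: the lower-level package declares a small interface, and the higher-level task type implements it, so the type switch can match on the interface without importing the higher package. The interface and method names are hypothetical, and both "packages" are collapsed into one file for the sketch.

```go
package main

import "fmt"

// ---- would live in the lower-level package (e.g. nodetasks) ----

// Task is a minimal stand-in for the task abstraction.
type Task interface{}

// bootstrapKubeconfigProvider is a hypothetical marker interface. The lower
// package declares it and the higher package implements it, so the lower
// package never imports the higher one.
type bootstrapKubeconfigProvider interface {
	ProvidesBootstrapKubeconfig()
}

type Service struct{ Name string }

// GetDependencies returns the tasks the service must wait for.
func (s *Service) GetDependencies(tasks map[string]Task) []Task {
	var deps []Task
	for _, v := range tasks {
		switch v.(type) {
		case bootstrapKubeconfigProvider:
			deps = append(deps, v)
		}
	}
	return deps
}

// ---- would live in the higher-level package (e.g. model) ----

type KubeletBootstrapKubeconfigTask struct{}

func (*KubeletBootstrapKubeconfigTask) ProvidesBootstrapKubeconfig() {}

type someOtherTask struct{}

func main() {
	tasks := map[string]Task{
		"kubeconfig": &KubeletBootstrapKubeconfigTask{},
		"other":      &someOtherTask{},
	}
	deps := (&Service{Name: "kubelet"}).GetDependencies(tasks)
	fmt.Println(len(deps)) // 1
}
```

Because the match is on behavior rather than on a type name, renaming or moving the task type cannot silently break the dependency.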
/test pull-kops-verify-staticcheck
I can certainly work on adding more trust code. We already have the AWS trust code from the node-authorizer, which is based on the instance identity document. GCE has vTPM, which ideally we would use. We also have some similar logic when we map the instance to a Node.

In general, I think we would only turn this on for platforms where we can make a reasonable case. For clouds, that will be based on cross-checking with the control plane and verifying some form of cloud-provided identifier or authentication. For metal, I imagine it will ideally be based on TPM, but will probably also be based on some pre-shared per-host key (maybe even the SSH host key, or at least something similar?).

I can certainly work on implementing some of these authorizers, e.g. for AWS. I'll do that in a separate PR that builds on this one.

Honestly, your concerns about bootstrap tokens are well made and are more concerning in my mind :-) I'm going to look at generating the kubelet certificate directly.
I had a go at using client certificates instead of bootstrap tokens, and it does work: #8580. I was also able to validate certificate rotation by setting … This does raise questions, though:
Some people might want to reduce their cert lifetimes to less than a year. But it is an interesting observation that the node lifetime should probably be shorter than the cert lifetime. Any node expiration mechanism should respect the concurrency and cluster validation limits that rolling update uses. This is the primary motivation for #8272: to coordinate node expiration with rolling updates, for other reasons as well, by having the node expiration controller signal the rolling update controller to remove the nodes.

I would also like to issue the kubelet a TLS server certificate. The rotation approver won't approve server certs because it doesn't have the cloud-provider-specific information to validate the domains. kops-controller could reasonably do so.
@justinsb: PR needs rebase.
@justinsb, can this be closed?
@justinsb: The following tests failed.
I think we can call a timeout on this one :) /close
@olemarkus: Closed this PR.
We create a GRPC interface and use it to serve the kubelet bootstrap
certificate; we consume it directly from nodeup.
Based heavily on the existing AWS node-authorizer implementation.