This repository has been archived by the owner on Sep 30, 2020. It is now read-only.

Kubelet TLS bootstrapping #449

Merged: 5 commits merged into master on Apr 2, 2017

Conversation

danielfm commented Mar 23, 2017

This PR introduces an experimental feature for provisioning TLS certificates for worker nodes via the alpha certificate signing requests workflow.

If RBAC is enabled in cluster.yaml, I also made sure to set up the appropriate cluster role bindings so that the token used for sending the certificate signing requests can only make requests related to certificate provisioning, as instructed by the official documentation.
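
For context, the cluster role binding described in the official documentation looks roughly like the sketch below. This is a minimal illustration rather than the exact manifest from this PR; the binding name and subject group are placeholders, and the apiVersion depends on the Kubernetes release:

apiVersion: rbac.authorization.k8s.io/v1beta1  # alpha on releases before 1.6
kind: ClusterRoleBinding
metadata:
  name: kubelet-bootstrap                # placeholder name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:node-bootstrapper         # allows CSR-related requests only; ships with newer releases, created manually on older ones
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:kubelet-bootstrap         # placeholder: the group assigned to the bootstrap token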

Checklist:

  • Add a kube-aws render token-file command that creates a token auth file with a cluster TLS bootstrap token (if the feature is turned on in cluster.yaml, or an empty file otherwise) (undone in Improve auth tokens / TLS bootstrapping UX #489)
  • Check whether the auth token file has a proper bootstrap token when the TLS bootstrap feature is turned on; if not, return a self-explanatory message so the user knows what to do in order to fix the problem (partially changed in Improve auth tokens / TLS bootstrapping UX #489)
  • Read the bootstrap token from the auth token file and generate an encrypted version of it
  • Send the encrypted bootstrap token in the worker userdata
  • Replace the decrypted token in the kubeconfig used for the TLS bootstrapping workflow (see the kubeconfig sketch after this checklist)
  • Fix / add more tests wherever possible
  • Manual tests
    • Create clusters with TLS bootstrapping enabled and disabled
    • Make sure it plays well with other experimental features (e.g. self-hosted Calico, AWS node labels, node draining)
    • Try to add worker nodes (and stop/start the kubelet on an existing node) to see if they can join the cluster without the need to manually approve every new certificate signing request
    • Test token rollout methods (Edit: kube-aws update works as expected, since the generated certificates remain valid even though the token used to request them has changed)

Closes #406.
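
For reference, the bootstrap kubeconfig mentioned in the checklist is an ordinary kubeconfig whose credential is the bootstrap token instead of a client certificate. A minimal sketch with placeholder values (the file actually rendered by kube-aws may differ):

apiVersion: v1
kind: Config
clusters:
- name: kube-aws-cluster
  cluster:
    certificate-authority: /etc/kubernetes/ssl/ca.pem   # the cluster CA shipped to every worker
    server: https://<external-dns-name>:443
contexts:
- name: tls-bootstrap-context
  context:
    cluster: kube-aws-cluster
    user: kubelet-bootstrap
current-context: tls-bootstrap-context
users:
- name: kubelet-bootstrap
  user:
    token: <decrypted bootstrap token from credentials/tokens.csv>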

k8s-ci-robot added the "cncf-cla: yes" label (indicates the PR's author has signed the CNCF CLA) on Mar 23, 2017
codecov-io commented Mar 23, 2017

Codecov Report

Merging #449 into master will decrease coverage by 1.44%.
The diff coverage is 67.74%.


@@            Coverage Diff             @@
##           master     #449      +/-   ##
==========================================
- Coverage    41.1%   39.65%   -1.45%     
==========================================
  Files          38       36       -2     
  Lines        2681     2335     -346     
==========================================
- Hits         1102      926     -176     
+ Misses       1418     1255     -163     
+ Partials      161      154       -7
Impacted Files Coverage Δ
core/controlplane/config/credential.go 60.22% <100%> (+0.92%) ⬆️
core/controlplane/config/tls_config.go 68.93% <100%> (+0.51%) ⬆️
core/controlplane/config/config.go 60.36% <22.22%> (-3.48%) ⬇️
core/controlplane/config/token_config.go 64.28% <70.12%> (-5.28%) ⬇️
model/derived/etcd_cluster.go 57.14% <0%> (-22.86%) ⬇️
core/controlplane/config/stack_config.go 67.64% <0%> (-3.27%) ⬇️
core/controlplane/cluster/describer.go 0% <0%> (ø) ⬆️
core/nodepool/cluster/cluster.go
model/etcd_cluster.go
... and 1 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4b02d07...9a98f79.

danielfm changed the title from "[WIP] Kubelet TLS bootstrapping" to "Kubelet TLS bootstrapping" on Mar 28, 2017
danielfm commented Mar 29, 2017

Just when I thought I was almost done with this PR. 😅

Edit: See my comment below.

danielfm commented Mar 29, 2017

I believe this PR is ready for review! 🎉

PS: This changes quite a few things, so please take your time reviewing it and let me know of any changes you can think of to get this working as smoothly as possible.

/cc @mumoshu

mumoshu commented Mar 29, 2017

Thanks for your efforts @danielfm!

First of all, regarding your comment: should we, or do we need to, migrate this work to the new bootstrap tokens after k8s 1.6?

@@ -63,6 +63,9 @@ func NewDefaultCluster() *Cluster {
Disk: "xvdb",
Filesystem: "xfs",
},
ClusterTLSBootstrap: ClusterTLSBootstrap{
Enabled: false,
},
Contributor:

Very nit-picky, but moving this block to line 61 would be better for consistent ordering between this func and the definition of the Experimental struct.

danielfm commented Mar 29, 2017

@mumoshu This stuff just came out, so I'm still reading and learning how it compares to what I did here, but for now I think the bootstrap tokens might not suit kube-aws for a few reasons:

  • Edit (added): According to the design docs, the bootstrap tokens facility is useful when you want new instances to join the cluster but have no way to know they're talking to a trusted API server endpoint. In kube-aws, we don't have this problem since we generate a CA for the cluster (or allow users to provide their own certificates), so the API server endpoint can already be verified by worker nodes.
  • Bootstrap tokens are new to Kubernetes 1.6, while the implementation in this PR can be used with 1.4.x+; the only requirements are RBAC and CSR support.
  • Bootstrap tokens have an expiration date, after which the token is no longer valid. This is nice for security reasons, but it might cause problems for less careful operators of longer-lived clusters. In this PR, the token is valid for as long as it's listed in ./credentials/tokens.csv (see the token file sketch below), and the operator can rotate the token at any time by changing this file, without risking problems due to expired tokens when scaling the cluster up (or adding new node pools) at a later moment.
    • Edit (added): according to the design docs, it's possible to create tokens without a TTL
  • Bootstrap tokens' usage scope is defined by usage-bootstrap-* attributes (like authentication and signing), which doesn't make much sense to me since we can already use RBAC to restrict (with greater flexibility) which actions can be performed on which resources.
    • Edit (added): Sorry, it looks like I misunderstood this part. The signing attribute allows a bootstrap token to be used to sign the cluster-info ConfigMap in the kube-public namespace so that the client can connect to the API server and, after verifying the ConfigMap signature, know the API server can be trusted. Again, this is not really something we need in kube-aws since we already provide a facility for shipping the CA to all worker nodes.
  • If I understood correctly, the format used for bootstrap tokens doesn't follow the security guidelines outlined here, which state that a token, to be considered secure, should represent at least 128 bits of entropy. The implementation in this PR, on the other hand, generates tokens that satisfy this requirement.
    • Edit (added): I was wrong here; the bootstrap token format does satisfy the requirements. We are using a much stronger token in that regard, though.

Edit (added): After taking some time to digest the new information, my conclusion is that, for the reasons stated above, this PR remains valid, as it solves a different problem than the new bootstrap tokens: this PR addresses the issuing of TLS client certificates to worker nodes via Certificate Signing Requests (CSR), while bootstrap tokens are a protocol for obtaining a CA from a verified API server endpoint without resorting to plain HTTP connections.

That's my first opinion on this, but I'm eager to hear what you guys think.
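
For reference, ./credentials/tokens.csv follows the kube-apiserver static token file format: one record per line containing the token, a user name, a user UID, and an optional quoted, comma-separated list of groups. A sketch with placeholder values:

# token,user,uid,"group1,group2"
<random-token-with-at-least-128-bits-of-entropy>,kubelet-bootstrap,10001,"system:kubelet-bootstrap"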

return nil, err
}
stackConfig.Config.AuthTokensConfig = rawAuthTokens
}
Contributor:

Note to self: Probably this block is from #470

danielfm (Author):

@mumoshu Yes, I found and fixed the bug here before #470 was opened. If it gets merged before this PR, I'll remove this.

@@ -37,7 +40,7 @@ type ProvidedConfig struct {
cfg.Experimental `yaml:",inline"`
Private bool `yaml:"private,omitempty"`
NodePoolName string `yaml:"name,omitempty"`
providedEncryptService cfg.EncryptService
ProvidedEncryptService cfg.EncryptService
Contributor:

Could you keep providedEncryptService a private member if it isn't referenced from other packages?

danielfm (Author):

@mumoshu The ProvidedEncryptService members from both .../core/controlplane/config and .../core/nodepool/config are used in the ConfigFromBytesWithEncryptService function of .../core/root/config.

But I will re-check all my code to see if there are any new members that could be left private 👍

danielfm (Author):

@mumoshu et al: I updated my previous comment regarding the new bootstrap tokens feature in Kubernetes 1.6, let me know what you think.


fmt.Printf("Validation OK!\n")
Contributor:

Good catch 👍

{{- if .Experimental.ClusterTLSBootstrap.Enabled }}
--volume=kube,kind=host,source=/etc/kubernetes,readOnly=false \
--mount=volume=kube,target=/etc/kubernetes \
{{- end }}
Contributor:

When the cluster TLS bootstrap feature is enabled, the original volume and mount for /etc/kubernetes/ssl seem unnecessary; so, how about something like:

{{- if .Experimental.ClusterTLSBootstrap.Enabled }}
--volume=kube,kind=host,source=/etc/kubernetes,readOnly=false \
--mount=volume=kube,target=/etc/kubernetes \
{{- else}}
--volume=ssl,kind=host,source=/etc/kubernetes/ssl,readOnly=false \
--mount=volume=ssl,target=/etc/kubernetes/ssl \
{{- end }}

?

danielfm (Author):

Yes, great suggestion. 👍

-e LAUNCHCONFIGURATION=${LAUNCHCONFIGURATION} \
{{.HyperkubeImage.RepoWithTag}} /bin/bash \
-ec 'echo "placing labels and annotations with additional AWS parameters."; \
kctl="/kubectl --server=https://{{.ExternalDNSName}}:443 --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml"; \
Contributor:

fyi: ExternalDNSName should be APIEndpoint.DNSName once #468 is merged

- create
- nonResourceURLs: ["*"]
verbs: ["*"]

Contributor:

Are the role and the binding for bootstrapped-node necessary for this feature?

danielfm (Author):

Unfortunately, yes.

The certificates generated via the certificate signing workflow are bound to the system:nodes group, which does not have those permissions, which are needed by the node drainer and the node labeler (the one that adds AWS metadata as node labels and annotations).

Without this, node draining does not work, and the AWS metadata is not added to nodes bootstrapped via the CSR workflow.

Contributor:

It totally makes sense 👍

So, in other words, do we need this role and binding due to the lack of permissions in the default k8s role and binding for system:nodes?
If that's the case, I guess it would be even nicer to add some comments, including a link to the doc or code describing/defining the default role & binding, if any.

danielfm (Author):

So, in other words, do we need this role and binding due to the lack of permissions in the default k8s role and binding for system:nodes?

Yes!

If that's the case, I guess it would be even nicer to add some comments, including a link to the doc or code describing/defining the default role & binding, if any.

Added!
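
For readers following along, the extra role and binding discussed here have roughly the following shape. This is a hedged sketch reconstructed from the diff fragment quoted above, not the PR's exact manifest; the resource and verb lists are illustrative:

apiVersion: rbac.authorization.k8s.io/v1beta1  # version depends on the k8s release
kind: ClusterRole
metadata:
  name: bootstrapped-node                 # name taken from the discussion above
rules:
- apiGroups: [""]
  resources: ["nodes", "pods"]            # illustrative: the labeler patches nodes, the drainer evicts pods
  verbs: ["get", "list", "patch", "update", "create"]
- nonResourceURLs: ["*"]                  # matches the fragment quoted in the diff
  verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: bootstrapped-node
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: bootstrapped-node
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:nodes                      # the group the CSR-issued worker certs are bound to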

// Uses the same encrypt service for node pools
for _, p := range c.NodePools {
p.ProvidedEncryptService = encryptService
}
mumoshu commented Mar 29, 2017:

Just to confirm, but this is required to pass one of newly added tests, right?

danielfm (Author):

Yes, without this, the TestMainClusterConfig/WithExperimentalFeatures test fails to complete assertConfig, probably due to the code that's used to encrypt the bootstrap token for inclusion in cloud-config-worker.

I'll take a more careful look in this part to see if I missed something here.

@@ -1029,6 +1039,8 @@ worker:
enabled: true
clusterAutoscalerSupport:
enabled: true
clusterTLSBootstrap:
enabled: true # Must be ignored, value is synced with the one from control plane
Contributor:

👍

--tls-cert-file=/etc/kubernetes/ssl/worker.pem \
--tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem
--tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem \
mumoshu commented Mar 30, 2017:

I guess the experimental-bootstrap-kubeconfig setting results in the kubelet writing the issued worker key and cert to /var/run/kubernetes, which is the well-known location for the cert and the key according to https://kubernetes.io/docs/admin/kubelet/

If that's true, I wonder if we could improve the code to assume all the worker keys and certs are located under the well-known location, regardless of whether we enable the TLS bootstrap feature or not. Doing so would require /opt/bin/decrypt-tls-assets to write the decrypted key and cert to /var/run/kubernetes/<the well-known name for key or cert, if any>, but would allow various if-else statements to be simplified; for example, this code block could be just:

{{- if .Experimental.ClusterTLSBootstrap.Enabled }}
--experimental-bootstrap-kubeconfig=/etc/kubernetes/worker-bootstrap-kubeconfig.yaml \
{{- end}}

i.e. the explicit tls-cert-file and tls-private-key-file flags can be omitted when the TLS bootstrap is disabled?

Contributor:

Or is there any flag to make the kubelet write issued keys and certs to /etc/kubernetes/ssl rather than /var/run/kubernetes, so that we don't need if-elses to vary the mounts according to whether the TLS bootstrap is enabled or not?

danielfm (Author):

Probably --cert-dir, but I need to test this.

danielfm (Author):

Yes, it worked! 🎉
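
So, with --cert-dir pointed at the existing ssl directory, the bootstrap-related kubelet flags presumably reduce to something like the sketch below (paths taken from the snippets above; flag availability varies by kubelet version):

--experimental-bootstrap-kubeconfig=/etc/kubernetes/worker-bootstrap-kubeconfig.yaml \
--cert-dir=/etc/kubernetes/ssl \
--kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml \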

Contributor:

Great! 👍 🎉

@@ -642,6 +664,9 @@ write_files:
docker run --rm --net=host \
-v /etc/kubernetes:/etc/kubernetes \
-v /etc/resolv.conf:/etc/resolv.conf \
{{- if .Experimental.ClusterTLSBootstrap.Enabled }}
-v /var/run/kubernetes:/var/run/kubernetes \
{{- end }}
mumoshu commented Mar 30, 2017:

If we go that way, this block could be just:

-v /var/run/kubernetes:/var/run/kubernetes

without an if.

@@ -565,9 +580,16 @@ write_files:
| base64 -d > $f
mv -f $f ${encKey%.enc}
Contributor:

If we go that way, the decrypted file must be moved to /var/run/kubernetes/<a well-known name if any> rather than ${encKey%.enc}

{{- if .Experimental.ClusterTLSBootstrap.Enabled }}
--volume=kuberun,kind=host,source=/var/run/kubernetes,readOnly=true \
--mount=volume=kuberun,target=/var/run/kubernetes \
{{- end }}
Contributor:

If we go that way, this block could be just:

--volume=kuberun,kind=host,source=/var/run/kubernetes,readOnly=true \
--mount=volume=kuberun,target=/var/run/kubernetes \

without an if.

mumoshu commented Mar 30, 2017

LGTM 👍
Could you squash and rebase commits?

danielfm (Author):

Just made a mess trying to squash the whole thing. 😅

Let me spend a little more time testing things to ensure everything is still working so you can merge.

danielfm (Author):

@mumoshu done! 🎉

@@ -893,6 +898,8 @@ experimental:
enabled: true
clusterAutoscalerSupport:
enabled: true
clusterTLSBootstrap:
Contributor:

Sorry, but one last nit: would you mind telling me if there's any specific reason you prefixed the key with "cluster"? Can we call it just tlsBootstrap for consistency with the k8s docs?

danielfm (Author):

@mumoshu I actually never gave much thought to the configuration name. I guess tlsBootstrap is better. 👍
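
With the rename, enabling the feature in cluster.yaml would presumably look like this sketch, assuming the key stays under experimental like the other flags shown in this PR:

experimental:
  tlsBootstrap:
    enabled: true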

mumoshu added this to the v0.9.6-rc.1 milestone on Mar 31, 2017
Daniel Fernandes Martins added 2 commits March 31, 2017 10:24

  • However, this is probably something we'll have to do for Kubernetes 1.6.
  • This flag is not needed since we already hook the kube-admin user to the cluster-admin role, which grants super-user access to the cluster. Another reason for removing this flag is that it is not present in the Kubernetes 1.6 API server.
danielfm (Author):

@mumoshu fixed, and changed a few other minor things as well.

I also took the opportunity to test this PR with Calico enabled, which I forgot to do in my previous tests. I found a bug there, which I'm going to fix ASAP.

This was causing problems, so I reverted the flag removal commit.
danielfm (Author):

Okay, now apparently everything's working. 😄

mumoshu commented Apr 2, 2017

LGTM. Thanks a lot for the great work @danielfm 👍

mumoshu merged commit 2ae4120 into kubernetes-retired:master on Apr 2, 2017
camilb added a commit to camilb/kube-aws that referenced this pull request Apr 5, 2017
* kubernetes-incubator/master: (47 commits)
  Update README.md
  Automatic recovery from permanent failures of etcd3 nodes (kubernetes-retired#417)
  New settings: nodeMonitorGracePeriod, disableSecurityGroupIngress for controller-manager, nodeStatusUpdateFrequency for worker kubelet (kubernetes-retired#473)
  Correct file references
  Deprecate verbose legacy keys in favor of corresponding nested keys (kubernetes-retired#481)
  Update kube-system using kubectl (kubernetes-retired#472)
  Support for multiple k8s API endpoints (kubernetes-retired#468)
  Introduce the rescheduler (kubernetes-retired#441)
  Kubelet TLS bootstrapping (kubernetes-retired#449)
  Fix mount directory for containerized-build-release-binaries script
  Setup net.netfilter.nf_conntrack_max and fix error "nf_conntrack: table full, dropping packet"
  Fix broken links
  Move auth token file asset outside of if statement
  Stop uploading redundant stack.json to S3 Fixes kubernetes-retired#334
  Update kubernetes-on-aws-node-pool.md
  Fix "flannel-docker-opts.service" in cn-north-1
  Update "pause-amd64.service"
  Update CONTRIBUTING.md
  AWS CLI region default
  Fix docs
  ...
danielfm deleted the cluster-tls-bootstrap branch on April 6, 2017
camilb added a commit to camilb/kube-aws that referenced this pull request Apr 21, 2017
…-improvements

* kubernetes-incubator/master: (26 commits; largely the same list as above)
kylehodgetts pushed a commit to HotelsDotCom/kube-aws that referenced this pull request Mar 27, 2018

* Add experimental Kubelet TLS bootstrapping