Kubelet TLS bootstrapping #449
Conversation
Codecov Report
```
@@            Coverage Diff            @@
##           master    #449      +/-   ##
==========================================
- Coverage    41.1%   39.65%   -1.45%
==========================================
  Files          38       36       -2
  Lines        2681     2335     -346
==========================================
- Hits         1102      926     -176
+ Misses       1418     1255     -163
+ Partials      161      154       -7
```
Continue to review full report at Codecov.
Just when I thought I was almost done with this PR. 😅 Edit: See my comment below.

I believe this PR is ready for review! 🎉 PS: This changes quite a few things, so please take your time to review this and let me know of any changes you guys think of to get this working as smoothly as possible. /cc @mumoshu
Thanks for your efforts @danielfm! First of all, regarding your comment: should we, or do we need to, adapt this work to the bootstrap tokens after k8s 1.6?
core/controlplane/config/config.go
Outdated
```go
@@ -63,6 +63,9 @@ func NewDefaultCluster() *Cluster {
			Disk:       "xvdb",
			Filesystem: "xfs",
		},
		ClusterTLSBootstrap: ClusterTLSBootstrap{
			Enabled: false,
		},
```
Very nit-picky, but moving this block to line 61 would be better for consistent ordering between this func and the definition of the Experimental struct.
@mumoshu This stuff just came out, so I'm still reading and learning how it compares to what I did here, but for now I think the bootstrap tokens might not suit kube-aws for a few reasons:

Edit (added): After taking some time to digest the new information, my conclusion is that, for the reasons stated above, this PR continues to be valid, as it proposes to solve a different problem than the new bootstrap tokens: what this PR solves is the issuing of TLS client certificates to worker nodes via Certificate Signing Requests (CSR), while bootstrap tokens are a protocol for obtaining a CA from a verified API server endpoint without resorting to plain HTTP connections. That's my first opinion on this, but I'm eager to hear what you guys think.
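(For reference, the CSR flow discussed here is driven by a bootstrap kubeconfig handed to the kubelet. Below is a minimal sketch of what such a file can look like; the token, server address, and paths are placeholders, not necessarily what kube-aws renders.)

```yaml
# Hypothetical worker-bootstrap-kubeconfig.yaml; all values are illustrative.
apiVersion: v1
kind: Config
clusters:
- name: local
  cluster:
    certificate-authority: /etc/kubernetes/ssl/ca.pem
    server: https://kube.example.com:443
users:
- name: kubelet-bootstrap
  user:
    token: 0123456789abcdef            # placeholder bootstrap token
contexts:
- name: kubelet-bootstrap-context
  context:
    cluster: local
    user: kubelet-bootstrap
current-context: kubelet-bootstrap-context
```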
core/controlplane/config/config.go
Outdated
```go
		return nil, err
	}
	stackConfig.Config.AuthTokensConfig = rawAuthTokens
}
```
Note to self: Probably this block is from #470
```diff
@@ -37,7 +40,7 @@ type ProvidedConfig struct {
 	cfg.Experimental `yaml:",inline"`
 	Private          bool   `yaml:"private,omitempty"`
 	NodePoolName     string `yaml:"name,omitempty"`
-	providedEncryptService cfg.EncryptService
+	ProvidedEncryptService cfg.EncryptService
```
Could you leave `providedEncryptService` as a private member if it isn't referenced from other packages?
@mumoshu The `ProvidedEncryptService` from both `.../core/controlplane/config` and `.../core/nodepool/config` is used in the function `ConfigFromBytesWithEncryptService` of `.../core/root/config`.

But I will re-check all my code to see if there are any new members that could be left private 👍
@mumoshu et al: I updated my previous comment regarding the new bootstrap tokens feature in Kubernetes 1.6, let me know what you think.
```go
fmt.Printf("Validation OK!\n")
```
Good catch 👍
```
{{- if .Experimental.ClusterTLSBootstrap.Enabled }}
--volume=kube,kind=host,source=/etc/kubernetes,readOnly=false \
--mount=volume=kube,target=/etc/kubernetes \
{{- end }}
```
When the cluster TLS bootstrap feature is enabled, the original volume and mount for `/etc/kubernetes/ssl` do not seem necessary; so, how about something like:

```
{{- if .Experimental.ClusterTLSBootstrap.Enabled }}
--volume=kube,kind=host,source=/etc/kubernetes,readOnly=false \
--mount=volume=kube,target=/etc/kubernetes \
{{- else}}
--volume=ssl,kind=host,source=/etc/kubernetes/ssl,readOnly=false \
--mount=volume=ssl,target=/etc/kubernetes/ssl \
{{- end }}
```

?
Yes, great suggestion. 👍
```
-e LAUNCHCONFIGURATION=${LAUNCHCONFIGURATION} \
{{.HyperkubeImage.RepoWithTag}} /bin/bash \
-ec 'echo "placing labels and annotations with additional AWS parameters."; \
kctl="/kubectl --server=https://{{.ExternalDNSName}}:443 --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml"; \
```
FYI: `ExternalDNSName` should be `APIEndpoint.DNSName` once #468 is merged.
```yaml
    - create
- nonResourceURLs: ["*"]
  verbs: ["*"]
```
Are the role and the binding for `bootstrapped-node` necessary for this feature?
Unfortunately, yes.

The certificates generated via the certificate signing workflow are bound to the `system:nodes` group, which does not have those permissions, which are needed by the node drainer and the node labeler (the one that adds AWS metadata as node labels and annotations).

Without this, node draining does not work, and the AWS metadata is not added to nodes bootstrapped via this CSR workflow.
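(For reference, a hedged sketch of what such a role and binding can look like; the resource names and verbs below are illustrative, not the exact manifests from this PR.)

```yaml
# Illustrative only: grants nodes in the system:nodes group the extra
# permissions needed by the node drainer and the node labeler.
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: bootstrapped-node
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "patch", "update"]   # label/annotate nodes with AWS metadata
- apiGroups: [""]
  resources: ["pods/eviction"]
  verbs: ["create"]                   # evict pods during node draining
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: bootstrapped-node
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: bootstrapped-node
subjects:
- kind: Group
  name: system:nodes
  apiGroup: rbac.authorization.k8s.io
```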
It totally makes sense 👍

So, in other words, we need this role and binding due to the lack of permissions in the k8s default role and binding for `system:nodes`?

If that's the case, I guess it would be even nicer to add some comments including a link to the doc or code describing/defining the default role & binding, if any.
> So, in other words, we need this role and binding due to the lack of permissions in the k8s default role and binding for `system:nodes`?

Yes!

> If that's the case, I guess it would be even nicer to add some comments including a link to the doc or code describing/defining the default role & binding, if any.

Added!
core/root/config/config.go
Outdated
```go
// Uses the same encrypt service for node pools
for _, p := range c.NodePools {
	p.ProvidedEncryptService = encryptService
}
```
Just to confirm: this is required to pass one of the newly added tests, right?
Yes; without this, the `TestMainClusterConfig/WithExperimentalFeatures` test fails to complete the `assertConfig`, probably due to the code that's used to encrypt the bootstrap token for inclusion in `cloud-config-worker`.

I'll take a more careful look at this part to see if I missed something here.
test/integration/maincluster_test.go
Outdated
```yaml
@@ -1029,6 +1039,8 @@ worker:
    enabled: true
  clusterAutoscalerSupport:
    enabled: true
  clusterTLSBootstrap:
    enabled: true # Must be ignored, value is synced with the one from control plane
```
👍
```diff
 --tls-cert-file=/etc/kubernetes/ssl/worker.pem \
---tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem
+--tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem \
```
I guess the `experimental-bootstrap-kubeconfig` setting results in the kubelet writing the issued worker key and cert to `/var/run/kubernetes`, which is the well-known location for the cert and the key according to https://kubernetes.io/docs/admin/kubelet/.

If that's true, I wonder if we could improve the code to assume all the worker keys and certs are located under the well-known location, regardless of whether we enable the TLS bootstrap feature or not. Doing so would require `/opt/bin/decrypt-tls-assets` to write the decrypted key and cert to `/var/run/kubernetes/<the well-known name for key or cert, if any>`, but would result in various if-else statements being simplified; for example, this code block could be just:

```
{{- if .Experimental.ClusterTLSBootstrap.Enabled }}
--experimental-bootstrap-kubeconfig=/etc/kubernetes/worker-bootstrap-kubeconfig.yaml \
{{- end}}
```

i.e. the explicit `tls-cert-file` and `tls-private-key-file` could be omitted when the TLS bootstrap is disabled?
Or is there any flag to alter where the kubelet writes issued keys and certs, so that it uses `/etc/kubernetes/ssl` rather than `/var/run/kubernetes`? That way we wouldn't need `if-else`s to vary mounts according to whether the TLS bootstrap is enabled or not.
Probably `--cert-dir`, but I need to test this.
Yes, it worked! 🎉
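(For reference, a sketch of the resulting kubelet flag section, assuming `--cert-dir` behaves as tested above; this is not necessarily the exact template that landed in the PR.)

```
# Hypothetical cloud-config-worker excerpt: --cert-dir redirects the issued
# key and cert to /etc/kubernetes/ssl, so the mounts need no extra if-else.
{{- if .Experimental.ClusterTLSBootstrap.Enabled }}
--experimental-bootstrap-kubeconfig=/etc/kubernetes/worker-bootstrap-kubeconfig.yaml \
--cert-dir=/etc/kubernetes/ssl \
{{- else }}
--tls-cert-file=/etc/kubernetes/ssl/worker.pem \
--tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem \
{{- end }}
```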
Great! 👍 🎉
```yaml
@@ -642,6 +664,9 @@ write_files:
      docker run --rm --net=host \
        -v /etc/kubernetes:/etc/kubernetes \
        -v /etc/resolv.conf:/etc/resolv.conf \
        {{- if .Experimental.ClusterTLSBootstrap.Enabled }}
        -v /var/run/kubernetes:/var/run/kubernetes \
        {{- end }}
```
If we go that way, this block could be just:

```
-v /var/run/kubernetes:/var/run/kubernetes
```

without an `if`.
```yaml
@@ -565,9 +580,16 @@ write_files:
        | base64 -d > $f
      mv -f $f ${encKey%.enc}
```
If we go that way, the decrypted file must be moved to `/var/run/kubernetes/<a well-known name if any>` rather than `${encKey%.enc}`.
```
{{- if .Experimental.ClusterTLSBootstrap.Enabled }}
--volume=kuberun,kind=host,source=/var/run/kubernetes,readOnly=true \
--mount=volume=kuberun,target=/var/run/kubernetes \
{{- end }}
```
If we go that way, this block could be just:

```
--volume=kuberun,kind=host,source=/var/run/kubernetes,readOnly=true \
--mount=volume=kuberun,target=/var/run/kubernetes \
```

without an `if`.
LGTM 👍

Just made a mess trying to squash the whole thing. 😅 Let me spend a little more time testing things to ensure everything is still working so you can merge.

@mumoshu done! 🎉
test/integration/maincluster_test.go
Outdated
```yaml
@@ -893,6 +898,8 @@ experimental:
    enabled: true
  clusterAutoscalerSupport:
    enabled: true
  clusterTLSBootstrap:
```
Sorry, but one last nit: would you mind telling me if there's any specific reason you've prefixed the key with "cluster"? Can we call it just `tlsBootstrap` for consistency with the k8s docs?
@mumoshu I actually never gave much thought to the configuration name. I guess `tlsBootstrap` is better. 👍
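(With the rename, enabling the feature in `cluster.yaml` would look something like the sketch below, following the key nesting used in the tests above.)

```yaml
# cluster.yaml excerpt (illustrative)
experimental:
  tlsBootstrap:
    enabled: true
```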
However, this is probably something we'll have to do for Kubernetes 1.6.
This flag is not needed since we already hook the kube-admin user to the cluster-admin role, which grants super-user access to the cluster. Another reason for removing this flag is that it is not present in the Kubernetes 1.6 API server.
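(For context, hooking a user to the built-in cluster-admin role takes a single ClusterRoleBinding; a minimal sketch, with the binding name assumed for illustration.)

```yaml
# Illustrative: grants the kube-admin user super-user access via cluster-admin.
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: kube-admin-binding   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: User
  name: kube-admin
  apiGroup: rbac.authorization.k8s.io
```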
@mumoshu Fixed, and changed a few other minor things as well. I also took the opportunity to test this PR with Calico enabled, which I forgot to do in my previous tests. I found a bug there, which I'm going to fix ASAP.
This was causing problems, so I reverted the flag removal commit.
Okay, now apparently everything's working. 😄

LGTM. Thanks a lot for the great work @danielfm 👍
```
* kubernetes-incubator/master: (47 commits)
  Update README.md
  Automatic recovery from permanent failures of etcd3 nodes (kubernetes-retired#417)
  New settings: nodeMonitorGracePeriod, disableSecurityGroupIngress for controller-manager, nodeStatusUpdateFrequency for worker kubelet (kubernetes-retired#473)
  Correct file references
  Deprecate verbose legacy keys in favor of corresponding nested keys (kubernetes-retired#481)
  Update kube-system using kubectl (kubernetes-retired#472)
  Support for multiple k8s API endpoints (kubernetes-retired#468)
  Introduce the rescheduler (kubernetes-retired#441)
  Kubelet TLS bootstrapping (kubernetes-retired#449)
  Fix mount directory for containerized-build-release-binaries script
  Setup net.netfilter.nf_conntrack_max and fix error "nf_conntrack: table full, dropping packet"
  Fix broken links
  Move auth token file assest outside of if statement
  Stop uploading redundant stack.json to S3
  Fixes kubernetes-retired#334
  Update kubernetes-on-aws-node-pool.md
  Fix "flannel-docker-opts.service" in cn-north-1
  Update "pause-amd64.service"
  Update CONTRIBUTING.md
  AWS CLI region default
  Fix docs
  ...
```
```
…-improvements

* kubernetes-incubator/master: (26 commits)
  Update README.md
  Automatic recovery from permanent failures of etcd3 nodes (kubernetes-retired#417)
  New settings: nodeMonitorGracePeriod, disableSecurityGroupIngress for controller-manager, nodeStatusUpdateFrequency for worker kubelet (kubernetes-retired#473)
  Correct file references
  Deprecate verbose legacy keys in favor of corresponding nested keys (kubernetes-retired#481)
  Update kube-system using kubectl (kubernetes-retired#472)
  Support for multiple k8s API endpoints (kubernetes-retired#468)
  Introduce the rescheduler (kubernetes-retired#441)
  Kubelet TLS bootstrapping (kubernetes-retired#449)
  Fix mount directory for containerized-build-release-binaries script
  Setup net.netfilter.nf_conntrack_max and fix error "nf_conntrack: table full, dropping packet"
  Fix broken links
  Move auth token file assest outside of if statement
  Stop uploading redundant stack.json to S3
  Fixes kubernetes-retired#334
  Update kubernetes-on-aws-node-pool.md
  Update CONTRIBUTING.md
  AWS CLI region default
  Fix docs
  Fix build path and package ref
  Correct e2e package ref
  ...
```
* Add experimental Kubelet TLS bootstrapping

This PR introduces an experimental feature for provisioning TLS certificates for worker nodes via the alpha [certificate signing requests](https://kubernetes.io/docs/admin/kubelet-tls-bootstrapping/) workflow, enabled in `cluster.yaml` via the `tlsBootstrap.enabled` key.

If RBAC is enabled, I also made sure to set up the appropriate cluster role bindings so that the token used for sending the certificate signing requests can only make requests related to certificate provisioning, as instructed by the official documentation.

Checklist:

- [x] Add a `kube-aws render token-file` that creates a token auth file with a cluster TLS bootstrap token (if the feature is turned on in `cluster.yaml`, or an empty file otherwise)
- [x] Check whether the auth token file has a proper bootstrap token when the TLS bootstrap feature is turned on; if not, return a self-explaining message so the user knows what to do in order to fix the problem
- [x] Read the bootstrap token from the auth token file and generate an encrypted version of it
- [x] Send the encrypted bootstrap token in worker userdata
- [x] Replace the decrypted token in the kubeconfig used for the TLS bootstrapping workflow
- [x] Fix / add more tests wherever possible
- [x] Manual tests
  - [x] Create clusters with TLS bootstrapping enabled and disabled
  - [x] Make sure it plays well with other experimental features (i.e. self-hosted Calico, AWS node labels, node draining)
  - [x] Try to add worker nodes (and stop/start the kubelet on an existing node) to see if they can join the cluster without the need to manually approve every new certificate signing request
  - [x] Test token rollout methods (**Edit:** `kube-aws update` works as expected, since the generated certificates remain valid even though the token used to request it has changed)

Closes kubernetes-retired#406.
This PR introduces an experimental feature for provisioning TLS certificates for worker nodes via the alpha certificate signing requests workflow.

If RBAC is enabled in `cluster.yaml`, I also made sure to set up the appropriate cluster role bindings so that the token used for sending the certificate signing requests can only make requests related to certificate provisioning, as instructed by the official documentation.

Checklist:

- [x] Add a `kube-aws render token-file` that creates a token auth file with a cluster TLS bootstrap token (if the feature is turned on in `cluster.yaml`, or an empty file otherwise) *(undone in Improve auth tokens / TLS bootstrapping UX #489)*
- [x] Test token rollout methods (**Edit:** `kube-aws update` works as expected, since the generated certificates remain valid even though the token used to request it has changed)

Closes #406.