
Upgrading the Bootstrap Node may fail if kube-apiserver is too slow to start #2533

Closed
gdemonet opened this issue May 7, 2020 · 0 comments

Labels:
  • complexity:hard (Something that may require up to a week to fix)
  • kind:bug (Something isn't working)
  • release:blocker (An issue that blocks a release until resolved)
  • topic:deployment (Bugs in or enhancements to deployment stages)
  • topic:lifecycle (Issues related to upgrade or downgrade of MetalK8s)

Comments

@gdemonet
Contributor

gdemonet commented May 7, 2020

Component: salt

What happened:

After upgrading the Bootstrap node (during the execution of the deploy_node orchestrate triggered by the upgrade.sh script), the uncordoning step fails with a ConnectionRefused error against the Bootstrap node's kube-apiserver.

What was expected:

This orchestration should not fail as a result of the changes it applies itself.

In this case, the particular error shouldn't even be a problem: we wait for an API server to answer by querying through the local proxy (:7443). However, since the kubeconfig we actually use references the local instance of kube-apiserver (:6443), the check hits an instance that may not be ready yet (whereas, if other instances were ready, the proxy would route requests to them).
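
For illustration, an admin.conf-style kubeconfig targets a single kube-apiserver instance, roughly as in the excerpt below (a sketch only, not the actual file; the placeholder address mirrors the one quoted in the proposal further down):

```yaml
# Illustrative excerpt of an admin.conf-style kubeconfig: the cluster entry
# points at one specific kube-apiserver instance on port 6443, so any check
# going through this file depends on that single instance being ready.
apiVersion: v1
kind: Config
clusters:
  - name: kubernetes
    cluster:
      certificate-authority-data: <base64-encoded CA bundle>   # placeholder
      server: https://<control_plane_ip>:6443                  # one specific instance
```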

Steps to reproduce:

Run the upgrade script on a cluster with a highly available control plane; the run may fail while upgrading the Bootstrap node. It can also succeed if the local kube-apiserver starts quickly enough, so the failure is not deterministic.

Resolution proposal (optional):

The Salt Master shouldn't use the generated admin.conf, which is intended for direct access to a specific master's instance of kube-apiserver (it references https://<control_plane_ip>:6443).
Instead, we should generate a dedicated kubeconfig for the Salt Master, pointing to its local apiserver-proxy (i.e. https://127.0.0.1:7443).
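
A minimal sketch of what such a dedicated kubeconfig could look like (file paths, user/context names and certificate locations are hypothetical, not the actual MetalK8s layout):

```yaml
# Hypothetical kubeconfig for the Salt Master: same structure as admin.conf,
# but the server is the local apiserver-proxy, which can route requests to
# any ready kube-apiserver instance.
apiVersion: v1
kind: Config
clusters:
  - name: kubernetes
    cluster:
      certificate-authority: /etc/kubernetes/pki/ca.crt
      server: https://127.0.0.1:7443   # local apiserver-proxy
contexts:
  - name: salt-master@kubernetes
    context:
      cluster: kubernetes
      user: salt-master
current-context: salt-master@kubernetes
users:
  - name: salt-master
    user:
      client-certificate: /etc/salt/pki/salt-master-apiserver.crt   # hypothetical path
      client-key: /etc/salt/pki/salt-master-apiserver.key           # hypothetical path
```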

To make things clean, we should also generate a custom certificate for this kubeconfig (a rough generation sketch follows below), with:

  • CN: "salt-master-{{ node_name }}" (username)
  • O: "system:masters" (group)

In the future, we should use a different group (or set of groups, depending on what (Cluster)Role(s) MetalK8s will provide by default), in the same spirit as #1775.
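
A rough sketch of how such a certificate could be issued with Salt's legacy x509 state module (state ID, file paths, validity and signing details are assumptions for illustration, not the actual MetalK8s state tree):

```sls
# Sketch only: issue a dedicated client certificate for the Salt Master,
# signed by the Kubernetes CA, carrying the subject proposed above.
salt-master-apiserver-client-cert:
  x509.certificate_managed:
    - name: /etc/salt/pki/salt-master-apiserver.crt        # hypothetical path
    - public_key: /etc/salt/pki/salt-master-apiserver.key  # hypothetical path
    - signing_cert: /etc/kubernetes/pki/ca.crt
    - signing_private_key: /etc/kubernetes/pki/ca.key
    - CN: "salt-master-{{ node_name }}"                    # username, node_name as in the proposal
    - O: "system:masters"                                   # group (to be revisited, see above)
    - days_valid: 365                                       # illustrative validity
```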

@gdemonet gdemonet added kind:bug Something isn't working topic:deployment Bugs in or enhancements to deployment stages topic:lifecycle Issues related to upgrade or downgrade of MetalK8s complexity:easy Something that requires less than a day to fix labels May 7, 2020
@gdemonet gdemonet added this to the MetalK8s 2.4.4 milestone May 7, 2020
gdemonet added a commit that referenced this issue May 7, 2020
Instead of using the `/etc/kubernetes/admin.conf` file which points to a
specific master's `kube-apiserver` instance, we generate another one
dedicated to Salt Master, and configure it to point to the local
`apiserver-proxy` (which can then route to other masters if the local
one isn't available).

In addition, this kubeconfig gets its own dedicated certificate, which
could later map to another group (and thus, to other (Cluster)Role(s)
than the current "system:masters").

Note that we remove the unneeded `/etc/kubernetes` mount in both Salt
Master and Salt API containers (SaltAPI didn't need it since 41ba749).

Fixes: #2533
@gdemonet gdemonet self-assigned this May 7, 2020
gdemonet added a commit that referenced this issue May 8, 2020
Instead of using the `/etc/kubernetes/admin.conf` file which points to a
specific master's `kube-apiserver` instance, we generate another one
dedicated to Salt Master, and configure it to point to the local
`apiserver-proxy` (which can then route to other masters if the local
one isn't available).

In addition, this kubeconfig gets its own dedicated certificate, which
could later map to another group (and thus, to other (Cluster)Role(s)
than the current "system:masters").

Note that we also narrow the `/etc/kubernetes` mount down to
`/etc/kubernetes/pki` (needed for the SA signing key and the etcd
encryption key) in both the Salt Master and Salt API containers
(Salt API hasn't needed the full mount since 41ba749).

Fixes: #2533
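
For context, the narrowed mount described above would look roughly like the excerpt below in a static Pod manifest (a sketch under assumed names, not the actual MetalK8s manifest):

```yaml
# Sketch of a static Pod mounting only /etc/kubernetes/pki (read-only) into
# the salt-master container, instead of the whole /etc/kubernetes directory.
apiVersion: v1
kind: Pod
metadata:
  name: salt-master              # illustrative name
  namespace: kube-system
spec:
  containers:
    - name: salt-master
      image: salt-master:example   # illustrative image reference
      volumeMounts:
        - name: kubernetes-pki
          mountPath: /etc/kubernetes/pki
          readOnly: true
  volumes:
    - name: kubernetes-pki
      hostPath:
        path: /etc/kubernetes/pki
        type: Directory
```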
@TeddyAndrieux TeddyAndrieux self-assigned this Jun 10, 2020
TeddyAndrieux added a commit that referenced this issue Jun 12, 2020
Add a changelog entry for the new kubeconfig generated for the Salt
Master's interaction with the Kubernetes API server

Sees: #2533
@gdemonet gdemonet added complexity:hard Something that may require up to a week to fix release:blocker An issue that blocks a release until resolved and removed complexity:easy Something that requires less than a day to fix labels Jun 18, 2020
@bert-e bert-e closed this as completed in 8d0df5d Jun 23, 2020