
Upgrading the Bootstrap Node may fail if kube-apiserver is too slow to start #2533

Closed
gdemonet opened this issue May 7, 2020 · 0 comments

Labels:
  • complexity:hard (Something that may require up to a week to fix)
  • kind:bug (Something isn't working)
  • release:blocker (An issue that blocks a release until resolved)
  • topic:deployment (Bugs in or enhancements to deployment stages)
  • topic:lifecycle (Issues related to upgrade or downgrade of MetalK8s)

Comments

@gdemonet
Contributor

gdemonet commented May 7, 2020

Component: salt

What happened:

After upgrading the Bootstrap node (during the execution of the deploy_node orchestrate triggered by the upgrade.sh script), the uncordoning step fails with a ConnectionRefused error against the Bootstrap node's kube-apiserver.

What was expected:

This orchestration should not fail as a result of the changes it applies itself.

In this case, the particular error shouldn't even be a problem: we wait for an API server to answer by querying through the local proxy (:7443). However, since the kubeconfig we actually use references the local instance of kube-apiserver (:6443), the check hits an instance that may not be ready yet (whereas, if other instances were ready, the proxy would route requests to them).
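
For illustration, an admin.conf-style kubeconfig targets a single kube-apiserver instance, roughly as in the excerpt below (a sketch only, not the actual file; the placeholder address mirrors the one quoted in the proposal further down):

```yaml
# Illustrative excerpt of an admin.conf-style kubeconfig: the cluster entry
# points at one specific kube-apiserver instance on port 6443, so any check
# going through this file depends on that single instance being ready.
apiVersion: v1
kind: Config
clusters:
  - name: kubernetes
    cluster:
      certificate-authority-data: <base64-encoded CA bundle>   # placeholder
      server: https://<control_plane_ip>:6443                  # one specific instance
```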

Steps to reproduce:

Run the upgrade script on a cluster with a highly available control plane; the run may fail while upgrading the Bootstrap node. It can also succeed if the local kube-apiserver starts quickly enough, so the failure is not deterministic.

Resolution proposal (optional):

The Salt Master shouldn't use the generated admin.conf, which is intended for direct access to a specific master's instance of kube-apiserver (it references https://<control_plane_ip>:6443).
Instead, we should generate a dedicated kubeconfig for the Salt Master, pointing to its local apiserver-proxy (i.e. https://127.0.0.1:7443).
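
A minimal sketch of what such a dedicated kubeconfig could look like (file paths, user/context names and certificate locations are hypothetical, not the actual MetalK8s layout):

```yaml
# Hypothetical kubeconfig for the Salt Master: same structure as admin.conf,
# but the server is the local apiserver-proxy, which can route requests to
# any ready kube-apiserver instance.
apiVersion: v1
kind: Config
clusters:
  - name: kubernetes
    cluster:
      certificate-authority: /etc/kubernetes/pki/ca.crt
      server: https://127.0.0.1:7443   # local apiserver-proxy
contexts:
  - name: salt-master@kubernetes
    context:
      cluster: kubernetes
      user: salt-master
current-context: salt-master@kubernetes
users:
  - name: salt-master
    user:
      client-certificate: /etc/salt/pki/salt-master-apiserver.crt   # hypothetical path
      client-key: /etc/salt/pki/salt-master-apiserver.key           # hypothetical path
```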

To make things clean, we should also generate a custom certificate for this kubeconfig (a rough generation sketch follows below), with:

  • CN: "salt-master-{{ node_name }}" (username)
  • O: "system:masters" (group)

In the future, we should use a different group (or set of groups, depending on what (Cluster)Role(s) MetalK8s will provide by default), in the same spirit as #1775.
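
A rough sketch of how such a certificate could be issued with Salt's legacy x509 state module (state ID, file paths, validity and signing details are assumptions for illustration, not the actual MetalK8s state tree):

```sls
# Sketch only: issue a dedicated client certificate for the Salt Master,
# signed by the Kubernetes CA, carrying the subject proposed above.
salt-master-apiserver-client-cert:
  x509.certificate_managed:
    - name: /etc/salt/pki/salt-master-apiserver.crt        # hypothetical path
    - public_key: /etc/salt/pki/salt-master-apiserver.key  # hypothetical path
    - signing_cert: /etc/kubernetes/pki/ca.crt
    - signing_private_key: /etc/kubernetes/pki/ca.key
    - CN: "salt-master-{{ node_name }}"                    # username, node_name as in the proposal
    - O: "system:masters"                                   # group (to be revisited, see above)
    - days_valid: 365                                       # illustrative validity
```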

@gdemonet gdemonet added kind:bug Something isn't working topic:deployment Bugs in or enhancements to deployment stages topic:lifecycle Issues related to upgrade or downgrade of MetalK8s complexity:easy Something that requires less than a day to fix labels May 7, 2020
@gdemonet gdemonet added this to the MetalK8s 2.4.4 milestone May 7, 2020
gdemonet added a commit that referenced this issue May 7, 2020
Instead of using the `/etc/kubernetes/admin.conf` file which points to a
specific master's `kube-apiserver` instance, we generate another one
dedicated to Salt Master, and configure it to point to the local
`apiserver-proxy` (which can then route to other masters if the local
one isn't available).

In addition, this kubeconfig gets its own dedicated certificate, which
could later map to another group (and thus, to other (Cluster)Role(s)
than the current "system:masters").

Note that we remove the unneeded `/etc/kubernetes` mount in both Salt
Master and Salt API containers (SaltAPI didn't need it since 41ba749).

Fixes: #2533
@gdemonet gdemonet self-assigned this May 7, 2020
gdemonet added a commit that referenced this issue May 8, 2020
Instead of using the `/etc/kubernetes/admin.conf` file which points to a
specific master's `kube-apiserver` instance, we generate another one
dedicated to Salt Master, and configure it to point to the local
`apiserver-proxy` (which can then route to other masters if the local
one isn't available).

In addition, this kubeconfig gets its own dedicated certificate, which
could later map to another group (and thus, to other (Cluster)Role(s)
than the current "system:masters").

Note that we also narrow the `/etc/kubernetes` mount down to
`/etc/kubernetes/pki` (needed for the SA signing key and the etcd
encryption key) in both the Salt Master and Salt API containers
(Salt API hasn't needed the full mount since 41ba749).

Fixes: #2533
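
For context, the narrowed mount described above would look roughly like the excerpt below in a static Pod manifest (a sketch under assumed names, not the actual MetalK8s manifest):

```yaml
# Sketch of a static Pod mounting only /etc/kubernetes/pki (read-only) into
# the salt-master container, instead of the whole /etc/kubernetes directory.
apiVersion: v1
kind: Pod
metadata:
  name: salt-master              # illustrative name
  namespace: kube-system
spec:
  containers:
    - name: salt-master
      image: salt-master:example   # illustrative image reference
      volumeMounts:
        - name: kubernetes-pki
          mountPath: /etc/kubernetes/pki
          readOnly: true
  volumes:
    - name: kubernetes-pki
      hostPath:
        path: /etc/kubernetes/pki
        type: Directory
```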
@TeddyAndrieux TeddyAndrieux self-assigned this Jun 10, 2020
TeddyAndrieux added a commit that referenced this issue Jun 12, 2020
Add a changelog entry for the new kubeconfig generated for the Salt
Master's interaction with the Kubernetes API server

Sees: #2533
@gdemonet gdemonet added complexity:hard Something that may require up to a week to fix release:blocker An issue that blocks a release until resolved and removed complexity:easy Something that requires less than a day to fix labels Jun 18, 2020
@bert-e bert-e closed this as completed in 8d0df5d Jun 23, 2020