
Calico + etcd mayhem when upgrading to v2.23.0 #10436

Closed
olevitt opened this issue Sep 14, 2023 · 2 comments · Fixed by #10438

Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

olevitt (Contributor) commented Sep 14, 2023

Hello,

When upgrading to v2.23.0 with Calico in etcd mode, every calico-node pod has its configuration set to the same node name (the first control plane node), resulting in IP allocation mayhem: every new pod gets an IP from the first control plane's IP block, breaking the network for any new pod (existing pods are fine).
This seems to be due to #10177, which makes the install-cni init container of calico-node pull its configuration from a single ConfigMap that has the first control plane's name set in stone (4f85b75#diff-91635da451087a93ab261ec90f794c825a5d584d12562fc94d183c50f63d81c3R43), instead of having it parametrized by node name, which is the case in kdd mode (4f85b75#diff-91635da451087a93ab261ec90f794c825a5d584d12562fc94d183c50f63d81c3R38) and was the case in etcd mode before that PR, when the config was pulled from a config file on each host.

One workaround is to edit the calico-config ConfigMap (namespace kube-system), replacing the hard-coded nodename with "nodename": "__KUBERNETES_NODE_NAME__", and then adding

        - name: KUBERNETES_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName

to the env of the install-cni init container of the calico-node DaemonSet (just like it's done for kdd mode, see here:

{% if calico_datastore == "kdd" %}
# Set the hostname based on the k8s node name.
- name: KUBERNETES_NODE_NAME
  valueFrom:
    fieldRef:
      fieldPath: spec.nodeName
{% endif %}
).
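For reference, here is a minimal sketch of applying the ConfigMap half of the workaround with kubectl. The "node1" value is a hypothetical placeholder for whatever control plane name is baked into your ConfigMap, and the surrounding JSON keys depend on your Kubespray/Calico version:

    # Inspect the CNI config currently embedded in the ConfigMap:
    kubectl -n kube-system get configmap calico-config -o yaml

    # Open it for editing and replace the hard-coded node name, e.g.
    #   "nodename": "node1"  ->  "nodename": "__KUBERNETES_NODE_NAME__"
    # ("node1" stands in for your first control plane's name):
    kubectl -n kube-system edit configmap calico-config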
The calico-node pods will then restart, the install-cni init container will write the correct nodename to the config file on each node, and voilà. New pods will be fine, but any pod created while the bug was active will have to be deleted so that it gets a new (correct) IP.
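A hedged sketch of verification and cleanup follows; the config path is Calico's default CNI config location (it may differ on your hosts), and the pod/namespace names are placeholders:

    # On each node, confirm the CNI config now carries that node's own name
    # (Calico's default CNI config path; adjust if yours differs):
    grep nodename /etc/cni/net.d/10-calico.conflist

    # Pods that received an IP from the wrong block while the bug was active
    # must be recreated; deleting them gets them rescheduled with a correct IP:
    kubectl delete pod <pod-name> -n <namespace>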

We will submit a PR to fix this, but if you encounter this issue in the meantime, please try this workaround; it worked for us 🥳

olevitt added the kind/bug label Sep 14, 2023
mzaian (Contributor) commented Sep 14, 2023

/assign @olevitt

jonathansloman commented

Hi - will this fix also be added to a 2.23.2 release? The issue is preventing us from upgrading our cluster, and the Kubespray documentation specifies not to skip releases when upgrading (i.e., we shouldn't go from 2.22 directly to 2.24), so we need a working 2.23 to give us an upgrade path.

Thank you.
