OpenStack Cloud provider init failure on new clusters v2.24.0 #350

Closed
anders-elastisys opened this issue Feb 16, 2024 · 0 comments · Fixed by #356
Labels
kind/bug Something isn't working
anders-elastisys commented Feb 16, 2024

Describe the bug
There seem to be issues when creating new v2.24.0 clusters on OpenStack clouds: the OpenStack cloud provider pods start and taint the nodes before CoreDNS can start, leaving the CoreDNS pods in a Pending state and causing the OpenStack pods to crash because they fail to resolve the OpenStack endpoint:

Cloud provider could not be initialized: could not init cloud provider "openstack": Post "https://<openstack-endpoint>": dial tcp: lookup <openstack-endpoint> on 10.233.0.3:53: write udp ...->10.233.0.3:53: write: operation not permitted

Related upstream Kubespray issue: kubernetes-sigs/kubespray#10914

To Reproduce
Steps to reproduce the behavior:

  1. On an OpenStack cloud, create a cluster with v2.24.0; Kubespray will finish without errors
  2. Check the kube-system namespace and see the OpenStack pods crashing with logs similar to the output above
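The crashing pods can be observed with kubectl; this is a sketch assuming kubectl access to the new cluster, and the `k8s-app=openstack-cloud-controller-manager` label selector is an assumption based on the upstream cloud-provider-openstack manifests:

```shell
# List pods in kube-system; the OpenStack cloud controller manager pods
# should show CrashLoopBackOff while coredns is stuck in Pending.
kubectl get pods -n kube-system

# Dump the last lines of the crashing pods' logs to confirm the
# DNS lookup failure shown above (label selector is an assumption).
kubectl logs -n kube-system \
  -l k8s-app=openstack-cloud-controller-manager --tail=20
```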

Expected behavior
Creating new clusters with Kubespray should work on all cloud providers.

Version (add all relevant versions):

  • Compliant Kubernetes Kubespray v2.24.0-ck8s1

Additional context

A workaround for now is to add tolerations to the coredns pods. E.g. create a file tolerations.yaml:

```yaml
# tolerations.yaml
spec:
  template:
    spec:
      tolerations:
      - effect: NoSchedule
        key: node.cloudprovider.kubernetes.io/uninitialized
        value: "true"
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
```

And patch coredns with the tolerations in the file:

```shell
kubectl patch deployment coredns -n kube-system --patch "$(cat tolerations.yaml)"
```

Once the OpenStack pods run without crashing, you can remove the node.cloudprovider.kubernetes.io/uninitialized taint.
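Removing the taint can be done with kubectl; a minimal sketch, where `<node-name>` is a placeholder for each affected node:

```shell
# Remove the uninitialized taint (NoSchedule effect) from a node once
# the OpenStack pods are healthy; the trailing "-" deletes the taint.
kubectl taint nodes <node-name> \
  node.cloudprovider.kubernetes.io/uninitialized:NoSchedule-
```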

@anders-elastisys anders-elastisys added the kind/bug Something isn't working label Feb 16, 2024
@davidumea davidumea self-assigned this Mar 7, 2024