Problems to add a new node using calico #7519

Closed
jbaojunior opened this issue Apr 16, 2021 · 8 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@jbaojunior

Environment:

  • Cloud provider or hardware configuration:

  • OS:
    Linux 4.19.0-5-amd64 x86_64
    PRETTY_NAME="Debian GNU/Linux 10 (buster)"
    NAME="Debian GNU/Linux"
    VERSION_ID="10"
    VERSION="10 (buster)"
    VERSION_CODENAME=buster
    ID=debian
    HOME_URL="https://www.debian.org/"
    SUPPORT_URL="https://www.debian.org/support"
    BUG_REPORT_URL="https://bugs.debian.org/"

  • Version of Ansible : 2.9.19

  • Version of Python: 3.9.3

Kubespray version (commit): 4661e7d

Network plugin used: Calico

Full inventory with variables:

[all]
node1 ansible_host=xxx.xxx.xxx.xxx ip=xxx.xxx.xxx.xxx etcd_member_name=etcd1
node2 ansible_host=xxx.xxx.xxx.xxx ip=xxx.xxx.xxx.xxx etcd_member_name=etcd2
node3 ansible_host=xxx.xxx.xxx.xxx1 ip=xxx.xxx.xxx.xxx etcd_member_name=etcd3
node4 ansible_host=xxx.xxx.xxx.xxx ip=xxx.xxx.xxx.xxx
node5 ansible_host=xxx.xxx.xxx.xxx ip=xxx.xxx.xxx.xxx
node6 ansible_host=xxx.xxx.xxx.xxx ip=xxx.xxx.xxx.xxx

# ## configure a bastion host if your nodes are not directly reachable
# bastion ansible_host=x.x.x.x ansible_user=some_user

[kube-master]
node1
node2

[etcd]
node1
node2
node3

[kube-node]
node3
node4
node5
node6

[k8s-cluster:children]
kube-master
kube-node

Command used to invoke ansible:

ansible-playbook -i inventory/sample/hosts.ini --diff --limit=node6 scale.yml -u ansible -become

Output of ansible run:

...
TASK [Set fact calico_datastore to etcd if needed] ****************************************************************************************************************************************************************
fatal: [node6]: FAILED! => {"msg": "The conditional check ''etcd_endpoints' in calico_cni_config.plugins.0' failed. The error was: error while evaluating conditional ('etcd_endpoints' in calico_cni_config.plugins.0): 'dict object' has no attribute 'plugins'\n\nThe error appears to be in '/home/test/devops-kubespray/pre.yml': line 14, column 9, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n          calico_cni_config: \"{{ calico_cni_config_slurp['content'] | b64decode | from_json }}\"\n      - name: Set fact calico_datastore to etcd if needed\n        ^ here\n"}

PLAY RECAP ********************************************************************************************************************************************************************************************************
node6     : ok=3    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0

Anything else do we need to know:
The file 10-calico.conflist is created, but with a different configuration. When pre.yml runs, this check looks for plugins.0, which does not exist in that file. I think the check should be changed to "'etcd_endpoints' in calico_cni_config".

The first file created is:

{
    "name": "k8s-pod-network",
    "type": "calico",
    "etcd_endpoints": "",
    "etcd_discovery_srv": "",
    "etcd_key_file": "",
    "etcd_cert_file": "",
    "etcd_ca_cert_file": "",
    "log_level": "warn",
    "ipam": {
        "type": "calico-ipam"
    },
    "policy": {
        "type": "k8s",
        "k8s_api_root": "https://10.233.0.1:443",
        "k8s_auth_token": "....."
    },
    "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
    }
}

The right file is:

{
  "name": "cni0",
  "cniVersion":"0.3.1",
  "plugins":[
    {
      "nodename": "nodex",
      "type": "calico",
      "log_level": "info",
      "etcd_endpoints": "https://xxxxxx:2379",
      "etcd_cert_file": "/etc/calico/certs/cert.crt",
      "etcd_key_file": "/etc/calico/certs/key.pem",
      "etcd_ca_cert_file": "/etc/calico/certs/ca_cert.crt",
      "ipam": {
        "type": "calico-ipam",
        "assign_ipv4": "true",
        "ipv4_pools": ["10.233.64.0/18"]
      },
      "policy": {
        "type": "k8s"
      },
      "kubernetes": {
        "kubeconfig": "__KUBECONFIG_FILEPATH__"
      }
    },
    {
      "type":"portmap",
      "capabilities":{
        "portMappings":true
      }
    }
  ]
}
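
The difference between the two files above can be checked mechanically. The following is a minimal Python sketch, not part of kubespray: `calico_settings` and `uses_etcd` are hypothetical helper names, and the two dictionaries are trimmed copies of the files quoted above. It also shows why a plain membership test on the top level would need care: the flat file does contain an `etcd_endpoints` key, but its value is an empty string.

```python
import json

# Trimmed copies of the two files quoted above (only the relevant keys).
FLAT_CONFIG = json.loads(
    '{"name": "k8s-pod-network", "type": "calico", "etcd_endpoints": ""}'
)
CONFLIST = json.loads("""
{
  "name": "cni0",
  "cniVersion": "0.3.1",
  "plugins": [
    {"type": "calico", "etcd_endpoints": "https://xxxxxx:2379"},
    {"type": "portmap", "capabilities": {"portMappings": true}}
  ]
}
""")


def calico_settings(config):
    """Return the dict that holds the calico settings for either file shape.

    Hypothetical helper: the conflist keeps them in plugins[0], the flat
    file keeps them at the top level.
    """
    plugins = config.get("plugins")
    if plugins:
        return plugins[0]
    return config


def uses_etcd(config):
    """True only when a non-empty etcd_endpoints value is present."""
    return bool(calico_settings(config).get("etcd_endpoints"))


for name, config in (("flat file", FLAT_CONFIG), ("conflist", CONFLIST)):
    print(name, "| has plugins:", "plugins" in config,
          "| uses etcd:", uses_etcd(config))
```

Note that `'etcd_endpoints' in FLAT_CONFIG` is true even though the value is empty, so a check on key presence alone would treat the first file as an etcd configuration.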
@jbaojunior jbaojunior added the kind/bug Categorizes issue or PR as related to a bug. label Apr 16, 2021
@liupeng0518
Member

What is your calico_datastore?
Also see #7495.

@cristicalin
Contributor

The file 10-calico.conflist is generated by the install-cni init container of the calico-node pod that runs on each node. This may be an issue with the parsing of the input template that is generated by kubespray.

I would advise you to check and share the following:

  • the contents of the template file generated by kubespray on the target node: /etc/cni/net.d/calico.conflist.template
  • the logs of the install-cni initContainer of the calico-node pod on your node6

The ansible error you are seeing comes from the malformed 10-calico.conflist and the error may be in the templated input.

@jbaojunior
Author

@liupeng0518 the datastore is etcd.

@cristicalin Thank you. I will look at the template and will post it here.

@liupeng0518
Member

#7449

@jbaojunior
Author

@cristicalin the problem is that the template is created by install.yml, which runs after pre.yml. In my tests, the first file was always created from the wrong template.

@floryut
Member

floryut commented Apr 19, 2021

Closing as per #7519 (comment)
Reopen if needed

@floryut floryut closed this as completed Apr 19, 2021
@cristicalin
Contributor

@jbaojunior the 10-calico.conflist file itself is not created by kubespray but by calico-node. This means that something else puts the file in place before kubespray does, so when pre.yml is executed it sees the file with the wrong content.

This may happen because the kubelet starts on the node and launches calico-node before the calico role is played, in which case #7449 may indeed solve your issue. If it doesn't, please reopen.

@rouja

rouja commented Aug 9, 2021

Hi,

I think there is still an issue in pre.yml, because there are three possible situations:

1- We add a new node and the content of 10-calico.conflist has no "plugins" section.
2- We run the playbook against a cluster with calico_datastore set to etcd.
3- We run the playbook against a cluster with calico_datastore set to kdd.

pre.yml contains:

---
- name: Slurp CNI config
  slurp:
    src: /etc/cni/net.d/10-calico.conflist
  register: calico_cni_config_slurp
  failed_when: false

- block:
  - name: Set fact calico_cni_config from slurped CNI config
    set_fact:
      calico_cni_config: "{{ calico_cni_config_slurp['content'] | b64decode | from_json }}"
  - name: Set fact calico_datastore to etcd if needed
    set_fact:
      calico_datastore: etcd
    when: "'etcd_endpoints' in calico_cni_config.plugins.0"
  when: calico_cni_config_slurp.content is defined

Which means:
For the first case, the playbook fails on 'etcd_endpoints' in calico_cni_config.plugins.0 because there is no plugins section.
For the second case, the playbook works.
For the third case, the playbook works but calico_datastore is not set.
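
The three outcomes above can be modelled outside Ansible. The following is a rough Python sketch, not kubespray code: `should_set_etcd` is a hypothetical name for a guarded variant of the `when` condition that first checks whether a `plugins` list exists, and the kdd entry assumes that such a conflist simply carries no `etcd_endpoints` key.

```python
def should_set_etcd(cni_config):
    """Guarded variant of "'etcd_endpoints' in calico_cni_config.plugins.0".

    Returns False instead of raising when there is no plugins list.
    """
    plugins = cni_config.get("plugins")
    if not plugins:
        return False
    return "etcd_endpoints" in plugins[0]


# One trimmed, illustrative config per situation described in this comment.
CASES = {
    "1: new node, no plugins section": {
        "type": "calico",
        "etcd_endpoints": "",
    },
    "2: calico_datastore set to etcd": {
        "plugins": [{"type": "calico", "etcd_endpoints": "https://xxxxxx:2379"}],
    },
    # Assumption: a kdd conflist has plugins but no etcd_endpoints key.
    "3: calico_datastore set to kdd": {
        "plugins": [{"type": "calico"}],
    },
}

RESULTS = {label: should_set_etcd(config) for label, config in CASES.items()}

for label, result in RESULTS.items():
    print(f"{label}: set calico_datastore to etcd? {result}")
```

With the guard in place, the first case is skipped rather than failing, and the second and third cases behave as they do today.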

In fact, calico_datastore should be set in group_vars/k8s-cluster/k8s-net-calico.yml, so I'm not sure I understand why we do this here.

In my opinion we can fix this in two ways:

1- We simply remove the task below:

  - name: Set fact calico_datastore to etcd if needed
    set_fact:
      calico_datastore: etcd
    when: "'etcd_endpoints' in calico_cni_config.plugins.0"

2- We modify the playbook to handle the three cases.

If nobody objects in the next few days, I will open a PR with the first solution.
