[help] added node with multus not registering with master #206

Closed
verizonold opened this issue Mar 20, 2018 · 23 comments
verizonold commented Mar 20, 2018

Hi,
So, I followed your directions at http://dougbtv.com/nfvpe/2017/02/22/multus-cni/ and ran into some issues.

I ran your scripts with master only and then added a minion. However, I do not see the worker registering with the master. I see the following in journalctl. Can you please provide some feedback?

Mar 20 17:42:57 localhost.localdomain kubelet[8583]: W0320 17:42:57.954232    8583 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Mar 20 17:42:57 localhost.localdomain kubelet[8583]: E0320 17:42:57.954573    8583 kubelet.go:2120] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Mar 20 17:43:02 localhost.localdomain sudo[11009]:  vagrant : TTY=pts/0 ; PWD=/home/vagrant ; USER=root ; COMMAND=/bin/journalctl -xe
dougbtv (Member) commented Mar 21, 2018

@verizonold -- cool that you're trying to get Multus going -- there should be a path forward to get it to work.

I haven't tested the workflow of running the playbooks for master only and then adding a minion -- however, what it looks like is that the CNI configurations didn't get written to the minion, and that's how the kubelet knows that the node is ready.

So one thing you might want to try: you know how I have you curl a GitHub gist into a multus.yml file -- go ahead and do a kubectl delete -f multus.yml, wait for the pods to be deleted, and then do a kubectl create -f multus.yml -- in theory it'll write the config to the nodes.
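Roughly, that re-create cycle looks like this (a sketch -- multus.yml is the file from the gist step in the blog post):

kubectl delete -f multus.yml
kubectl get pods --all-namespaces        # wait until the kube-multus-ds-* pods are gone
kubectl create -f multus.yml
ls /etc/cni/net.d/                       # on the minion: the Multus CNI config should show up here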

That being said -- you might want to take a look at this more recent blog article on Multus. Granted -- it's talking about using Multus with CRDs (which you don't have to use; if you don't want CRDs, just change the example inventory/multus-extravars.yml in the tutorial to have multus_use_crd: false and then remove or ignore all the multus_ipam_* variables). However, I do recommend trying it out with CRDs -- it's even more flexible.
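For reference, a minimal sketch of what the non-CRD extra-vars could look like -- the variable names are the ones mentioned here and later in this thread, so treat the exact set of keys as illustrative:

# inventory/multus-extravars.yml (sketch, CRDs switched off)
pod_network_type: "multus"
multus_use_crd: false
# with multus_use_crd: false, the multus_ipam_* variables can be removed or ignored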

Let me know if you give it another try and run into any issues -- I'll leave the issue open here for a bit to give you a chance to try it out.

dougbtv changed the title from "kube ansible issues" to "added node with multus not registering with master" on Mar 21, 2018
verizonold (Author)

@dougbtv thanks much! Really appreciate this.
So, should I be applying both the flannel YAML and the multus YAML, or does the multus YAML include the pod network config as well?

Also, I did not use your ansible scripts to create the Kubernetes cluster since I had some failures. Should I prep my CentOS boxes (disable SELinux, swap, iptables, install Docker, ...) before I run your ansible scripts to install Kubernetes?

dougbtv (Member) commented Mar 21, 2018

If you do decide to use the playbooks -- they should perform all the steps to prep the CentOS boxes. We just merged a big change to the playbooks this morning, and as I think I mentioned before, a thorough how-to article should be coming up soon.

So if you look in this folder, there are a few yaml files -- namely "flannel.yaml" and "multus.yaml" -- you'd choose only one.

The multus.yaml file there includes both Multus and flannel, and then it also includes macvlan -- kind of a demo of having both flannel and macvlan interfaces in pods, with Multus creating both of them.

I also replied to your issue on the Multus repo with some more detail, but, the gist is that those yaml files there are really just stock Flannel deployments that you can kick off with kubectl create -f some.yaml, but! They've got multus configurations packed into them instead of stock Flannel configurations -- so they're kind of two-in-one, Multus + Flannel + whatever else you want to put into multus (in this case, macvlan)
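To make the "two-in-one" idea concrete, here's a rough sketch of the pre-CRD Multus config those manifests drop into /etc/cni/net.d/ on each node -- field names are recalled from the Multus README of that era rather than copied out of the repo's yaml, so treat it as illustrative:

{
  "name": "multus-demo-network",
  "type": "multus",
  "delegates": [
    {
      "type": "macvlan",
      "master": "eth0",
      "ipam": { "type": "host-local", "subnet": "192.168.1.0/24" }
    },
    {
      "type": "flannel",
      "masterplugin": true,
      "delegate": { "isDefaultGateway": true }
    }
  ]
}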

p.s. No prob! I appreciate the input, because I'd love it if it were more straightforward for someone who drops by this repo to get this all deployed! :)

verizonold (Author)

@dougbtv so, I am using VirtualBox and created one VM with Ansible running on it, together with kube_install. Is this the virtual host that you refer to in your blog? Should I see VMs for the master and minions in VirtualBox after I run the ansible scripts?

verizonold (Author)

@dougbtv also, I am planning on creating the master/minion VMs in VirtualBox and proceeding as per Plan B here: http://dougbtv.com/nfvpe/2017/02/16/kubernetes-1.5-centos/
Can I do the above and then apply multus.yaml on the master?

dougbtv (Member) commented Mar 21, 2018

Yep, that plan B from the kube install article should work just fine. That is -- if you don't use this repo's virtual machine spin-up methods, you just skip to using kube-install.yml (currently in ./playbooks/kube-install.yml @ master head). You'll just be missing out on the automation of the compilation and installation of Multus.

So generally, what you'd do is...

  • Spin up VMs however you like -- VirtualBox, for example.
  • Follow the instructions for plan B from that article, e.g. make your inventory, run the kube install.
  • Compile Multus, and put the binaries in place (a command sketch follows this list).
  • Use the multus.yaml file -- and kubectl apply -f multus.yaml
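A rough command-level sketch of the last two steps (the repo URL and ./build script are assumptions based on the upstream multus-cni project of that era; /opt/cni/bin is the usual CNI plugin directory):

# on each node: compile Multus and drop the binary into the CNI plugin dir
git clone https://github.com/intel/multus-cni.git /usr/src/multus-cni
cd /usr/src/multus-cni && ./build
cp bin/multus /opt/cni/bin/

# then, from the master: apply the combined Multus + flannel (+ macvlan) manifest
kubectl apply -f multus.yaml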

I think that should generally work. After I get the article up that updates the kube 1.5 install you're looking at, I'll also give an example of the extra variables to use to do the whole kube-install.yml, including Multus, here @ master head.

verizonold (Author)

@dougbtv very helpful. So, if I used kube-install.yml, does it install Multus as well? I thought I still had to run:
ansible-playbook -i inventory/vms.inventory multus-cni.yml.

dougbtv (Member) commented Mar 21, 2018

Actually the kube-install can install Multus as well!

I went ahead and published the new article -- I was waiting for the merge today -- and I also added a section for Multus using some extra vars, too!

Give it a look @verizonold -- http://dougbtv.com/nfvpe/2018/03/21/kubernetes-on-centos/

verizonold (Author) commented Mar 21, 2018

@dougbtv so I gave it a spin, and I get the following error:

TASK [multus-cni : Clone sriov-cni] *********************************************************************
ok: [kube-master]

TASK [multus-cni : Compile sriov-cni] *******************************************************************

TASK [multus-cni : Copy compiled cni binaries] **********************************************************
changed: [kube-master] => (item=/usr/src/sriov-cni/bin/*)
changed: [kube-master] => (item=/usr/src/multus-cni/bin/*)

RUNNING HANDLER [kube-install : restart kubelet] ********************************************************
fatal: [kube-master]: FAILED! => {"changed": false, "msg": "Error loading unit file 'kubelet': org.freedesktop.DBus.Error.InvalidArgs \"Invalid argument\""}

PLAY RECAP **********************************************************************************************
kube-master                : ok=31   changed=3    unreachable=0    failed=1

verizonold (Author) commented Mar 21, 2018

@dougbtv contents from journalctl:

Mar 21 22:25:26 kubeadm-master systemd[1]: kubelet.service failed.
Mar 21 22:25:36 kubeadm-master systemd[1]: kubelet.service holdoff time over, scheduling restart.
Mar 21 22:25:36 kubeadm-master systemd[1]: Started kubelet: The Kubernetes Node Agent.
Mar 21 22:25:36 kubeadm-master systemd[1]: Starting kubelet: The Kubernetes Node Agent...
Mar 21 22:25:36 kubeadm-master kubelet[7984]: I0321 22:25:36.927439    7984 feature_gate.go:226] feature gates: &{{} map[]}
Mar 21 22:25:36 kubeadm-master kubelet[7984]: I0321 22:25:36.927483    7984 controller.go:114] kubelet config controller: starting controller
Mar 21 22:25:36 kubeadm-master kubelet[7984]: I0321 22:25:36.927487    7984 controller.go:118] kubelet config controller: validating combination of defaults and flags
Mar 21 22:25:36 kubeadm-master kubelet[7984]: error: unable to load client CA file /etc/kubernetes/pki/ca.crt: open /etc/kubernetes/pki/ca.crt: no such file or directory
Mar 21 22:25:36 kubeadm-master systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
Mar 21 22:25:36 kubeadm-master systemd[1]: Unit kubelet.service entered failed state.
Mar 21 22:25:36 kubeadm-master systemd[1]: kubelet.service failed.
Mar 21 22:25:37 kubeadm-master systemd[1]: kubelet.service has more than one ExecStart= setting, which is only allowed for Type=oneshot services. Refusing.
Mar 21 22:25:47 kubeadm-master systemd[1]: kubelet.service holdoff time over, scheduling restart.
Mar 21 22:25:47 kubeadm-master systemd[1]: kubelet.service failed to schedule restart job: Unit is not loaded properly: Invalid argument.
Mar 21 22:25:47 kubeadm-master systemd[1]: Unit kubelet.service entered failed state.
Mar 21 22:25:47 kubeadm-master systemd[1]: kubelet.service failed.

dougbtv (Member) commented Mar 22, 2018

Can you show me what the kubelet service looks like? It looks like it's complaining that there's something wrong with the systemd unit, which I haven't seen before. Here's what an example one looks like from a recent run on one of my systems...

[root@kube-master /]# cat /etc/systemd/system/kubelet.service
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=http://kubernetes.io/docs/

[Service]
ExecStart=/usr/bin/kubelet
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target

Also, what does the CentOS release look like? Mine looks like...

[root@kube-master /]# uname -a
Linux kube-master 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@kube-master /]# cat /etc/redhat-release 
CentOS Linux release 7.4.1708 (Core) 

verizonold (Author) commented Mar 22, 2018

@dougbtv
kubelet service status:

[vagrant@kubeadm-master ~]$ sudo systemctl -l status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: error (Reason: Invalid argument)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: inactive (dead)
     Docs: http://kubernetes.io/docs/

Mar 22 18:01:39 kubeadm-master systemd[1]: kubelet.service has more than one ExecStart= setting, which is only allowed for Type=oneshot services. Refusing.
Mar 22 18:01:39 kubeadm-master systemd[1]: Cannot add dependency job for unit kubelet.service, ignoring: Unit is not loaded properly: Invalid argument.

verizonold (Author) commented Mar 22, 2018

@dougbtv

[vagrant@kubeadm-master ~]$ uname -a
Linux kubeadm-master 4.15.12-1.el7.elrepo.x86_64 #1 SMP Wed Mar 21 12:41:57 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
[vagrant@kubeadm-master ~]$


[vagrant@kubeadm-master ~]$ cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core)

verizonold (Author) commented Mar 22, 2018

@dougbtv

[vagrant@kubeadm-master ~]$ cat /etc/systemd/system/kubelet.service
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=http://kubernetes.io/docs/

[Service]
ExecStart=/usr/bin/kubelet
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target
[vagrant@kubeadm-master ~]$

dougbtv (Member) commented Mar 22, 2018

Hrmmm, interesting -- the kubelet service looks just fine, and our systems are pretty close -- you do have a newer kernel, which looks like it might come from the ELRepo repos, maybe?

A few things more to try...

  1. Firstly, I found that there's another kubelet config -- it's @ /etc/systemd/system/kubelet.service.d/10-kubeadm.conf, with extended parameters for the kubelet. It has two ExecStart= lines -- the empty one of which is likely a problem. My systems aren't complaining about it, but I'd recommend trying to remove it from the master and node(s) and running the playbooks again (a quick check for this is sketched right after this list). I opened up issue [bug] Empty ExecStart= line in 10-kubeadm.conf #211 to address it -- bottom line, it's not right.

  2. Also, you might consider a re-run on some systems spun up with the latest stock (non-extended) kernel instead -- this might expose other issues that we haven't run into yet. Looks like the latest available kernel from the latest CentOS 7 cloud images is 3.10.0-693.21.1.el7.
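A quick check/cleanup for that drop-in would look roughly like this (a sketch -- back up the file first; the sed only removes the empty ExecStart= lines):

# how many ExecStart= lines does the drop-in have?
grep -c '^ExecStart' /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

# back up, then drop only the empty ExecStart= lines
cp /etc/systemd/system/kubelet.service.d/10-kubeadm.conf{,.bak}
sed -i '/^ExecStart=$/d' /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

# reload systemd and restart the kubelet
systemctl daemon-reload
systemctl restart kubelet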

verizonold (Author) commented Mar 23, 2018

@dougbtv so I think the problem was with 10-kubeadm.conf. I had re-run your ansible scripts a couple of times, and each run appended another ExecStart= statement to that file.

So, I did an ansible run against my master/minion with playbooks/kube-install.yml. It generally went well. However, I had to do a kubeadm reset/init/join manually, and then I did "kubectl apply -f multus.yaml" (a rough sketch of those manual steps is below).
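Roughly, the manual steps looked like this (sketching from memory -- the join token/hash are placeholders from the kubeadm init output, and the pod-network-cidr flag may not match exactly what I used):

# on the master
sudo kubeadm reset
sudo kubeadm init --pod-network-cidr=10.244.0.0/16

# on the minion, using the token printed by kubeadm init
sudo kubeadm reset
sudo kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>

# back on the master
kubectl apply -f multus.yaml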

I see the following, which is probably not an operational state. Can you please let me know if my procedure above is right?

[vagrant@kubeadm-master ~]$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                     READY     STATUS              RESTARTS   AGE
kube-system   etcd-kubeadm-master                      1/1       Running             0          7m
kube-system   kube-apiserver-kubeadm-master            1/1       Running             0          7m
kube-system   kube-controller-manager-kubeadm-master   1/1       Running             0          7m
kube-system   kube-dns-6f4fd4bdf-frh8w                 0/3       ContainerCreating   0          1h
kube-system   kube-multus-ds-2p976                     1/2       CrashLoopBackOff    5          7m
kube-system   kube-proxy-9pkhg                         1/1       Running             0          1h
kube-system   kube-proxy-dngwg                         1/1       Running             0          1h
kube-system   kube-scheduler-kubeadm-master            1/1       Running             0          7m
[vagrant@kubeadm-master ~]$

verizonold (Author) commented Mar 23, 2018

@dougbtv Also, I see the following in the journal on the master:

Mar 23 02:32:44 kubeadm-master kubelet[31957]: E0323 02:32:44.909766   31957 kubelet.go:2120] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Mar 23 02:32:49 kubeadm-master kubelet[31957]: W0323 02:32:49.913285   31957 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d

verizonold (Author) commented Mar 23, 2018

@dougbtv Also, this is the journal on the minion:

Mar 23 02:42:28 kubeadm-minion-1 kubelet[20093]: W0323 02:42:28.965061   20093 pod_container_deletor.go:77] Container "f8c02f62918ef67197dfcbd8ca344c7bef190673f6ad1c612964a6c8b13bb060" not found in pod's containers
Mar 23 02:42:29 kubeadm-minion-1 kubelet[20093]: E0323 02:42:29.435318   20093 cni.go:259] Error adding network: Multus: error in invoke Delegate add - "flannel": open /run/flannel/subnet.env: no such file or directory
Mar 23 02:42:29 kubeadm-minion-1 kubelet[20093]: E0323 02:42:29.435339   20093 cni.go:227] Error while adding to cni network: Multus: error in invoke Delegate add - "flannel": open /run/flannel/subnet.env: no such file or directory
Mar 23 02:42:29 kubeadm-minion-1 kubelet[20093]: E0323 02:42:29.462835   20093 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kube-dns-6f4fd4bdf-frh8w_kube-system" network: Multus: error in invoke Delegate add - "flannel": open /run/flannel/subnet.env: no such file or directory
Mar 23 02:42:29 kubeadm-minion-1 kubelet[20093]: E0323 02:42:29.462862   20093 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "kube-dns-6f4fd4bdf-frh8w_kube-system(8416b366-2e36-11e8-9a10-525400daa710)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kube-dns-6f4fd4bdf-frh8w_kube-system" network: Multus: error in invoke Delegate add - "flannel": open /run/flannel/subnet.env: no such file or directory
Mar 23 02:42:29 kubeadm-minion-1 kubelet[20093]: E0323 02:42:29.462871   20093 kuberuntime_manager.go:647] createPodSandbox for pod "kube-dns-6f4fd4bdf-frh8w_kube-system(8416b366-2e36-11e8-9a10-525400daa710)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kube-dns-6f4fd4bdf-frh8w_kube-system" network: Multus: error in invoke Delegate add - "flannel": open /run/flannel/subnet.env: no such file or directory
Mar 23 02:42:29 kubeadm-minion-1 kubelet[20093]: E0323 02:42:29.462906   20093 pod_workers.go:186] Error syncing pod 8416b366-2e36-11e8-9a10-525400daa710 ("kube-dns-6f4fd4bdf-frh8w_kube-system(8416b366-2e36-11e8-9a10-525400daa710)"), skipping: failed to "CreatePodSandbox" for "kube-dns-6f4fd4bdf-frh8w_kube-system(8416b366-2e36-11e8-9a10-525400daa710)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-dns-6f4fd4bdf-frh8w_kube-system(8416b366-2e36-11e8-9a10-525400daa710)\" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod \"kube-dns-6f4fd4bdf-frh8w_kube-system\" network: Multus: error in invoke Delegate add - \"flannel\": open /run/flannel/subnet.env: no such file or directory"
Mar 23 02:42:29 kubeadm-minion-1 kubelet[20093]: W0323 02:42:29.994040   20093 pod_container_deletor.go:77] Container "8d74bdf3632eaf46156454a65fb37f48d26b4570cb69834e277c57a581dd475f" not found in pod's containers

dougbtv (Member) commented Mar 23, 2018

Cool -- at least you got the kubelet running. You're getting closer! There's also a part of me that wants to see if you can replicate it on subsequent runs with fresh machines -- in part because I'd like to fix it so it'll work every time you kick it off.

@verizonold -- can you show me the output of cat /etc/multus.yaml -- it looks like something went wrong when it kicked off that pod, and also I'd like to see kubectl get nodes --show-labels

Something else that also concerns me is that there's only one instance of the kube-multus-ds-* pods running; in theory, there should be one on the master and one on each node.

A quick thing to try too is to kubectl delete -f /etc/multus.yaml and then give it a few minutes to clear out and then kubectl create -f /etc/multus.yaml to see if re-creating it works out.
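For reference, those checks would look something like this (a sketch -- the daemonset/pod name is the one from your output above):

# what did the playbook render, and how are the nodes labeled?
cat /etc/multus.yaml
kubectl get nodes --show-labels

# is the multus daemonset scheduled on every node, and why is that pod crash-looping?
kubectl -n kube-system get daemonset
kubectl -n kube-system describe pod kube-multus-ds-2p976
kubectl -n kube-system logs kube-multus-ds-2p976 -c <container-name>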

verizonold (Author) commented Mar 23, 2018

@dougbtv I got what seems like a working deployment. However, when I log into the pod I cannot ping either the minion or the master, nor can I ping the multus interface from the master or minion node. Can you point out any items that I can possibly debug to set this right?

[centos@kubeadm-master ~]$ !242
kubectl get po --all-namespaces
NAMESPACE     NAME                                     READY     STATUS    RESTARTS   AGE
default       patnginx                                 1/1       Running   0          2h
default       patnginx2                                1/1       Running   0          1h
default       patnginx3                                1/1       Running   0          49m
kube-system   etcd-kubeadm-master                      1/1       Running   0          7h
kube-system   kube-apiserver-kubeadm-master            1/1       Running   0          7h
kube-system   kube-controller-manager-kubeadm-master   1/1       Running   0          7h
kube-system   kube-dns-6f4fd4bdf-q68tg                 3/3       Running   0          7h
kube-system   kube-multus-ds-88j6f                     1/1       Running   0          7h
kube-system   kube-multus-ds-8qzcp                     1/1       Running   0          7h
kube-system   kube-proxy-cwkxn                         1/1       Running   0          7h
kube-system   kube-proxy-t7sd7                         1/1       Running   0          7h
kube-system   kube-scheduler-kubeadm-master            1/1       Running   0          7h
[centos@kubeadm-master ~]$

leifmadsen changed the title from "added node with multus not registering with master" to "[help] added node with multus not registering with master" on Apr 4, 2018
verizonold (Author) commented Apr 4, 2018

@dougbtv this is related to tracking down the problems with the build (error output enclosed at the end):

  • I am using VirtualBox and set up 3 VMs -- one for ansible, one for the master, and one for the minion
  • Created the centos user on all three and copied over the ssh keys from the ansible host to the master/minion
  • executed: ansible-galaxy install -r requirements.yml
  • executed: ansible-playbook -i ./inventory/companyy.inventory -e "@./inventory/companyy-extra.yml" playbooks/kube-install.yml

companyy-extra.yml

pod_network_type: "multus"
multus_use_crd: false
optional_packages:
  - tcpdump
  - bind-utils
multus_ipam_subnet: "10.0.0.0/24"
multus_ipam_rangeStart: "10.0.0.200"
multus_ipam_rangeEnd: "10.0.0.216"
multus_ipam_gateway: "10.0.0.1"

companyy.inventory

kube-master ansible_host=10.0.0.107
kube-minion-1 ansible_host=10.0.0.141
kube-minion-2 ansible_host=10.0.0.133
[master]
kube-master
[nodes]
kube-minion-1
kube-minion-2
[all_vms]
kube-master
kube-minion-1
kube-minion-2
[all_vms:vars]
ansible_ssh_user=centos
[all:vars]
ansible_user=centos
[all]
kube-master
kube-minion-1
kube-minion-2

Build error

FAILED - RETRYING: Wait until kubectl get pods is ready (8 retries left).
FAILED - RETRYING: Wait until kubectl get pods is ready (7 retries left).
FAILED - RETRYING: Wait until kubectl get pods is ready (6 retries left).
FAILED - RETRYING: Wait until kubectl get pods is ready (5 retries left).
FAILED - RETRYING: Wait until kubectl get pods is ready (4 retries left).
FAILED - RETRYING: Wait until kubectl get pods is ready (3 retries left).
FAILED - RETRYING: Wait until kubectl get pods is ready (2 retries left).
FAILED - RETRYING: Wait until kubectl get pods is ready (1 retries left).
fatal: [kube-master]: FAILED! => {"attempts": 60, "changed": true, "cmd": "kubectl get pods --all-namespaces", "delta": "0:00:00.073250", "end": "2018-03-27 23:37:27.979183", "msg": "non-zero return code", "rc": 1, "start": "2018-03-27 23:37:27.905933", "stderr": "The connection to the server localhost:8080 was refused - did you specify the right host or port?", "stderr_lines": ["The connection to the server localhost:8080 was refused - did you specify the right host or port?"], "stdout": "", "stdout_lines": []}
...ignoring

TASK [kube-cni : Configure bridge cni plugin] ***********************************


TASK [kube-cni : Configure bridge cni plugin] ***********************************


TASK [kube-cni : Apply the flannel RBAC] ****************************************

fatal: [kube-master]: FAILED! => {"changed": true, "cmd": "kubectl create -f /etc/flannel-rbac.yaml", "delta": "0:00:00.073983", "end": "2018-03-27 23:37:28.476984", "msg": "non-zero return code", "rc": 1, "start": "2018-03-27 23:37:28.403001", "stderr": "The connection to the server localhost:8080 was refused - did you specify the right host or port?", "stderr_lines": ["The connection to the server localhost:8080 was refused - did you specify the right host or port?"], "stdout": "", "stdout_lines": []}

PLAY RECAP **********************************************************************

kube-master                : ok=57   changed=39   unreachable=0    failed=1
kube-minion-1              : ok=22   changed=2    unreachable=0    failed=1

dougbtv (Member) commented Apr 5, 2018

verizonold reported that they had a successful run with some fresh virtual machines. Maybe something user-introduced caused the failure -- it looks like a problem with the centos user's ability to run kubectl.
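For anyone who hits the same "connection to the server localhost:8080 was refused" error: the usual fix is to give the non-root user a kubeconfig, along the lines of the standard kubeadm advice (a sketch -- not necessarily exactly what the playbooks do):

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl get pods --all-namespaces   # should now reach the API server instead of localhost:8080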

verizonold (Author)

@dougbtv great help with this. I have a working deployment for now. Closing this issue.
