Fix 'Join to Kubernetes cluster' task #2805

Merged · 5 commits · Dec 31, 2021

Conversation

@atsikham (Contributor) commented Dec 20, 2021

Copying is handled by Ansible's copy module. From the documentation:

If yes, the remote file will be replaced when contents are different than the source.

So no manual checksum comparison is needed; only the redundant condition was removed.
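
For reference, a minimal sketch of the pattern in question (the task name and file mode are illustrative, not the role's actual code; the destination path is taken from the logs below):

```yaml
# Minimal illustrative task, not the role's actual code. With the
# module default force: yes, copy compares source and destination
# content itself and replaces the remote file only when they differ,
# so no explicit checksum condition is needed around the task.
- name: Copy kubeadm join configuration
  copy:
    src: kubeadm-join-node.yml
    dest: /etc/kubeadm/kubeadm-join-node.yml
    mode: '0644'
```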

atsikham marked this pull request as ready for review December 20, 2021 23:23
seriva previously approved these changes Dec 21, 2021
cicharka previously approved these changes Dec 21, 2021
atsikham dismissed stale reviews from cicharka and seriva via a0ec053 December 28, 2021 13:25
seriva previously approved these changes Dec 28, 2021
to-bar previously approved these changes Dec 28, 2021
@przemyslavic (Collaborator)

/azp run

@przemyslavic (Collaborator)

@atsikham
It doesn't solve the issue described in task #1175.

  1. Deployed k8s cluster (1 master + 1 node)
  2. Executed kubeadm reset on master node
  3. Ran epicli apply again and it failed:
2021-12-29T11:27:03.6324390Z 11:27:03 INFO cli.engine.ansible.AnsibleCommand - TASK [kubernetes_node : Join to Kubernetes cluster] ****************************
2021-12-29T11:27:04.6504789Z 11:27:04 ERROR cli.engine.ansible.AnsibleCommand - fatal: [ci-pkiazurcentcanal-kubernetes-node-vm-0]: FAILED! => {"changed": true, "cmd": "kubeadm join  --config /etc/kubeadm/kubeadm-join-node.yml\n", "delta": "0:00:00.392559", "end": "2021-12-29 11:27:04.577311", "msg": "non-zero return code", "rc": 1, "start": "2021-12-29 11:27:04.184752", "stderr": "error execution phase preflight: [preflight] Some fatal errors occurred:\n\t[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists\n\t[ERROR Port-10250]: Port 10250 is in use\n\t[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists\n[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`\nTo see the stack trace of this error execute with --v=5 or higher", "stderr_lines": ["error execution phase preflight: [preflight] Some fatal errors occurred:", "\t[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists", "\t[ERROR Port-10250]: Port 10250 is in use", "\t[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists", "[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`", "To see the stack trace of this error execute with --v=5 or higher"], "stdout": "[preflight] Running pre-flight checks", "stdout_lines": ["[preflight] Running pre-flight checks"]}
2021-12-29T11:27:04.6649538Z 11:27:04 INFO cli.engine.ansible.AnsibleCommand -
2021-12-29T11:27:04.6650931Z 11:27:04 INFO cli.engine.ansible.AnsibleCommand - TASK [kubernetes_node : Join to cluster with ignores] **************************
2021-12-29T11:29:05.9094641Z 11:29:05 ERROR cli.engine.ansible.AnsibleCommand - fatal: [ci-pkiazurcentcanal-kubernetes-node-vm-0]: FAILED! => {"changed": true, "cmd": "kubeadm join  --config /etc/kubeadm/kubeadm-join-node.yml  --ignore-preflight-errors all\n", "delta": "0:02:00.670749", "end": "2021-12-29 11:29:05.816895", "msg": "non-zero return code", "rc": 1, "start": "2021-12-29 11:27:05.146146", "stderr": "\t[WARNING FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists\n\t[WARNING Port-10250]: Port 10250 is in use\n\t[WARNING FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists\nerror execution phase kubelet-start: error uploading crisocket: timed out waiting for the condition\nTo see the stack trace of this error execute with --v=5 or higher", "stderr_lines": ["\t[WARNING FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists", "\t[WARNING Port-10250]: Port 10250 is in use", "\t[WARNING FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists", "error execution phase kubelet-start: error uploading crisocket: timed out waiting for the condition", "To see the stack trace of this error execute with --v=5 or higher"], "stdout": "[preflight] Running pre-flight checks\n[preflight] Reading configuration from the cluster...\n[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'\n[kubelet-start] Writing kubelet configuration to file \"/var/lib/kubelet/config.yaml\"\n[kubelet-start] Writing kubelet environment file with flags to file \"/var/lib/kubelet/kubeadm-flags.env\"\n[kubelet-start] Starting the kubelet\n[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...\n[kubelet-check] Initial timeout of 40s passed.", "stdout_lines": ["[preflight] Running pre-flight checks", "[preflight] Reading configuration from the cluster...", "[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'", "[kubelet-start] Writing kubelet configuration to file \"/var/lib/kubelet/config.yaml\"", "[kubelet-start] Writing kubelet environment file with flags to file \"/var/lib/kubelet/kubeadm-flags.env\"", "[kubelet-start] Starting the kubelet", "[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...", "[kubelet-check] Initial timeout of 40s passed."]}
2021-12-29T11:29:05.9296216Z 11:29:05 INFO cli.engine.ansible.AnsibleCommand -
2021-12-29T11:29:05.9297310Z 11:29:05 INFO cli.engine.ansible.AnsibleCommand - TASK [kubernetes_node : Display kubeadm join stderr if any] ********************
2021-12-29T11:29:05.9885416Z 11:29:05 INFO cli.engine.ansible.AnsibleCommand - ok: [ci-pkiazurcentcanal-kubernetes-node-vm-0] => {
2021-12-29T11:29:05.9887711Z 11:29:05 ERROR cli.engine.ansible.AnsibleCommand -     "msg": "Joined with warnings\n['\\t[WARNING FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists', '\\t[WARNING Port-10250]: Port 10250 is in use', '\\t[WARNING FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists', 'error execution phase kubelet-start: error uploading crisocket: timed out waiting for the condition', 'To see the stack trace of this error execute with --v=5 or higher']\n"
2021-12-29T11:29:05.9897007Z 11:29:05 INFO cli.engine.ansible.AnsibleCommand - }
2021-12-29T11:29:06.0065788Z 11:29:06 INFO cli.engine.ansible.AnsibleCommand -
2021-12-29T11:29:06.0066806Z 11:29:06 INFO cli.engine.ansible.AnsibleCommand - TASK [kubernetes_node : Mark node regardless of join result] *******************
2021-12-29T11:29:06.0535320Z 11:29:06 INFO cli.engine.ansible.AnsibleCommand - ok: [ci-pkiazurcentcanal-kubernetes-node-vm-0]
2021-12-29T11:29:06.0564255Z 11:29:06 INFO cli.engine.ansible.AnsibleCommand -
2021-12-29T11:29:06.0572498Z 11:29:06 INFO cli.engine.ansible.AnsibleCommand - PLAY RECAP *********************************************************************
2021-12-29T11:29:06.0579407Z 11:29:06 INFO cli.engine.ansible.AnsibleCommand - ci-pkiazurcentcanal-kubernetes-master-vm-0 : ok=9    changed=0    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
2021-12-29T11:29:06.0584014Z 11:29:06 INFO cli.engine.ansible.AnsibleCommand - ci-pkiazurcentcanal-kubernetes-node-vm-0 : ok=37   changed=1    unreachable=0    failed=1    skipped=12   rescued=1    ignored=0
2021-12-29T11:29:06.0587266Z 11:29:06 INFO cli.engine.ansible.AnsibleCommand - ci-pkiazurcentcanal-repository-vm-0 : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
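
For context: all three preflight errors above come from leftover state of the node's previous join (/etc/kubernetes/kubelet.conf, /etc/kubernetes/pki/ca.crt, and the kubelet still listening on port 10250). A minimal manual-recovery sketch, not part of this PR:

```yaml
# Hypothetical recovery sketch, not code from this PR: kubeadm reset
# stops the kubelet service and removes /etc/kubernetes state
# (kubelet.conf, pki/ca.crt), clearing the preflight errors above.
- name: Reset leftover kubeadm state before re-joining
  command: kubeadm reset --force
```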

@przemyslavic (Collaborator) commented Dec 29, 2021

✔️ Executed kubeadm reset on each node and it succeeded.
The question is whether we want the reset of each node to be automated somehow when something breaks, or whether users should be responsible for it before running apply again (see the sketch below).

Task #2669 also seems to be related.
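
If automated recovery were the direction taken, one rough shape (purely hypothetical, not what this PR implements) is a block/rescue around the join, reusing the commands visible in the logs above:

```yaml
# Purely hypothetical sketch, not this PR's change: on a failed join,
# reset the leftover state once and retry.
- name: Join to Kubernetes cluster (with automatic recovery)
  block:
    - name: Join to Kubernetes cluster
      command: kubeadm join --config /etc/kubeadm/kubeadm-join-node.yml
  rescue:
    - name: Reset leftover kubeadm state
      command: kubeadm reset --force

    - name: Retry join after reset
      command: kubeadm join --config /etc/kubeadm/kubeadm-join-node.yml
```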

atsikham dismissed stale reviews from to-bar and seriva via 49fa803 December 30, 2021 10:19
@przemyslavic (Collaborator)

/azp run

@przemyslavic (Collaborator) commented Dec 30, 2021

Tested scenario:

  1. Deployed k8s cluster: 1 master + 1 node
  2. Ran kubeadm reset on the master node
  3. Executed epicli apply again and it succeeded

Tested on AWS with Ubuntu/RHEL/CentOS.

atsikham mentioned this pull request Dec 31, 2021
atsikham requested review from to-bar and removed request for cicharka and to-bar December 31, 2021 08:33