Check for dummy kernel module #7348
Hi @lentzi90. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I see that the CI failed on the test I added. Does anyone know if the cluster that is set up in that test "actually works" or if it is just checking that "kubespray works"?
On another PR that passes the job packet_ubuntu20-calico-aio (https://gitlab.com/kargo-ci/kubernetes-sigs-kubespray/-/jobs/1075009454), the job checks network connectivity between 2 pods with the ping command, as in https://github.com/kubernetes-sigs/kubespray/blob/master/tests/testcases/030_check-network.yml#L134
/cc @oomichi
Ubuntu 20 kvm kernel doesn't have dummy module afaik
Thanks for the comments! Ok so ping tests work without the dummy module. To me the most obviously broken thing was node-local-dns. Is this used in the pipeline? I saw that there is a test for making sure pods are running so if node-local-dns is used I guess this test should catch it if it is broken. But I didn't manage to figure out what
https://github.com/kubernetes-sigs/kubespray/blob/master/tests/files/packet_ubuntu20-calico-aio.yml#L14
Great thanks! Do you think it would make sense to check for the dummy module if nodelocaldns is used?
Yes
The dummy module is needed for nodelocaldns.
9c0bb77 to 60f43aa
/ok-to-test
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: floryut, lentzi90
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/lgtm
The dummy module is needed for nodelocaldns. (cherry picked from commit 5a54db2)
* Only use stat get_checksum: yes when needed (kubernetes-sigs#7270)
  By default Ansible stat module compute checksum, list extended attributes and find mime type
  To find all stat invocations that really use one of those: git grep -F stat. | grep -vE 'stat.(islnk|exists|lnk_source|writeable)'
  Signed-off-by: Etienne Champetier
  (cherry picked from commit de1d9df)
  Conflicts:
    roles/etcd/tasks/check_certs.yml
* Add kube-ipvs0/nodelocaldns to NetworkManager unmanaged-devices (kubernetes-sigs#7315)
  On CentOS 8 they seem to be ignored by default, but better be extra safe
  This also make it easy to exclude other network plugin interfaces
  Signed-off-by: Etienne Champetier
  (cherry picked from commit e442b1d)
* Stop using kubeadm to update server in kubeconfigs (kubernetes-sigs#7338)
  Using `kubeadm init phase kubeconfig all` breaks kubelet client certificate rotation as we are missing `kubeadm init phase kubelet-finalize all` to point to `kubelet-client-current.pem`
  kubeconfig format is stable so let's just use lineinfile, this will avoid other future breakage
  This revert to the logic before 6fe2248
  Signed-off-by: Etienne Champetier
  (cherry picked from commit c9c0c01)
* kubeadm-config.v1beta2.yaml.j2: etcd log level arg (kubernetes-sigs#7339)
  According to [etcd's docs](https://etcd.io/docs/v3.4.0/op-guide/configuration/#--log-package-levels), argument 'log-package-levels' should not contain underscores.
  (cherry picked from commit b7c2265)
* Remove pre kubeadm cert migration tasks
  apiserver.pem is not used since ddffdb6
  Signed-off-by: Etienne Champetier
  (cherry picked from commit fedd671)
  Conflicts:
    roles/kubernetes/master/tasks/kubeadm-cleanup-old-certs.yml
    roles/kubernetes/master/tasks/kubeadm-migrate-certs.yml
* Remove useless call to 'kubeadm version'
  Signed-off-by: Etienne Champetier
  (cherry picked from commit a6e1f5e)
* Remove admin.conf removal
  kubeadm is the default for a long time now, and admin.conf is created by it, so let kubeadm handle it
  Signed-off-by: Etienne Champetier
  (cherry picked from commit 280036f)
* Remove rotate_tokens logic
  kubeadm never rotates sa.key/sa.pub, so there is no need to delete tokens/restart pods
  Signed-off-by: Etienne Champetier
  (cherry picked from commit 8800b5c)
* Always backup both certs and kubeconfig
  There are no reasons not to backup during upgrade
  Signed-off-by: Etienne Champetier
  (cherry picked from commit 53e5ef6)
  Conflicts:
    roles/kubernetes/master/tasks/kubeadm-backup.yml
    roles/kubernetes/master/tasks/kubeadm-certificate.yml
* Delete misnammed kubeadm-version.yml
  The important action in kubeadm-version.yml is the templating of the configuration, not finding / setting the version
  Signed-off-by: Etienne Champetier
  (cherry picked from commit a9c97e5)
  Conflicts:
    roles/kubernetes/master/tasks/kubeadm-version.yml
* Add privileged_without_host_devices support (kubernetes-sigs#7343)
  When privileged is enabled for a container, all the `/dev/*` block devices from the host are mounted into the guest. The `privileged_without_host_devices` flag prevents host devices from being passed to privileged containers.
  More information:
  * containerd/cri#1225
  * cri-o/cri-o@1d0f681
  (cherry picked from commit dc5df57)
* ansible and jinja2 updates (kubernetes-sigs#7357)
  * Update ansible to v2.9.18
    Signed-off-by: Maciej Wereski
  * Update jinja2 to v2.11.3
    Signed-off-by: Maciej Wereski
  (cherry picked from commit b07c596)
* Fixup kubelet.conf to point to kubelet-client-current.pem (kubernetes-sigs#7347)
  c9c0c01 only fix the problem for new clusters
  Signed-off-by: Etienne Champetier
  (cherry picked from commit 14b63ed)
  Conflicts:
    roles/kubernetes/master/tasks/kubelet-fix-client-cert-rotation.yml
* Check for dummy kernel module (kubernetes-sigs#7348)
  The dummy module is needed for nodelocaldns.
  (cherry picked from commit 5a54db2)
* Fixup one more missing kubespray-defaults (kubernetes-sigs#7375)
  "The error was: 'proxy_disable_env' is undefined\n\nThe error appears to be in '<censored>scale.yml': line 72, column 7"
  Fixes 067db68
  Signed-off-by: Etienne Champetier
  (cherry picked from commit 057e8b4)
* Upgrade openSUSE Leap to 15.2 (kubernetes-sigs#7331)
  15.1 has reached EOL on 2021-02-02.
  Signed-off-by: Maciej Wereski
  (cherry picked from commit 69d11da)
* Update kube-ovn to 1.6.0 (kubernetes-sigs#7240)
  (cherry picked from commit edc4bb4)
* Minor update to cilium and calico
  (cherry picked from commit de46f86)
* Update nodelocaldns to 1.17.1
  (cherry picked from commit 5f2c8ac)
* Download Calico KDD CRDs (kubernetes-sigs#7372)
  * Download Calico KDD CRDs
  * Replace kustomize with lineinfile and use ansible assemble module
  * Replace find+lineinfile by sed in shell module to avoid nested loop
  * add condition on sed
  * use block for kdd tasks + remove supernumerary kdd manifest apply in start "Start Calico resources"
  (cherry picked from commit 1c62af0)
  Conflicts:
    roles/network_plugin/calico/tasks/install.yml
* Update CNI (calico, kubeovn, multus) and Helm
  (cherry picked from commit 05f132c)
* Fix calico crds missing 3.16.9 (kubernetes-sigs#7386)
  (cherry picked from commit ead8a4e)
* Update hashes for 1.20.5/1.19.9/1.18.17
  (cherry picked from commit 6d3dbb4)
* Set K8S default to v1.19.9
  Signed-off-by: Etienne Champetier
* Auto renew control plane certificates (kubernetes-sigs#7358)
  While at it remove force_certificate_regeneration
  This boolean only forced the renewal of the apiserver certs
  Either manually use k8s-certs-renew.sh or set auto_renew_certificates
  Signed-off-by: Etienne Champetier
  (cherry picked from commit efa1803)
  Conflicts:
    roles/kubernetes/master/templates/k8s-certs-renew.service.j2
    roles/kubernetes/master/templates/k8s-certs-renew.sh.j2
    roles/kubernetes/master/templates/k8s-certs-renew.timer.j2
* Add cryptography installation (kubernetes-sigs#7404)
  To avoid ModuleNotFoundError due to no module named 'setuptools_rust', this adds cryptography installation to requirements.txt.
  Created by jfc-evs originally as kubernetes-sigs#7264
  (cherry picked from commit 49abf60)
* Allow connecting to bastion via non-standard SSH port (kubernetes-sigs#7396)
  * Allow connecting to bastion via non-standard port
  * Fix bastion connection when ansible_port is not provided
  (cherry picked from commit 6fa3565)
* Correct Jinja Syntax for etcd-unsupported-arch (kubernetes-sigs#6919)
  `-%` causes `etcd-unsupported-arch: arm64` to print on COL 1 instead of COL 6.
  Signed-off-by: anthr76
  (cherry picked from commit edfa3e9)
* Fix k8s-certs-renew for k8s < 1.20 (kubernetes-sigs#7410)
  Signed-off-by: Etienne Champetier
  (cherry picked from commit 2d1597b)
* Remove ignore_errors from drain tasks and enable retires (kubernetes-sigs#7151)
  * Remove ignore_errors from drain tasks and enable retires
  * Fix lint error by checking if stdout length is not 0, ie string is not empty.
  (cherry picked from commit ccd3aee)
* Fix remove-node by removing jq usage (kubernetes-sigs#7405)
  Signed-off-by: Etienne Champetier
  (cherry picked from commit 36a3a78)
* Remove left over nodes_to_drain
  Signed-off-by: Etienne Champetier
* remove local lb privileged (kubernetes-sigs#7437) (kubernetes-sigs#7454)
  Co-authored-by: Samuel Liu
* Add new kubernetes hashes (1.19.10, 1.20.6)
* Default to latest kubernetes patch version (1.19.10)
* Update k8s-certs-renew.sh.j2 (kubernetes-sigs#7422)
  fix undefinedElse
  (cherry picked from commit cce9d31)
* reset roles need flush iptables:raw (kubernetes-sigs#7426)
  (cherry picked from commit 7f52c1d)
* Remove calico-rr from local inventory hosts file (kubernetes-sigs#7439)
  (cherry picked from commit 596d028)
  Conflicts:
    inventory/local/hosts.ini
* Replace deprecated 'with_dict' with 'loop' (kubernetes-sigs#7442)
  (cherry picked from commit 6479e26)
* local provisioner 'useNodeNameOnly' option can be configured (kubernetes-sigs#7421)
  (cherry picked from commit 7e75d48)
* fix scale (kubernetes-sigs#7449)
  (cherry picked from commit 7340a16)
* remove-node roles: fix kubectl absolute path (kubernetes-sigs#7469)
  * kubelet absolute path
  * kubelet absolute path
  (cherry picked from commit e2a7f3e)
* add CI test for auto_renew_certificates (kubernetes-sigs#7472)
  * add CI test for auto_renew_certificates
  * change timer value fix typo error in rotate cert script
  (cherry picked from commit cce0940)
  Conflicts:
    roles/kubernetes/master/templates/k8s-certs-renew.timer.j2
* Remove dead code from kubeadm-etcd (kubernetes-sigs#7470)
  (cherry picked from commit aa086e5)
* format ansible output (kubernetes-sigs#7482)
  (cherry picked from commit 90c643f)
* Regenerate apiserver.crt on all control-plane nodes (kubernetes-sigs#7463)
  We were regenerating only the cert of the first node
  While at it speed up the check step
  Signed-off-by: Etienne Champetier
  (cherry picked from commit e444b3c)
  Conflicts:
    roles/kubernetes/master/tasks/kubeadm-setup.yml
* Add auto_renew_certificates_systemd_calendar (kubernetes-sigs#7490)
  This allow to configure when K8S certificates renewal runs
  Signed-off-by: Etienne Champetier
  (cherry picked from commit bf6a39e)
  Conflicts:
    inventory/sample/group_vars/k8s-cluster/k8s-cluster.yml
    roles/kubernetes/master/defaults/main/main.yml
    roles/kubernetes/master/templates/k8s-certs-renew.timer.j2
* Check if python netaddr and recent enough jinja are installed (kubernetes-sigs#7486)
  CentOS 7 provides up to date Ansible with really old jinja version
  Signed-off-by: Etienne Champetier
  (cherry picked from commit 332cc1c)
* Add missing proxy environment in crio_repo.yml (kubernetes-sigs#7492)
  (cherry picked from commit 2a2fb68)

Co-authored-by: Etienne Champetier
Co-authored-by: Du9L.com
Co-authored-by: Victor Morales
Co-authored-by: Maciej
Co-authored-by: Lennart Jern
Co-authored-by: Florian Ruynat
Co-authored-by: Erwan Miran
Co-authored-by: Kenichi Omichi
Co-authored-by: Kaleb Elwert
Co-authored-by: Anthony Rabbito
Co-authored-by: David Louks
Co-authored-by: bleech1
Co-authored-by: Samuel Liu
Co-authored-by: Fredrik Liv
Co-authored-by: Helmut Januschka
Co-authored-by: Maxime Lavandier
Co-authored-by: orange-llajeanne
Co-authored-by: Sergey
Co-authored-by: Krystian Młynek
What type of PR is this?
/kind feature
What this PR does / why we need it:
The dummy kernel module seems to be required at least for node-local-dns, but there is no check to detect if this module is missing. In fact, I found that kubespray runs just fine without it but you get weird network problems in the cluster. For example, the node-local-dns will crash loop since it cannot create any dummy interfaces.
This adds a simple test to abort the installation if the module is not present.
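For anyone who wants to try the idea outside of kubespray, here is a minimal Python sketch of such a check. It is not the change made in this PR, only an illustration. The helper names (`is_loaded`, `is_builtin`, `is_installed`, `check`) are hypothetical, and it assumes a Linux host that exposes `/proc/modules`, a `modules.builtin` file under `/lib/modules/<kernel release>/`, and the `modinfo` tool from kmod.

```python
import platform
import subprocess
import sys
from pathlib import Path


def is_loaded(name: str, proc_modules: str) -> bool:
    """Return True if `name` is listed as a loaded module in /proc/modules text."""
    # Each non-empty line starts with the module name, followed by size, refcount, etc.
    return any(
        line.split()[0] == name
        for line in proc_modules.splitlines()
        if line.strip()
    )


def is_builtin(name: str, modules_builtin: str) -> bool:
    """Return True if `name` is compiled into the kernel per modules.builtin text."""
    # Lines look like "kernel/drivers/net/dummy.ko"; compare the file stem.
    return any(
        Path(line.strip()).stem == name
        for line in modules_builtin.splitlines()
        if line.strip()
    )


def is_installed(name: str) -> bool:
    """Return True if `modinfo` can find the module for the running kernel."""
    try:
        result = subprocess.run(["modinfo", name], capture_output=True)
    except FileNotFoundError:
        # modinfo is not installed, so we cannot tell; report "not found".
        return False
    return result.returncode == 0


def check(name: str = "dummy") -> bool:
    """Return True if the module is loaded, built in, or installed on this host."""
    proc = Path("/proc/modules")
    builtin = Path("/lib/modules") / platform.release() / "modules.builtin"
    loaded = is_loaded(name, proc.read_text()) if proc.exists() else False
    built_in = is_builtin(name, builtin.read_text()) if builtin.exists() else False
    return loaded or built_in or is_installed(name)


if __name__ == "__main__":
    if not check("dummy"):
        sys.exit("kernel module 'dummy' not found; nodelocaldns needs it")
```

Run as a script, it exits with a non-zero status and a message when the module cannot be found, which mirrors the "abort the installation" behaviour described above. Note that finding the module on disk does not prove it can be loaded, so a stricter check would actually try to load it.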
Which issue(s) this PR fixes:
Fixes #7307
Special notes for your reviewer:
I'm not that well versed in kernel modules and networking, so there could well be better ways to do this that I just don't know about. Please let me know in that case and I'll try to address it.
Does this PR introduce a user-facing change?: