RKE2 not starting up services on RHEL8 #1539
Comments
Why are you installing and starting Docker before RKE2? RKE2 uses its own embedded containerd; there is no need to install docker beforehand and in fact you are better off not. It also looks like you've not installed the required selinux packages; normally the installer does this for you so I'm confused how this could happen: If removing docker and installing the selinux packages does not resolve the error, see if there's anything interesting in the containerd log at |
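The SELinux package check suggested above can be sketched as a small shell helper. The package names `container-selinux` and `rke2-selinux` are assumptions based on what the RPM install normally pulls in, and `missing_selinux_pkgs` is a hypothetical helper name, not part of RKE2.

```shell
# Package names are an assumption (what the RKE2 RPM install normally pulls in).
REQUIRED_PKGS="container-selinux rke2-selinux"

# Read installed package names from stdin (one per line) and print every
# required package that is not among them. Prints nothing if all are present.
missing_selinux_pkgs() {
    installed=$(cat)
    for pkg in $REQUIRED_PKGS; do
        printf '%s\n' "$installed" | grep -qxF -- "$pkg" || printf '%s\n' "$pkg"
    done
}
```

On an RPM-based host this could be fed with `rpm -qa --qf '%{NAME}\n' | missing_selinux_pkgs`; empty output would mean both packages are installed.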
@brandond that's good to know. I was expecting it to be needed, since it was a requirement for RKE (binary) and for RKE in Rancher, as far as I know. systemctl disable firewalld Still not working; it seems to behave as before. Could the issue be the RHEL image in Azure? It seems to be locked to 8.1 and can't/shouldn't be updated. |
etcd still isn't starting, can you check the containerd log file as requested above? |
Something seems to be off with CNI. @brandond have you ever experienced something similar? |
time="2021-08-06T06:36:38.554910732Z" level=info msg="CreateContainer within sandbox \"214255c37689a276a424b37dffd2b03b9f7c641b68045f2180b2413dd5275d51\" for &ContainerMetadata{Name:kube-proxy,Attempt:0,} returns container id \"84b2f6932660193fa18d0a555b43e1b4444592ff7b6898aaf9e5c86f01cc05bd\""
time="2021-08-06T06:36:38.555421235Z" level=info msg="StartContainer for \"84b2f6932660193fa18d0a555b43e1b4444592ff7b6898aaf9e5c86f01cc05bd\""
time="2021-08-06T06:36:38.722580775Z" level=info msg="StartContainer for \"84b2f6932660193fa18d0a555b43e1b4444592ff7b6898aaf9e5c86f01cc05bd\" returns successfully"
time="2021-08-06T06:36:52.714149951Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:etcd-adbsg-fzag-k8s-vm1,Uid:985840a449fab27fbd1831f57843061a,Namespace:kube-system,Attempt:0,}"
time="2021-08-06T06:36:52.792435138Z" level=info msg="starting signal loop" namespace=k8s.io path=/run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/e26f8170d80ada8ab2531a87962fec1edbfd7b708c416ff914d2c5e1d6cf662e pid=6507
time="2021-08-06T06:36:52.908778362Z" level=info msg="shim disconnected" id=e26f8170d80ada8ab2531a87962fec1edbfd7b708c416ff914d2c5e1d6cf662e
time="2021-08-06T06:36:52.908851662Z" level=error msg="copy shim log" error="read /proc/self/fd/28: file already closed"
time="2021-08-06T06:36:52.943644879Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:etcd-adbsg-fzag-k8s-vm1,Uid:985840a449fab27fbd1831f57843061a,Namespace:kube-system,Attempt:0,} failed, error" error="failed to create containerd task: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: failed to set /proc/self/attr/keycreate on procfs: write /proc/self/attr/keycreate: invalid argument: unknown"
I've only seen this one other time, in #851 (comment) - and this user was setting a custom data-dir and also using a standalone containerd, both of which complicate things on SELinux-enabled hosts. Can you start over on a clean host that doesn't have system docker or containerd installed, and ensure that you're using the RPM install with all the required SELinux packages? |
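When scanning a long containerd log for the failure quoted above, it may help to filter down to error-level entries first. A minimal sketch follows; the function names are hypothetical, and the log path is left as an argument because the thread does not state it.

```shell
# Print only the error-level entries of a containerd log.
# $1: path to the log file (supplied by the caller).
containerd_errors() {
    grep -F 'level=error' "$1"
}

# Succeed (exit 0) if the log contains the keycreate failure seen in this
# thread, fail (non-zero) otherwise.
has_keycreate_failure() {
    grep -qF 'failed to set /proc/self/attr/keycreate' "$1"
}
```

A match from `has_keycreate_failure` would suggest the same SELinux-related sandbox failure discussed here rather than a networking problem.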
I did start again from a clean RHEL 8.1 VM in Azure. |
Hi @brandond, I'm able to consistently reproduce this issue on an air-gapped node with an HTTP proxy and an internal registry enabled.
One thing I noticed during installation is that the error below appeared in the RPM post-installation step.
|
Something is not right in the SELinux policy file.
When I tried to load the module manually, I got the same error:
|
Here is the line that is causing the issue:
|
I didn't see any such types in the loaded container module. :(
|
After upgrading the RPM to
Rebooted the node and then all system containers came up. Cc: @brandond |
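To confirm after the upgrade and reboot that the policy modules are actually loaded, the output of `semodule -l` can be checked. This is only a sketch: `module_loaded` is a hypothetical helper, and the module name `rke2` for the rke2-selinux policy is an assumption (`container` is the module mentioned above).

```shell
# Read `semodule -l` output from stdin and succeed if a module with the
# given name is listed. Only the first column is compared, so listings
# that also print a version column still match.
# $1: module name to look for.
module_loaded() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}
```

For example, `semodule -l | module_loaded container && echo "container module loaded"`.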
Yes, upstream pulled in some... ill-advised... updates to the container-selinux policy that we have had to work around: cc @dweomer |
Hmm, should be compatible with 2.159.x |
Do we need to mention this in the support matrix or rke2 docs? |
Hi all, I'm also facing the same issue while deploying RKE2 using Ansible on RHEL 8.2. Has anyone found a solution? |
I am having the same issue on Red Hat 8.4 on AWS. It looks like the error is:
sudo systemctl status rke2-server
● rke2-server.service - Rancher Kubernetes Engine v2 (server)
Loaded: loaded (/usr/lib/systemd/system/rke2-server.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Thu 2022-03-03 15:36:37 UTC; 728ms ago
Docs: https://github.com/rancher/rke2#readme
Process: 17941 ExecStopPost=/bin/sh -c systemd-cgls /system.slice/rke2-server.service | grep -Eo '[0-9]+ (containerd|kubelet)' | awk '{print $1}' | xargs -r kill (code=exited, status=0/SUCCESS)
Process: 17938 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service (code=exited, status=1/FAILURE)
It looks like what is failing is the command before the execution of rke2:
causing the service's executable to not run? On the same node, just running
Steps to reproduce:
[ec2-user@ip-172-31-44-245 ~]$ curl -sfL https://get.rke2.io | sudo sh -
[ec2-user@ip-172-31-44-245 ~]$ sudo systemctl enable rke2-server
[ec2-user@ip-172-31-44-245 ~]$ sudo systemctl start rke2-server
Using this user-data should result in the same behavior:
#cloud-config
runcmd:
- curl -sfL https://get.rke2.io | sudo sh -
- sudo systemctl enable rke2-server
- sudo systemctl start rke2-server |
@belgaied2 Not the same issue we discussed in the original thread. Looks like you need to make sure the known issues are addressed as per the rke2 doc.
|
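The failing `ExecStartPre` line in the status output above negates the result of `systemctl is-enabled`, so the pre-check passes only when nm-cloud-setup.service is not enabled. A minimal sketch of that logic, with `precheck_passes` as a hypothetical helper name:

```shell
# Run the given probe command and invert its exit status, mirroring the
# "! systemctl is-enabled --quiet ..." pre-check in the unit file:
# exit 0 when the probe fails (service not enabled), non-zero when it succeeds.
precheck_passes() {
    ! "$@"
}
```

On an affected node, `precheck_passes systemctl is-enabled --quiet nm-cloud-setup.service` would return non-zero, matching the `status=1/FAILURE` shown above. The RKE2 known-issues documentation is the place to check for the recommended remedy.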
Thanks for the clarification! |
This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions. |
Environmental Info:
RKE2 Version:
[myuser@vm1 ~]$ rke2 -v
rke2 version v1.21.3+rke2r1 (2ed0b0d)
go version go1.16.6b7
Node(s) CPU architecture, OS, and Version:
Linux vm1 4.18.0-147.51.2.el8_1.x86_64 #1 SMP Thu Jul 8 06:09:25 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration:
1 VM, Red Hat Enterprise Linux 8, latest patch level, 2 vCPU, 8GB RAM
/etc/rancher/rke2/config.yaml:
debug: true
selinux: true
Describe the bug:
rke2-server does not seem to be able to start its components.
Steps To Reproduce:
Added some iptables entries
sudo iptables -A INPUT -p tcp --dport 6443 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 2379 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 2380 -j ACCEPT
Installed Docker
sudo dnf config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
sudo dnf install docker-ce --nobest -y
sudo systemctl start docker
sudo systemctl enable docker
sudo usermod -aG docker $USER
Installed and started RKE2 server
Run the installer: curl -sfL https://get.rke2.io | sh -
Enable the rke2-server service: systemctl enable rke2-server.service
Start the service: systemctl start rke2-server.service
Expected behavior:
RKE2 server starts up components like etcd and kube-apiserver
Actual behavior:
Connection errors in the log (see attached log file) for ports 2379 (etcd) and 6443 (kube-apiserver)
Additional context / logs:
log-rke2 - Copy.txt
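Connection errors on 2379 and 6443 usually mean nothing is listening there yet, which the iptables rules above cannot fix. A small sketch for checking this from `ss -tln` output; `port_listening` is a hypothetical helper, and it assumes the local address is the fourth column, as in the default `ss -tln` layout.

```shell
# Read `ss -tln` output from stdin and succeed if any socket's local
# address ends in ":<port>".
# $1: TCP port number to look for.
port_listening() {
    awk -v port="$1" '$4 ~ (":" port "$") { found = 1 } END { exit !found }'
}
```

For example, `ss -tln | port_listening 2379 || echo "etcd is not listening"`, and the same for 6443.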