RKE2 not starting up services on RHEL8 #1539

jcrosel · 2021-08-05T13:29:39Z

Environmental Info:
RKE2 Version:
[myuser@vm1 ~]$ rke2 -v
rke2 version v1.21.3+rke2r1 (2ed0b0d)
go version go1.16.6b7

Node(s) CPU architecture, OS, and Version:
Linux vm1 4.18.0-147.51.2.el8_1.x86_64 #1 SMP Thu Jul 8 06:09:25 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
1 VM, Red Hat Enterprise Linux 8, latest patch level, 2 vCPU, 8GB RAM

/etc/rancher/rke2/config.yaml:
debug: true
selinux: true

Describe the bug:
rke2-server does not seem to be able to start its components.

Steps To Reproduce:

Added some iptables entries
sudo iptables -A INPUT -p tcp --dport 6443 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 2379 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 2380 -j ACCEPT

Installed Docker
sudo dnf config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
sudo dnf install docker-ce --nobest -y
sudo systemctl start docker
sudo systemctl enable docker
sudo usermod -aG docker $USER

Installed and started RKE2 server
Run the installer: curl -sfL https://get.rke2.io | sh -
Enable the rke2-server service: systemctl enable rke2-server.service
Start the service: systemctl start rke2-server.service

Expected behavior:
RKE2 server should start up components like etcd and kube-apiserver.

Actual behavior:
Connection errors in the log (see attached log file) for ports 2379 (etcd) and 6443 (kube-apiserver).

Additional context / logs:
log-rke2 - Copy.txt


brandond · 2021-08-05T21:39:29Z

Why are you installing and starting Docker before RKE2? RKE2 uses its own embedded containerd; there is no need to install docker beforehand and in fact you are better off not.

It also looks like you've not installed the required selinux packages; normally the installer does this for you so I'm confused how this could happen:
Aug 05 13:15:38 vm1 rke2[15268]: time="2021-08-05T13:15:38Z" level=warning msg="SELinux is enabled for rke2 but process is not running in context 'container_runtime_t', rke2-selinux policy may need to be applied"

If removing docker and installing the selinux packages does not resolve the error, see if there's anything interesting in the containerd log at /var/lib/rancher/rke2/agent/containerd/containerd.log.
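The two checks suggested above can be sketched as small shell helpers. This is only a sketch: the function names are hypothetical, the first helper assumes the rke2 log has been saved to a file, and the second assumes `rke2-selinux` and `container-selinux` are the relevant RPM names (both appear later in this thread).

```shell
#!/bin/sh
# Sketch only: these function names are hypothetical, not part of RKE2.

# Succeeds if the saved rke2 log at $1 contains the SELinux context warning.
has_selinux_context_warning() {
    grep -q "process is not running in context 'container_runtime_t'" "$1"
}

# Succeeds if both SELinux policy RPMs are installed (assumed package names).
selinux_rpms_installed() {
    rpm -q rke2-selinux container-selinux >/dev/null 2>&1
}
```

One possible use is to dump the unit log with `journalctl -u rke2-server > /tmp/rke2.log`, then run `has_selinux_context_warning /tmp/rke2.log && echo "SELinux policy likely not applied"`.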

jcrosel · 2021-08-06T06:50:08Z

@brandond that's good to know. I was just expecting it, as it was a requirement for rke (binary) and rke in rancher, as far as I know.
I have now started completely from scratch; these are the steps I did:

systemctl disable firewalld
vim /etc/NetworkManager/conf.d/rke2-canal.conf
systemctl reload NetworkManager
dnf upgrade -y
reboot
curl -sfL https://get.rke2.io | sh -
systemctl enable rke2-server.service
systemctl start rke2-server.service
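The steps above do not show what was written to rke2-canal.conf. As a hedged sketch, the helper below writes the content that the RKE2 known-issues page describes for NetworkManager, as best I recall it; treat the two config lines as an assumption and check them against the current docs. The function name and the optional directory argument are hypothetical.

```shell
#!/bin/sh
# Sketch only: write_rke2_canal_conf is a hypothetical helper name.
# Assumption: the file content follows the RKE2 known-issues page; it tells
# NetworkManager to leave the CNI-created cali* and flannel* interfaces alone.

# Write rke2-canal.conf into directory $1 (default: /etc/NetworkManager/conf.d).
write_rke2_canal_conf() {
    conf_dir=${1:-/etc/NetworkManager/conf.d}
    printf '%s\n' \
        '[keyfile]' \
        'unmanaged-devices=interface-name:cali*;interface-name:flannel*' \
        > "$conf_dir/rke2-canal.conf"
}
```

Run as root, `write_rke2_canal_conf` followed by `systemctl reload NetworkManager` would replace the manual vim step.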

Still not working; the behavior seems to be the same as before.
Log file attached:
rke2-server.log

Could the issue be the RHEL image in Azure? It seems to be locked to 8.1 and can't/shouldn't be updated.
That's the latest version Red Hat provides in the Azure Marketplace.

brandond · 2021-08-06T16:25:35Z

etcd still isn't starting, can you check the containerd log file as requested above?

jcrosel · 2021-08-09T08:42:19Z

Something seems to be off with CNI.
containerd.log

@brandond have you ever experienced something similar?

brandond · 2021-08-09T16:23:09Z

time="2021-08-06T06:36:38.554910732Z" level=info msg="CreateContainer within sandbox \"214255c37689a276a424b37dffd2b03b9f7c641b68045f2180b2413dd5275d51\" for &ContainerMetadata{Name:kube-proxy,Attempt:0,} returns container id \"84b2f6932660193fa18d0a555b43e1b4444592ff7b6898aaf9e5c86f01cc05bd\""
time="2021-08-06T06:36:38.555421235Z" level=info msg="StartContainer for \"84b2f6932660193fa18d0a555b43e1b4444592ff7b6898aaf9e5c86f01cc05bd\""
time="2021-08-06T06:36:38.722580775Z" level=info msg="StartContainer for \"84b2f6932660193fa18d0a555b43e1b4444592ff7b6898aaf9e5c86f01cc05bd\" returns successfully"
time="2021-08-06T06:36:52.714149951Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:etcd-adbsg-fzag-k8s-vm1,Uid:985840a449fab27fbd1831f57843061a,Namespace:kube-system,Attempt:0,}"
time="2021-08-06T06:36:52.792435138Z" level=info msg="starting signal loop" namespace=k8s.io path=/run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/e26f8170d80ada8ab2531a87962fec1edbfd7b708c416ff914d2c5e1d6cf662e pid=6507
time="2021-08-06T06:36:52.908778362Z" level=info msg="shim disconnected" id=e26f8170d80ada8ab2531a87962fec1edbfd7b708c416ff914d2c5e1d6cf662e
time="2021-08-06T06:36:52.908851662Z" level=error msg="copy shim log" error="read /proc/self/fd/28: file already closed"
time="2021-08-06T06:36:52.943644879Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:etcd-adbsg-fzag-k8s-vm1,Uid:985840a449fab27fbd1831f57843061a,Namespace:kube-system,Attempt:0,} failed, error" error="failed to create containerd task: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: failed to set /proc/self/attr/keycreate on procfs: write /proc/self/attr/keycreate: invalid argument: unknown"

I've only seen this one other time, in #851 (comment) - and this user was setting a custom-data dir and also using a standalone containerd, both of which complicate things on selinux-enabled hosts. Can you start over on a clean host that doesn't have system docker or containerd installed, and ensure that you're using the RPM install with all the required selinux packages?
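To find this failure in a long containerd.log without reading it line by line, something like the following could be used. It is a minimal sketch: the function name is hypothetical, and the sed pattern assumes the `PodSandboxMetadata{Name:...,` layout seen in the excerpt above.

```shell
#!/bin/sh
# Sketch only: keycreate_failures is a hypothetical helper name.

# Print the unique pod names whose sandbox creation failed with the
# keycreate error in the containerd log file given as $1.
keycreate_failures() {
    grep 'failed to set /proc/self/attr/keycreate' "$1" |
        sed -n 's/.*PodSandboxMetadata{Name:\([^,]*\),.*/\1/p' |
        sort -u
}
```

For example: `keycreate_failures /var/lib/rancher/rke2/agent/containerd/containerd.log`.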

jcrosel · 2021-08-12T06:41:01Z

I did start again from a clean RHEL 8.1 VM in Azure.
I used the script in the quick start guide to install it. That uses rpm as well, right?

ansilh · 2021-10-08T11:41:07Z

Hi @brandond, I'm able to consistently reproduce this issue on an air-gapped node with an HTTP proxy and an internal registry configured.

RHEL 8.2 - 4.18.0-193.el8.x86_64
SELinux - enforcing
RPMs

# rpm -qa |grep rke2
rke2-selinux-0.8-2.el8.noarch
rke2-server-1.20.10~rke2r1-0.el8.x86_64
rke2-common-1.20.10~rke2r1-0.el8.x86_64

YUM proxy config

# grep proxy /etc/yum.conf
proxy=http://squid.ansil.io:3128

Registry mirror config

# cat /etc/rancher/rke2/registries.yaml
mirrors:
  docker.io:
    endpoint:
      - "https://registry.ansil.io"
    rewrite:
      "^rancher/(.*)": "proxy/rancher/$1"

Installation step

export HTTP_PROXY=squid.ansil.io:3128
export HTTPS_PROXY=$HTTP_PROXY
INSTALL_RKE2_CHANNEL=v1.20 ./install.sh

One thing I noticed during installation is that the error below appeared in the rpm post-installation step.

Failed to resolve typeattributeset statement at /var/lib/selinux/targeted/tmp/modules/400/rke2/cil:17
semodule:  Failed!

Error in kubelet


E1008 07:13:42.882484    1535 kuberuntime_sandbox.go:70] CreatePodSandbox for pod "etcd-rke2-rhel8.ansil.io_kube-system(ef11ca6fb492d20a062f350f941bb147)" failed: rpc error: code = Unknown desc = failed to create containerd task: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: failed to set /proc/self/attr/keycreate on procfs: write /proc/self/attr/keycreate: invalid argument: unknown

ansilh · 2021-10-08T11:53:07Z

There is something not right in the SELinux policy file.

# rpm -q --scripts rke2-selinux-0.8-2.el8.noarch
postinstall scriptlet (using /bin/sh):
semodule -n -i /usr/share/selinux/packages/rke2.pp

When I tried to load the module manually, I got the same error:

# semodule -n -i /usr/share/selinux/packages/rke2.pp
Failed to resolve typeattributeset statement at /var/lib/selinux/targeted/tmp/modules/400/rke2/cil:17
semodule:  Failed!

ansilh · 2021-10-08T12:40:11Z

Here is the line that is causing the issue:

# cat /usr/share/selinux/packages/rke2.pp   | /usr/libexec/selinux/hll/pp > rke2.cli
# sed -n 17p rke2.cli
(typeattributeset cil_gen_require container_kvm_var_run_t)  <<----

ansilh · 2021-10-08T12:46:21Z

Didn't see any such types in the loaded container module. :(

# semodule -c --extract=container
Module 'container' does not exist at the default priority '400'. Extracting at highest existing priority '200'.

# grep container_kvm_var_run_t container.cil
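The manual check above (convert the module to CIL, read the failing line, grep the container module) can be generalized to every required type at once. This is a sketch under stated assumptions: the function name is hypothetical, required names are assumed to appear as `(typeattributeset cil_gen_require NAME)` lines as in the excerpt, and only the single base CIL file passed in is searched, so a name declared in some other module would be reported as missing.

```shell
#!/bin/sh
# Sketch only: missing_required_types is a hypothetical helper name.

# Print each name that the module CIL ($1) requires via cil_gen_require
# but that is not declared as a type or typeattribute in the base CIL ($2).
missing_required_types() {
    module_cil=$1
    base_cil=$2
    sed -n 's/^[[:space:]]*(typeattributeset cil_gen_require \([A-Za-z0-9_]*\)).*/\1/p' "$module_cil" |
    while read -r t; do
        grep -qF -e "(type $t)" -e "(typeattribute $t)" "$base_cil" || echo "$t"
    done
}
```

With the files produced by the commands in this thread, `missing_required_types rke2.cli container.cil` would list the unresolved names in one pass.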

ansilh · 2021-10-08T17:20:12Z

After upgrading the RPM to container-selinux-2.159.0-1.module_el8.5.0+733+9bb5dffa.noarch, I'm able to load the rke2 policy package.

# rpm -Uvh container-selinux-2.159.0-1.module_el8.5.0+733+9bb5dffa.noarch.rpm
warning: container-selinux-2.159.0-1.module_el8.5.0+733+9bb5dffa.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID 8483c65d: NOKEY
Verifying...                          ################################# [100%]
Preparing...                          ################################# [100%]
Updating / installing...
   1:container-selinux-2:2.159.0-1.mod################################# [ 50%]
Cleaning up / removing...
   2:container-selinux-2:2.124.0-1.mod################################# [100%]

# semodule -n -i /usr/share/selinux/packages/rke2.pp

Rebooted the node and then all system containers came up.
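A pre-install check for the container-selinux version could look like the sketch below. The thread only establishes that 2.124.0 failed and 2.159.0 worked with rke2-selinux 0.8-2; the true minimum version is not stated, so the threshold is left as a parameter. The function name is hypothetical, and `sort -V` assumes GNU coreutils.

```shell
#!/bin/sh
# Sketch only: version_ge is a hypothetical helper name.
# Assumes GNU sort, which provides -V (version sort).

# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `version_ge "$(rpm -q --qf '%{VERSION}' container-selinux)" 2.159.0 || echo "container-selinux may be too old for this rke2-selinux"`, where 2.159.0 is simply the version that worked here, not a documented minimum.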

Cc: @brandond

brandond · 2021-10-08T21:18:43Z

Yes, upstream pulled in some... ill advised... updates to the container-selinux policy that we have had to work around:
containers/container-selinux#149 (comment)

cc @dweomer

dweomer · 2021-10-08T21:20:53Z

Hmm, should be compatible with 2.159.x

ansilh · 2021-10-11T05:26:20Z

Do we need to mention this in the support matrix or rke2 docs?
Looks like we need RHEL 8.5 to get SELinux working in this case.

mayank-reynencourt · 2022-02-01T19:08:11Z

Hi all,

I'm also facing the same issue while deploying rke2 using Ansible on RHEL 8.2.

Has anyone found a solution for this?

belgaied2 · 2022-03-03T18:17:41Z

I am having the same issue on RHEL 8.4 on AWS. It looks like the error is:

sudo systemctl status rke2-server
● rke2-server.service - Rancher Kubernetes Engine v2 (server)
   Loaded: loaded (/usr/lib/systemd/system/rke2-server.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Thu 2022-03-03 15:36:37 UTC; 728ms ago
     Docs: https://github.com/rancher/rke2#readme
  Process: 17941 ExecStopPost=/bin/sh -c systemd-cgls /system.slice/rke2-server.service | grep -Eo '[0-9]+ (containerd|kubelet)' | awk '{print $1}' | xargs -r kill (code=exited, status=0/SUCCESS)
  Process: 17938 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service (code=exited, status=1/FAILURE)

It looks like what is failing is the command before the execution of rke2:

[ec2-user@ip-172-31-44-245 ~]$ /bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
[ec2-user@ip-172-31-44-245 ~]$ echo $?
1

Is that what keeps the service's executable from running?

On the same node, just running rke2 server directly results in a successful bootstrap.

Steps to reproduce:

AWS EC2 instance using the AMI ami-06ec8443c2a35b0ba in eu-central-1. This instance is based on RHEL 8.4 with kernel 4.18.0-305.el8.x86_64.
All settings are default: SELinux is enforcing, and firewalld is not enabled out of the box.
Then try to install RKE2:

[ec2-user@ip-172-31-44-245 ~]$ curl -sfL https://get.rke2.io | sudo sh -
[ec2-user@ip-172-31-44-245 ~]$ sudo systemctl enable rke2-server
[ec2-user@ip-172-31-44-245 ~]$ sudo systemctl start rke2-server

Using this user-data should result in the same behavior:

#cloud-config
runcmd:
  - curl -sfL https://get.rke2.io | sudo sh -
  - sudo systemctl enable rke2-server
  - sudo systemctl start rke2-server

ansilh · 2022-03-03T19:09:59Z

@belgaied2 That's not the same issue we discussed in the original thread.

It looks like you need to make sure the known issues are addressed as per the rke2 doc.
https://docs.rke2.io/known_issues/#networkmanager

In some operating systems like RHEL 8.4, NetworkManager includes two extra services called nm-cloud-setup.service and nm-cloud-setup.timer.

These services add a routing table that interferes with the CNI plugin's configuration. Unfortunately, there is no config that can avoid that, as explained in the [issue](https://github.com/rancher/rke2/issues/1053). Therefore, if those services exist, they should be disabled and the node must be rebooted.
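The remediation quoted above could be scripted roughly as follows. This is a sketch, not an official procedure: the function names and the `DRY_RUN` variable are hypothetical, and the reboot is left to the operator.

```shell
#!/bin/sh
# Sketch only: run, disable_nm_cloud_setup and DRY_RUN are hypothetical names.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Disable both nm-cloud-setup units; the node still needs a reboot afterwards.
disable_nm_cloud_setup() {
    run systemctl disable nm-cloud-setup.service nm-cloud-setup.timer
}
```

Running `DRY_RUN=1; disable_nm_cloud_setup` first shows the command without changing anything. After a real run as root, reboot the node before starting rke2-server again.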

belgaied2 · 2022-03-04T08:56:34Z

Thanks for the clarification!

stale · 2022-09-04T17:29:31Z

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

BrandonALXEllisSS mentioned this issue Aug 30, 2021

Not Compatible with latest RHEL8 #1722

Closed

brandond mentioned this issue Nov 16, 2021

RKE2 unable to run on EL8.5 #2143

Closed

brandond mentioned this issue May 16, 2022

RKE2 YUM/RPM - Selinux not working - What am i missing? #2917

Closed

stale bot added the status/stale label Sep 4, 2022

stale bot closed this as completed Sep 20, 2022

jeremy-london mentioned this issue Oct 6, 2022

Private Registry Airgap SELinux install errors #3427

Closed

brandond mentioned this issue Aug 1, 2023

Unable to install RKE2 on Amazon Linux 2023 #4527

Closed
