
Restarting dbus results in kubelet being unable to start pods #2172

Closed · cbgbt opened this issue Jun 2, 2022 · 4 comments

Labels: status/in-progress (This issue is currently being worked on), status/needs-triage (Pending triage or re-evaluation)

@cbgbt (Contributor) commented Jun 2, 2022

Image I'm using: Any Kubernetes variant

What I expected to happen:
If the dbus service is restarted, scheduling pods with Kubernetes should not be impacted.

What actually happened:
Scheduling a pod on the node fails after dbus has been restarted, unless the kubelet is also restarted; the kubelet reports:

unable to ensure pod container exists: failed to create container for [kubepods besteffort ...] : dbus: connection closed by user

How to reproduce the problem:

  • Run systemctl restart dbus as an administrator on a node
  • Attempt to schedule pods onto that node

Additional Details:
This is fixed in runc in opencontainers/runc#3475 and backported to 1.1.x in opencontainers/runc#3476; however, to pick up this fix, our Kubernetes packaging needs to stop using the vendored libcontainer (libct) and instead use the version from our build of runc.
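
For context on what the runc change does: when the dbus daemon restarts, the long-lived connection held by the systemd cgroup driver is closed, and every subsequent cgroup operation fails with godbus's ErrClosed ("dbus: connection closed by user" — the exact error in the log above). The fix reconnects and retries instead of surfacing the error. Below is a minimal sketch of that reconnect-and-retry pattern against the github.com/godbus/dbus/v5 API; the manager type and withConn helper are illustrative, not runc's actual code:

```go
// Sketch only: this is not the runc patch itself, just the
// reconnect-on-closed-connection pattern it applies.
package dbusretry

import (
	"errors"
	"sync"

	"github.com/godbus/dbus/v5"
)

type manager struct {
	mu   sync.Mutex
	conn *dbus.Conn
}

// connect opens and authenticates a private connection to the system bus.
func (m *manager) connect() error {
	conn, err := dbus.SystemBusPrivate()
	if err != nil {
		return err
	}
	if err := conn.Auth(nil); err != nil {
		conn.Close()
		return err
	}
	if err := conn.Hello(); err != nil {
		conn.Close()
		return err
	}
	m.conn = conn
	return nil
}

// withConn runs op against the shared connection. If op fails because the
// daemon dropped the connection (dbus.ErrClosed is godbus's
// "dbus: connection closed by user" error), it reconnects and retries once
// instead of propagating the error to the caller.
func (m *manager) withConn(op func(*dbus.Conn) error) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.conn == nil {
		if err := m.connect(); err != nil {
			return err
		}
	}
	err := op(m.conn)
	if errors.Is(err, dbus.ErrClosed) {
		if err := m.connect(); err != nil {
			return err
		}
		return op(m.conn)
	}
	return err
}
```

This also explains the observed workaround: restarting the kubelet recreates the connection from scratch, which is the manual equivalent of the automatic reconnect above.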

@kolyshkin

Should be fixed upstream by kubernetes/kubernetes#110496

@kdaula kdaula added this to 1.9.0 Jun 28, 2022
@kdaula kdaula added this to the 1.9.0 milestone Jun 28, 2022
@kdaula kdaula modified the milestones: 1.9.0, 1.10.0 Jul 25, 2022
@kdaula kdaula removed this from 1.9.0 Jul 25, 2022
@kdaula kdaula modified the milestones: 1.10.0, Q4 Aug 2, 2022
@stmcginnis stmcginnis added the status/needs-triage Pending triage or re-evaluation label Dec 1, 2022
@gthao313 gthao313 self-assigned this Jan 30, 2023
@gthao313 gthao313 added the status/in-progress This issue is currently being worked on label Jan 30, 2023
@arnaldo2792 (Contributor)

I tested this on k8s 1.23 with Bottlerocket 1.12, and the problem persists:

Jan 30 22:20:32 ip-192-168-63-75.us-west-2.compute.internal kubelet[2172]: E0130 22:20:32.587216    2172 qos_container_manager_linux.go:375] "Failed to update QoS cgroup configuration" err="dbus: connection closed by user"

@gthao313 (Member)

The problem still persists on all k8s variants except 1.25, because upstream Kubernetes only bumped runc to the latest version in 1.25. The remaining k8s versions still ship the older runc, which causes this problem.

@cbgbt (Contributor, Author) commented Feb 7, 2023

I think that rather than rebasing the vendored runc fix onto previous versions, we can close this issue, since the fix is available in Kubernetes 1.25 variants and beyond.
