
Restarting dbus results in kubelet being unable to start pods #2172

Closed · cbgbt opened this issue Jun 2, 2022 · 4 comments

Labels: status/in-progress (This issue is currently being worked on), status/needs-triage (Pending triage or re-evaluation)

@cbgbt (Contributor) commented Jun 2, 2022

Image I'm using: Any Kubernetes variant

What I expected to happen:
If the dbus service is restarted, scheduling pods with Kubernetes should not be impacted.

What actually happened:
Scheduling a pod on the node fails after dbus has been restarted, unless the kubelet is also restarted; the kubelet reports:

unable to ensure pod container exists: failed to create container for [kubepods besteffort ...] : dbus: connection closed by user

How to reproduce the problem:

  • Run systemctl restart dbus as an administrator on a node
  • Attempt to schedule pods onto that node

Additional Details:
This is fixed in runc in opencontainers/runc#3475 and backported to 1.1.x in opencontainers/runc#3476; however, to pick up this fix, our Kubernetes packaging needs to stop using the vendored libcontainer (libct) and instead use the version from our build of runc.
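
For context on what the runc change does: when the dbus daemon restarts, the long-lived connection held by the systemd cgroup driver is closed, and every subsequent cgroup operation fails with godbus's ErrClosed ("dbus: connection closed by user" — the exact error in the log above). The fix reconnects and retries instead of surfacing the error. Below is a minimal sketch of that reconnect-and-retry pattern against the github.com/godbus/dbus/v5 API; the manager type and withConn helper are illustrative, not runc's actual code:

```go
// Sketch only: this is not the runc patch itself, just the
// reconnect-on-closed-connection pattern it applies.
package dbusretry

import (
	"errors"
	"sync"

	"github.com/godbus/dbus/v5"
)

type manager struct {
	mu   sync.Mutex
	conn *dbus.Conn
}

// connect opens and authenticates a private connection to the system bus.
func (m *manager) connect() error {
	conn, err := dbus.SystemBusPrivate()
	if err != nil {
		return err
	}
	if err := conn.Auth(nil); err != nil {
		conn.Close()
		return err
	}
	if err := conn.Hello(); err != nil {
		conn.Close()
		return err
	}
	m.conn = conn
	return nil
}

// withConn runs op against the shared connection. If op fails because the
// daemon dropped the connection (dbus.ErrClosed is godbus's
// "dbus: connection closed by user" error), it reconnects and retries once
// instead of propagating the error to the caller.
func (m *manager) withConn(op func(*dbus.Conn) error) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.conn == nil {
		if err := m.connect(); err != nil {
			return err
		}
	}
	err := op(m.conn)
	if errors.Is(err, dbus.ErrClosed) {
		if err := m.connect(); err != nil {
			return err
		}
		return op(m.conn)
	}
	return err
}
```

This also explains the observed workaround: restarting the kubelet recreates the connection from scratch, which is the manual equivalent of the automatic reconnect above.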

@kolyshkin

Should be fixed upstream by kubernetes/kubernetes#110496

@kdaula kdaula added this to 1.9.0 Jun 28, 2022
@kdaula kdaula added this to the 1.9.0 milestone Jun 28, 2022
@kdaula kdaula modified the milestones: 1.9.0, 1.10.0 Jul 25, 2022
@kdaula kdaula removed this from 1.9.0 Jul 25, 2022
@kdaula kdaula modified the milestones: 1.10.0, Q4 Aug 2, 2022
@stmcginnis stmcginnis added the status/needs-triage Pending triage or re-evaluation label Dec 1, 2022
@gthao313 gthao313 self-assigned this Jan 30, 2023
@gthao313 gthao313 added the status/in-progress This issue is currently being worked on label Jan 30, 2023
@arnaldo2792 (Contributor)

I tested this on k8s 1.23 with Bottlerocket 1.12, and the problem persists:

Jan 30 22:20:32 ip-192-168-63-75.us-west-2.compute.internal kubelet[2172]: E0130 22:20:32.587216    2172 qos_container_manager_linux.go:375] "Failed to update QoS cgroup configuration" err="dbus: connection closed by user"

@gthao313 (Member)

The problem still persists on all k8s variants except 1.25, because upstream Kubernetes only bumped runc to the latest version in 1.25. The remaining k8s versions still ship the older runc, which causes this problem.

@cbgbt (Contributor, Author) commented Feb 7, 2023

I think that rather than rebasing the vendored runc fix onto previous versions, we can close this issue, since the fix is available in Kubernetes 1.25 variants and beyond.
