Sporadic runc exec failures #1884

liron-l · 2018-09-02T18:52:50Z

As of runc 1.0.0-rc5+dev we've started noticing increase rate in sporadic errors during k8s liveness proves.
When the test fails, it produces the following error:

false Error:  OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused "process_linux.go:138: adding pid 3101 to cgroups caused \"failed to write 3101 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/docker/1d7b29b96dfd7bc97c6a2d6cbff82b00509cdcc4dbf2ac72ef5dd2bef9db7067/cgroup.procs: invalid argument\"": unknown

This might be related to #1326 and moby/moby#31230, but without a root cause or a resolution we can't be sure.
When playing with runc code, it appears that the error disappears if we retry running cgroups.EnterPid on failure (which might indicate this is a transient/race issue).

We've found that it's easier to reproduce this issue if we run perf together with excessive docker exec load (we tried this on ubuntu xenial and other OSs).

Run perf

perf trace --no-syscalls --event 'sched:*'

Run the following code (similar to how k8s is invoking the liveness tests)

package main

import (
	"fmt"
	"github.com/fsouza/go-dockerclient"
	"time"
	"strings"
)

func main() {
	client, err := docker.NewClient("unix:///var/run/docker.sock")
	if err != nil {
		panic(err)
	}
	go func() {
		for {
			fmt.Println("Running", time.Now().String())
			time.Sleep(time.Hour)
		}
	}()
	const container = "test-container"
	client.RemoveContainer(docker.RemoveContainerOptions{
		ID:    container,
		Force: true,
	})
	_, err = client.CreateContainer(docker.CreateContainerOptions{
		Name: container,
		Config: &docker.Config{
			Image:        "busybox",
			Cmd:          []string{"sleep", "10000"},
			Tty:          true,
		},
	})
	if err != nil {
		panic(err)
	}
	if err := client.StartContainer(container, nil); err != nil {
		panic(err)
	}
	var tty bool

	for {
		tty = !tty
		exec, err := client.CreateExec(docker.CreateExecOptions{AttachStdout: true, AttachStderr: true, Tty: tty, Container: container, Cmd: []string{"echo", "A"}})
		if err != nil {
			panic(err)
		}
		m := logger{}
		err = client.StartExec(exec.ID, docker.StartExecOptions{Tty: tty, OutputStream: m, ErrorStream: m})
		if err != nil && strings.Contains(err.Error(), "invalid") {
			fmt.Println(err.Error())
			panic(err)
		} else if err != nil {
			fmt.Println("err", err, time.Now())
		}
	}
}

type logger struct{}
func (m logger) Write(p []byte) (n int, err error) {
	if strings.Contains(string(p), "cgroup") {
		fmt.Println(time.Now().String(), "Error:", string(p))
	}
	return len(p), nil
}

Runc version

runc version 1.0.0-rc5+dev
commit: 69663f0bd4b60df09991c08812a60108003fa340-dirty
spec: 1.0.0

The text was updated successfully, but these errors were encountered:

liron-l · 2018-09-06T07:54:45Z

Hi guys,

We think it's a problem related to the fact that runc is written in go.

Specifically, we tracked the issue to the following condition in the cpu_cgroup_can_attach kernel function:

                /*
		 * Avoid calling sched_move_task() before wake_up_new_task()
		 * has happened. This would lead to problems with PELT, due to
		 * move wanting to detach+attach while we're not attached yet.
		 */
		if (task->state == TASK_NEW)
			ret = -EINVAL;

This check was added in kernel version v4.8.

What we see from the kernel state, is that the rejected pid belongs to a thread whose parent is the pid that runc is trying to add to the cgroup.proc.
This probably means that it's a go thread that is scheduled by the go runtime.

Here is a full example of the repro and kernel state:

adding pid 18413 to cgroups caused \"failed to write 18413 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/docker/b906c3bcd1d0cbe999b9ba48bb07fc6cedbb05cc5a0fb5b15645c3f5660ef181/cgroup.procs: invalid argument\""

And the task states in the kernel are:

PPID	PID	TID	Description
-	18412	18412	docker runc init (process that is spawned by runc)
18412	18413	18413	docker runc init (process that is added to the cgroup)
18413	18414	18413	thread that causes adding pid `18413` cgroup failure

@hqhq @crosbymichael:
There are few possible mitigations for this issue, e.g.,

Check the threads state (from /proc) prior to adding the runc-init pid to the cgroup (not a perfect fix, but eliminates the problem with high probability)
Write a small shim in c that will be added to the cgroup.

We would be happy to own this fix, just let us know if you have other ideas in mind.

cyphar · 2018-09-10T10:49:54Z

Awesome. Another day, another reason why using Go was a mistake. 😉

In all seriousness, yeah this is a pretty bad issue. I think the easiest solution would be to just carry #1184 -- which includes a patch where the cgroup attachment code is done while runc init is still inside the C shim (which is necessary so that we can unshare(CLONE_NEWCGROUP) after the attach). This should solve the Go runtime thread problem.

liron-l · 2018-09-11T06:38:34Z

Thanks @cyphar!
Is #1184 still active?
Will it help if we continue the remaining work or extract the synchronization mechanism to a seperate PR?

cyphar · 2018-09-11T06:47:22Z

It isn't active, we'd need to carry it. To be honest, the patch was mostly done it just needed some review -- I had quite a few concerns in the early review process but I think the patch just stagnated. I can try to carry it this week and see if it solves the issue for you.

liron-l · 2018-09-27T10:30:43Z

Hi @cyphar any updates about the fix? Can we assist in any-way?

cyphar · 2018-09-28T22:31:15Z

Sorry, I've been working on some kernel patches related to container runtime security. I will work on carrying the PR next week.

crosbymichael · 2018-10-08T14:09:51Z

I'm going to look into this today and see if we can get a smaller diff to fix the issues you are seeing.

jaeco · 2018-10-17T18:23:36Z

Hello, I have users that are experiencing pain from this issue as well, do we have any updates on a fix?

crosbymichael · 2018-10-22T15:04:58Z

@jaeco I just submited a rebased PR to fix this issue. We were able to reproduce and I can confirm the fix resolves this issue.

jaeco · 2018-10-22T17:20:19Z

@jaeco I just submited a rebased PR to fix this issue. We were able to reproduce and I can confirm the fix resolves this issue.

Thank you so much!!

Fix for opencontainers/runc#1884 is in opencontainers/runc#1916 - problem described in 1884 is independent of kubernetes version. - happens with kernel >= v4.8 - fixed in runc version v1.0.0-rc6 and above ( opencontainers/runc@9a3a8a5 ) - which got pulled into containerd v1.3.0-beta.0 and above ( containerd/containerd@97dd5df#diff-4061fcef378a6d912e14e2ce162a1995)

rastislavs mentioned this issue Oct 9, 2018

Sporadic kubectl exec issue Ubuntu 18.04 contiv/vpp#1164

Open

crosbymichael mentioned this issue Oct 22, 2018

Add support for cgroup namespace #1916

Merged

holiman mentioned this issue Nov 12, 2018

Fuzzer engine, spurious failures ethereum/evmlab#102

Closed

jwatte mentioned this issue Nov 12, 2018

Occasional failure to start container with failing to write to cgroup.procs error moby/moby#31230

Closed

mrunalp closed this as completed in #1916 Nov 13, 2018

topherbullock mentioned this issue Nov 28, 2018

Mock resource check fails sporadically concourse/concourse#2861

Closed

BooleanCat mentioned this issue Dec 17, 2018

Resilience in adding of exec tasks to cgroups #1950

Merged

chinglinwen mentioned this issue Jun 14, 2019

Running container killed with failure to write to cgroup.procs #1326

Open

ghost mentioned this issue Jun 21, 2019

Update docker to 18.09 amazonlinux/container-images#24

Closed

hingstarne mentioned this issue Aug 3, 2019

Kubernetes liveness probes with docker exec fail randomly coreos/bugs#2605

Open

dims mentioned this issue Oct 14, 2019

Bump Containerd version to 1.3.0 kubernetes-sigs/image-builder#77

Merged

tedyu mentioned this issue Apr 4, 2020

Liveness probe failed: OCI runtime exec failed kubernetes/kubernetes#89862

Closed

iaguis mentioned this issue Apr 7, 2020

Update runc to 1.0.0-rc10 in Stable and Beta flatcar/Flatcar#74

Closed

kolyshkin mentioned this issue Jun 1, 2020

(*initProcess).start: rm second Apply #2446

Merged

jseba mentioned this issue Sep 19, 2022

EINVAL writing cgroup.procs when creating container google/gvisor#7999

Closed

kzys mentioned this issue Nov 4, 2022

Flaky test: TestShimOOMScore (on RockyLinux via CirrusCI) containerd/containerd#7532

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sporadic runc exec failures #1884

Sporadic runc exec failures #1884

liron-l commented Sep 2, 2018

liron-l commented Sep 6, 2018

cyphar commented Sep 10, 2018

liron-l commented Sep 11, 2018

cyphar commented Sep 11, 2018

liron-l commented Sep 27, 2018

cyphar commented Sep 28, 2018

crosbymichael commented Oct 8, 2018

jaeco commented Oct 17, 2018

crosbymichael commented Oct 22, 2018

jaeco commented Oct 22, 2018

Sporadic runc exec failures #1884

Sporadic runc exec failures #1884

Comments

liron-l commented Sep 2, 2018

liron-l commented Sep 6, 2018

cyphar commented Sep 10, 2018

liron-l commented Sep 11, 2018

cyphar commented Sep 11, 2018

liron-l commented Sep 27, 2018

cyphar commented Sep 28, 2018

crosbymichael commented Oct 8, 2018

jaeco commented Oct 17, 2018

crosbymichael commented Oct 22, 2018

jaeco commented Oct 22, 2018