pod resource limits: error creating cgroup path: subtree_control: ENOENT #15074

Open · lsm5 opened this issue Jul 26, 2022 · 19 comments
Labels
flakes (Flakes from Continuous Integration) · kind/bug (Categorizes issue or PR as related to a bug.)

Comments

@lsm5 (Member) commented Jul 26, 2022

Is this a BUG REPORT or FEATURE REQUEST?

/kind bug

Description

aarch64 CI enablement in #14801 is hitting failures in the system tests. This issue is a placeholder for tracking those failures and for referencing in FIXME comments next to skip_if_aarch64.
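A hypothetical sketch of what such a skip might look like in a bats system test (skip_if_aarch64 is the helper name from the paragraph above; its exact signature in the test helpers may differ):

    # Hypothetical sketch only; not copied from the repository.
    @test "pod resource limits" {
        skip_if_aarch64 "FIXME: #15074 - subtree_control ENOENT on aarch64"
        # ...original test body (pod create with resource limits)...
    }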

The openshift-ci bot added the kind/bug label on Jul 26, 2022.
@edsantiago (Member):

The pod resource limits test is in code that @cdoern just merged last week:

# # podman --cgroup-manager=cgroupfs pod create --name=resources-cgroupfs --cpus=5 --memory=5m --memory-swap=1g --cpu-shares=1000 --cpuset-cpus=0 --cpuset-mems=0 --device-read-bps=/dev/loop0:1mb --device-write-bps=/dev/loop0:1mb --blkio-weight-device=/dev/loop0:123 --blkio-weight=50
# Error: error creating cgroup path /libpod_parent/e0024c8b8ccc24c247b62a422433c0b69d7c3f930bad3863563fcec0d0db43f1: write /sys/fs/cgroup/libpod_parent/cgroup.subtree_control: no such file or directory
# [ rc=125 (** EXPECTED 0 **) ]

@edsantiago (Member):

The sdnotify test involves systemd, so @vrothberg might be the best person to look at it; but it could also be crun, so pinging @giuseppe as well:

# # podman run -d --sdnotify=container quay.io/libpod/fedora:31 sh -c printenv NOTIFY_SOCKET;echo READY;systemd-notify --ready;while ! test -f /stop;do sleep 0.1;done
# 2ff76f9670f13c479196440ac93babe9fc4afa8cbb0e0b6799b73a3b59969292
# # podman logs 2ff76f9670f13c479196440ac93babe9fc4afa8cbb0e0b6799b73a3b59969292
# /run/notify/notify.sock
# READY
# Failed to notify init system: Permission denied

Lots more permission and SELinux errors make me strongly suspect that SELinux is broken on these systems. It might be that the only way to debug this is to ssh into one of them.
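For whoever ends up ssh'ing in, a minimal SELinux triage sketch (generic commands, not taken from the CI scripts):

    getenforce        # Fedora CI images are normally Enforcing
    sestatus          # policy name and current mode
    # recent AVC denials (needs the audit tools installed)
    sudo ausearch -m AVC -ts recent | tail -n 20
    # fallback if auditd is not running
    sudo dmesg | grep -i 'avc.*denied' | tail -n 20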

@edsantiago (Member):

@lsm5, hint for next time: file the issue first, then go to the broken PR, find links to all the failing logs, paste them into the issue, and then resubmit the PR with skips. It's almost impossible to find old Cirrus logs for a PR. (I scraped the above from comments I made in your PR, so no problem; just something to keep in mind.)

@cdoern (Contributor) commented Jul 26, 2022

> The pod resource limits test is in code that @cdoern just merged last week:
>
> # # podman --cgroup-manager=cgroupfs pod create --name=resources-cgroupfs --cpus=5 --memory=5m --memory-swap=1g --cpu-shares=1000 --cpuset-cpus=0 --cpuset-mems=0 --device-read-bps=/dev/loop0:1mb --device-write-bps=/dev/loop0:1mb --blkio-weight-device=/dev/loop0:123 --blkio-weight=50
> # Error: error creating cgroup path /libpod_parent/e0024c8b8ccc24c247b62a422433c0b69d7c3f930bad3863563fcec0d0db43f1: write /sys/fs/cgroup/libpod_parent/cgroup.subtree_control: no such file or directory
> # [ rc=125 (** EXPECTED 0 **) ]

The only reason this should fail is if arm does not have subtree control, which I find highly unlikely. The subtree_control file is less related to my resource-limits work and more to cgroup creation in general. I know where this is done in containers/common, but still, an issue like this makes me think the kernel is missing some things when compiled.

@giuseppe (Member):

> The only reason this should fail is if arm does not have subtree control, which I find highly unlikely. The subtree_control file is less related to my resource-limits work and more to cgroup creation in general. I know where this is done in containers/common, but still, an issue like this makes me think the kernel is missing some things when compiled.

It could also be that libpod_parent/ is missing.

@cdoern (Contributor) commented Jul 26, 2022

True, @giuseppe, but libpod_parent is created (if it does not exist) before subtree_control is written, I believe?
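For illustration, roughly the sequence of filesystem operations the cgroupfs path boils down to on cgroup v2 (the real logic is Go code in containers/common; this is only a sketch of the ordering under discussion, with placeholder names):

    parent=/sys/fs/cgroup/libpod_parent
    pod_id=example_pod   # placeholder, not the real naming scheme

    # 1. The parent cgroup directory has to exist first.
    mkdir -p "$parent"

    # 2. Controllers are then delegated to children by writing the parent's
    #    cgroup.subtree_control (they must already be enabled one level up,
    #    in /sys/fs/cgroup/cgroup.subtree_control). If step 1 was skipped,
    #    raced, or the directory was removed in between, this write fails
    #    with ENOENT, which is exactly the error in the logs above.
    echo "+cpu +memory +io +cpuset" > "$parent/cgroup.subtree_control"

    # 3. Only then is the per-pod child cgroup created underneath.
    mkdir -p "$parent/$pod_id"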

@giuseppe (Member):

then /sys/fs/cgroup might not be a cgroup v2 mount
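A couple of standard checks for that, for anyone ssh'd into one of the VMs:

    stat -fc %T /sys/fs/cgroup              # prints "cgroup2fs" on a pure v2 host
    findmnt -t cgroup2 /sys/fs/cgroup       # should list the unified mount
    cat /sys/fs/cgroup/cgroup.controllers   # controllers available at the root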

@edsantiago (Member):

It's v2. I'm doing the Cirrus rerun-with-terminal thing and trying to reproduce it, and can't: hack/bats 200:resource passes, as does manually recreating the fallocate, losetup, echo bfq, and podman pod create commands. This could be something context-sensitive, where a prior test sets the system up in a way that causes this test to fail.
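For reference, my rough reconstruction of those manual steps (the image path and size are guesses; the pod-create flags are copied from the failing log above):

    fallocate -l 10M /tmp/blkio.img
    dev=$(sudo losetup -f --show /tmp/blkio.img)     # e.g. /dev/loop0
    echo bfq | sudo tee /sys/block/$(basename "$dev")/queue/scheduler

    sudo podman --cgroup-manager=cgroupfs pod create --name=resources-cgroupfs \
        --cpus=5 --memory=5m --memory-swap=1g --cpu-shares=1000 \
        --cpuset-cpus=0 --cpuset-mems=0 \
        --device-read-bps="$dev":1mb --device-write-bps="$dev":1mb \
        --blkio-weight-device="$dev":123 --blkio-weight=50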

@edsantiago (Member):

Still failing, but @lsm5 believes it might be a flake (which is consistent with my findings in the rerun terminal). I don't know if that's better or worse.

@edsantiago (Member):

I'll be darned. It is a flake.

@edsantiago (Member):

@cdoern @giuseppe please use @cevich's #15145 to spin up VMs and debug this.

@github-actions bot commented Sep 3, 2022

A friendly reminder that this issue had no activity for 30 days.

@edsantiago (Member):

pod resource limits still flaking

edsantiago added a commit to edsantiago/libpod that referenced this issue Sep 21, 2022
Background: in order to add aarch64 tests, we had to add
emergency skips to a lot of failing tests. No attempt was
ever made to understand why they were failing.

Fast forward to today, I filed containers#15888 just to see if tests
are still failing. Looks like a number of them are fixed.
(Yes, magically). Remove those skips.

See: containers#15074, containers#15277

Signed-off-by: Ed Santiago <[email protected]>
edsantiago changed the title from "aarch64 CI - investigate system test failures" to "aarch64 CI - error creating cgroup path: subtree_control: ENOENT" on Sep 22, 2022.
@edsantiago (Member):

Still happening on f38:

[+1177s] not ok 317 pod resource limits
...
<+008ms> # # podman --cgroup-manager=cgroupfs pod create --name=resources-cgroupfs --cpus=5 --memory=5m --memory-swap=1g --cpu-shares=1000 --cpuset-cpus=0 --cpuset-mems=0 --device-read-bps=/dev/loop0:1mb --device-write-bps=/dev/loop0:1mb --blkio-weight=50
<+209ms> # Error: creating cgroup path /libpod_parent/9f84a4a2767e6495567aaf02a54447213083db7484d539edae31add828221b45: write /sys/fs/cgroup/libpod_parent/cgroup.subtree_control: no such file or directory

edsantiago added the flakes label on Jun 5, 2023.
@edsantiago (Member):

Seen just now on my RH laptop:

✗ pod resource limits
...
   [05:05:24.431787056] # .../bin/podman --cgroup-manager=cgroupfs pod create --name=resources-cgroupfs --cpus=5 --memory=5m --memory-swap=1g --cpu-shares=1000 --cpuset-cpus=0 --cpuset-mems=0 --device-read-bps=/dev/loop0:1mb --device-write-bps=/dev/loop0:1mb --blkio-weight=50
   [05:05:24.528324789] Error: creating cgroup path /libpod_parent/09404b9d6c87cce725635b445cfc3b5bf0f5fb654dfece8a15296915e6d71871: write /sys/fs/cgroup/libpod_parent/cgroup.subtree_control: no such file or directory
   [05:05:24.541146057] [ rc=125 (** EXPECTED 0 **) ]

Passed on rerun. Again, this is my RH laptop, not aarch64.

edsantiago changed the title from "aarch64 CI - error creating cgroup path: subtree_control: ENOENT" to "pod resource limits: error creating cgroup path: subtree_control: ENOENT" on Jul 11, 2023.
edsantiago added a commit to edsantiago/libpod that referenced this issue Jan 16, 2024
- containers#15074 ("subtree_control" flake). The flake is NOT FIXED, I
  saw it six months ago on my (non-aarch64) laptop. However,
  it looks like the frequent-flake-on-aarch64 bug is resolved.
  I've been testing in containers#17831 and have not seen it. So,
  tentatively remove the skip and see what happens.

- Closes: containers#19407 (broken tar, "duplicates of file paths")
  All Fedoras now have a fixed tar. Debian DOES NOT, but
  we're handling that in our build-ci-vm code. I.e., the
  Debian VM we're using has a working tar even though there's
  currently a broken tar out in the wild.

  Added distro-integration tag so we can catch future problems
  like this in OpenQA.

- Closes: containers#19471 (brq / blkio / loopbackfs in rawhide)
  Bug appears to be fixed in rawhide, at least in the VMs we're
  using now.

  Added distro-integration tag because this test obviously
  relies on other system stuff.

Signed-off-by: Ed Santiago <[email protected]>
@edsantiago (Member):

Seen after a long absence: f40 root, in parallel system tests, though I doubt the parallelism has anything to do with it.

@edsantiago (Member):

Ping, seeing this one often in parallel system tests.

sys(7)   podman(7)   fedora-40-aarch64(2)   root(7)   host(7)   sqlite(6)
                     rawhide(2)                                 boltdb(1)
                     fedora-40(2)
                     fedora-39(1)

@edsantiago (Member):

Continuing to see this often in parallel system tests

sys(12)   podman(12)   fedora-40(5)           root(12)   host(12)   sqlite(8)
                       fedora-40-aarch64(3)                         boltdb(4)
                       rawhide(2)
                       fedora-39(2)

@giuseppe (Member):

Adding some code through containers/common#2158 to help debug this issue.
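Until that lands, a rough idea of the state worth capturing at the moment of failure (my suggestion only, not necessarily what containers/common#2158 collects):

    ls -ld /sys/fs/cgroup/libpod_parent
    cat /sys/fs/cgroup/cgroup.subtree_control
    cat /sys/fs/cgroup/libpod_parent/cgroup.controllers
    cat /sys/fs/cgroup/libpod_parent/cgroup.subtree_control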
