
podman run --cgroups=disabled: missing "supervisor" #11191

Closed
edsantiago opened this issue Aug 10, 2021 · 21 comments · Fixed by #12162
Labels
flakes (Flakes from Continuous Integration)
locked - please file new issue/PR (Assist humans wanting to comment on an old issue or PR with locked comments)

@edsantiago

One of the integration tests is flaking when reading /proc/NNN/cgroup:

$ podman run --name testctr -d --cgroups=disabled quay.io/libpod/alpine:latest top
$ podman inspect testctr
...
$ (read /proc/X/cgroup)
       Expected
               <string>: 0::/system.slice/google-startup-scripts.service
           to equal
               <string>: 0::/system.slice/google-startup-scripts.service/supervisor

Podman run [It] podman run with cgroups=disabled runs without cgroups

edsantiago added the flakes label Aug 10, 2021
@rhatdan commented Aug 10, 2021

Can we change the test to be less specific? Is /supervisor critical to the test?

@edsantiago

The test itself doesn't have any supervisor in it; it's just comparing /proc/self/cgroup to /proc/CONTAINERPID/cgroup (I think). git blame points to @mheon in #3581. Matt, how's your memory?
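
For reference, the check boils down to something like this (a rough standalone sketch of the logic, not the actual e2e code; the image and flags are copied from the failure report above):

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// run is a small helper for this sketch only: run a command and return
// its trimmed stdout, panicking on error to keep the example short.
func run(name string, args ...string) string {
	out, err := exec.Command(name, args...).Output()
	if err != nil {
		panic(err)
	}
	return strings.TrimSpace(string(out))
}

func main() {
	// Baseline: the cgroup of the process driving the test.
	self, err := os.ReadFile("/proc/self/cgroup")
	if err != nil {
		panic(err)
	}

	// Start a detached container with cgroups disabled and look up its PID.
	run("podman", "run", "--name", "testctr", "-d", "--cgroups=disabled",
		"quay.io/libpod/alpine:latest", "top")
	pid := run("podman", "inspect", "--format", "{{.State.Pid}}", "testctr")

	// With --cgroups=disabled the container process should stay in the
	// caller's cgroup, so the two files should match exactly.
	ctr, err := os.ReadFile("/proc/" + pid + "/cgroup")
	if err != nil {
		panic(err)
	}
	fmt.Printf("self: %s\nctr:  %s\nmatch: %v\n",
		strings.TrimSpace(string(self)),
		strings.TrimSpace(string(ctr)),
		strings.TrimSpace(string(self)) == strings.TrimSpace(string(ctr)))
}
```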

@mheon commented Aug 10, 2021

This test verifies that the container is running in the same cgroup as the process that launched it when we run with --cgroups=disabled, so an exact match is important. The test itself is sound; I can think of no obvious reason why it would be failing.

@rhatdan commented Aug 11, 2021

@giuseppe thoughts?

@giuseppe

we create the /supervisor sub-cgroup only when the cgroups mode is set to --cgroups=split:

	if ctr.config.CgroupsMode == cgroupSplit {
		if err := utils.MoveUnderCgroupSubtree("supervisor"); err != nil {
			return err
		}
	}

I don't see any other way of ending up under /supervisor.
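
In case it's useful context for others reading along: conceptually, on cgroup v2 that move amounts to creating a child of the current cgroup and writing our PID into its cgroup.procs. Here is a simplified sketch of the idea, not the real utils.MoveUnderCgroupSubtree code; it assumes a pure cgroup-v2 host and enough privileges:

```go
package main

import (
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// moveUnderSubtree is an illustration only, not podman's implementation:
// move the current process under a named child of its current cgroup.
func moveUnderSubtree(name string) error {
	data, err := os.ReadFile("/proc/self/cgroup")
	if err != nil {
		return err
	}
	// On a pure cgroup-v2 host this file has a single line like
	// "0::/system.slice/google-startup-scripts.service".
	current := strings.TrimPrefix(strings.TrimSpace(string(data)), "0::")

	child := filepath.Join("/sys/fs/cgroup", current, name)
	if err := os.MkdirAll(child, 0o755); err != nil {
		return err
	}
	// Writing our PID into cgroup.procs moves this process, so from now on
	// /proc/self/cgroup ends with "/<name>".
	return os.WriteFile(filepath.Join(child, "cgroup.procs"),
		[]byte(strconv.Itoa(os.Getpid())), 0o644)
}

func main() {
	if err := moveUnderSubtree("supervisor"); err != nil {
		panic(err)
	}
}
```

Anything forked after that point inherits the new sub-cgroup, which is presumably how an unrelated read of the current cgroup could end up seeing it.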

One thing we could try, to make the test more robust, is to avoid going through the container's PID to read the cgroup, and instead use --cgroupns=host:

podman run --name testctr --rm  --cgroupns=host --cgroups=disabled quay.io/libpod/alpine:latest cat /proc/self/cgroup

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@rhatdan commented Sep 13, 2021

@giuseppe @edsantiago @mheon any movement on this issue?

@giuseppe

opened a PR to simplify the test: #11561

@edsantiago

#11561 has merged. Closing, with fingers crossed.

@edsantiago

This is happening again, sort of, except now it's the opposite:

Expected
      <string>: 0::/system.slice/google-startup-scripts.service/supervisor
  to equal
      <string>: 0::/system.slice/google-startup-scripts.service

(it used to be the other way around).

Podman run [It] podman run with cgroups=disabled runs without cgroups

edsantiago reopened this Oct 4, 2021
@giuseppe

I think these old PRs were failing because they had not yet been rebased to include the fix.

Have you seen this flake again in the last week?

@edsantiago

No, not since the 10-01 instances reported above. Okay, I'll close again - thank you!

@edsantiago

Podman run [It] podman run with cgroups=disabled runs without cgroups

@cevich commented Oct 27, 2021

Note: google-startup-scripts.service is "special". It's run early on by systemd during boot with "unconfined" SELinux labels, and it is responsible for executing the Cirrus-CI agent process, which in turn executes the commands from the YAML. In other words, this is a case where running the test in CI is actually different from running it manually (especially as far as SELinux is concerned). The process tree should look something like:

systemd -> google-startup-scripts.service -> cirrus-agent (name unknown) -> bash -> make
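
(If anyone wants to double-check that tree on an actual CI VM, a throwaway sketch like the following prints each ancestor's name and cgroup; nothing podman-specific, just walking /proc:)

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// statusField pulls a single "Key:\tvalue" field out of /proc/<pid>/status.
func statusField(pid int, key string) string {
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/status", pid))
	if err != nil {
		return ""
	}
	for _, line := range strings.Split(string(data), "\n") {
		if strings.HasPrefix(line, key+":") {
			return strings.TrimSpace(strings.TrimPrefix(line, key+":"))
		}
	}
	return ""
}

func main() {
	// Walk the parent chain up to PID 1, printing each ancestor's name and
	// cgroup, to see whether the tree really is systemd -> google-startup-scripts
	// -> agent -> bash -> make (-> this binary).
	for pid := os.Getpid(); pid >= 1; {
		cg, _ := os.ReadFile(fmt.Sprintf("/proc/%d/cgroup", pid))
		fmt.Printf("%d\t%s\t%s\n", pid, statusField(pid, "Name"),
			strings.TrimSpace(string(cg)))
		ppid, err := strconv.Atoi(statusField(pid, "PPid"))
		if err != nil || ppid < 1 {
			break
		}
		pid = ppid
	}
}
```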

@giuseppe what does the presence of this supervisor node signify? Do we even care about this node? I could easily modify the test to pass with or without /supervisor; that would be an easy "fix".

@cevich commented Oct 27, 2021

(Re-opening, since clearly the issue has popped up again after rebasing.)

cevich reopened this Oct 27, 2021
@cevich commented Oct 27, 2021

@giuseppe as discussed, there's a cgroup name clash: supervisor appears to be used by both podman and systemd(?). In order to debug this, you'd like to rename the podman one and then see which name shows up in a subsequent test failure. For now, I will disable the test in #11795 with a Skip() mentioning this issue.
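
For the record, the skip amounts to something like this in the affected It block (just a sketch against stock Ginkgo, not the exact diff in #11795):

```go
package e2e_test

import (
	"testing"

	. "github.com/onsi/ginkgo"
	. "github.com/onsi/gomega"
)

func TestSketch(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "cgroups=disabled sketch")
}

var _ = Describe("Podman run", func() {
	It("podman run with cgroups=disabled runs without cgroups", func() {
		// Temporarily disabled while the supervisor/runtime sub-cgroup
		// name clash is investigated; see containers/podman#11191.
		Skip("flake: cgroup sub-hierarchy mismatch, see #11191")
		// ...the original test body would follow here...
	})
})
```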

@giuseppe

@cevich @edsantiago opened a PR to help debugging this issue: #12120

cevich added a commit to cevich/podman that referenced this issue Oct 27, 2021
giuseppe added a commit to giuseppe/libpod that referenced this issue Oct 28, 2021
we are having a hard time figuring out a failure in the CI:

containers#11191

Rename the sub-cgroup created here, so we can be certain the error is
caused by this part.

[NO NEW TESTS NEEDED] we need this for the CI.

Signed-off-by: Giuseppe Scrivano <[email protected]>
@cevich commented Nov 1, 2021

@giuseppe It reproduced again with your new name (log)

         • Failure [1.763 seconds]
         Podman run
         /var/tmp/go/src/github.com/containers/podman/test/e2e/run_test.go:25
           podman run with cgroups=disabled runs without cgroups [It]
           /var/tmp/go/src/github.com/containers/podman/test/e2e/run_test.go:1396
         
           Expected
               <string>: 0::/system.slice/google-startup-scripts.service/runtime
           to equal
               <string>: 0::/system.slice/google-startup-scripts.service

So that means it cannot be systemd, it must be a podman or crun bug, eh?

@giuseppe commented Nov 2, 2021

yeah, it seems it is all our fault :/

@giuseppe commented Nov 2, 2021

another attempt: #12162

@cevich commented Nov 2, 2021

> yeah, it seems it is all our fault :/

Proof positive that no matter how hard we try, we're still human. But a positive way of looking at it is: At least we don't need to patch systemd + wait for a new version to be released + incorporate that into our VMs 😁

giuseppe added a commit to giuseppe/libpod that referenced this issue Nov 4, 2021
the --cgroups=split test changes the current cgroup as it creates a
sub-cgroup.  This can cause a race condition in tests that are reading
the current cgroup.

Closes: containers#11191

Signed-off-by: Giuseppe Scrivano <[email protected]>
mheon pushed a commit to mheon/libpod that referenced this issue Nov 12, 2021
the --cgroups=split test changes the current cgroup as it creates a
sub-cgroup.  This can cause a race condition in tests that are reading
the current cgroup.

Closes: containers#11191

Signed-off-by: Giuseppe Scrivano <[email protected]>
github-actions bot added the locked - please file new issue/PR label Sep 21, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023